IDR / bioformats

Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment (particularly UW-Madison LOCI and Glencoe Software). Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software.
http://www.openmicroscopy.org/site/products/bio-formats
GNU General Public License v2.0
0 stars 0 forks source link

ScreenReader: close sub-readers to prevent file resources leaking #23

Closed sbesson closed 3 years ago

sbesson commented 3 years ago

Recent work on the Zarr HCS specification as well as recent submissions have demonstrated that we have a file leak in ScreenReader as resources are not been closed on access leading eventually to Too many open files exceptions.

Using the file-leak-detector, I have been able to track the issue down the initFile step and more specifically the logic for selecting the sub-reader.

Using the following minimal screen example

sbesson@ls30630:bioformats (screenreader_fileleak) $ ls /tmp/data/screen/
A01.tiff    A02.tiff    minimal.screen
sbesson@ls30630:bioformats (screenreader_fileleak) $ cat /tmp/data/screen/minimal.screen 
[Plate]
Rows = 1
Columns = 2
Fields = 1

[Well 0]
Row = 0
Column = 0
Field_0 = A01.tiff

[Well 1]
Row = 0
Column = 1
Field_0 = A02.tiff

Running ant gen-config generates a configuration but ant test-automated fails early due unclosed file handle with the current HEAD of the master branch. With fe7b208 included, the tests are running and passing successfully except for testIsThisType which is a separate issue, probably not worth addressing as this reader is specific to the IDR fork.

sbesson commented 3 years ago

Initial testing in an IDR deployment, on test90 with a freshly redeployed server, looking at the open handles after opening a few wells of .screen datasets, typically idr0015 and idr0033 reveals hanging open file handles

[root@test90-omeroreadonly-1 fd]# pwd
/proc/18254/fd
[root@test90-omeroreadonly-1 fd]# ls -alh | grep formats-gpl.jar
lr-x------. 1 omero-server omero-server 64 Oct  7 13:21 62 -> /opt/omero/server/OMERO.server-5.6.0-ice36-b136/lib/server/formats-gpl.jar
[root@test90-omeroreadonly-1 fd]# ls -alh | grep idr00 | wc
    155    1705   48350

With a built version of formats-gpl.jar including this fix deployed on test90-omeroreadwrite, the same testing scenario seems to indicate file handles are closed

[root@test90-omeroreadwrite fd]# pwd
/proc/20970/fd
[root@test90-omeroreadwrite fd]# ls -alh | grep formats-gpl
lr-x------. 1 omero-server omero-server 64 Nov  4 11:41 62 -> /opt/omero/server/OMERO.server-5.6.0-ice36-b136/lib/server/formats-gpl.jar
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
      0       0       0

Update: numbers are still growing. More investigation is required.

sbesson commented 3 years ago

Deployed on pilot-idr0090

manics commented 3 years ago

I trust @sbesson's assessment 😄

sbesson commented 3 years ago

Thanks @joshmoore for the suggestions. Last few commits should:

I will leave @pwalczysko to finish his testing with the first implementation of this PR and then build & deploy a new version of the JARs for retesting.

pwalczysko commented 3 years ago

Repeated the undesirable behaviour on idr-testing

[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
    191    2105   57836
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
    352    3876  108551
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
    352    3876  108551
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
    352    3876  108551
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
    352    3876  108551
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
    352    3876  108551
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
    352    3876  108551
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
    433    4767  134066

The same plate (one of idr0015) on pilot-idr0090 is showing

[root@pilot-idr0090-omeroreadwrite fd]# ls -alh | grep idr00 | wc
      1      11     181
[root@pilot-idr0090-omeroreadwrite fd]# ls -alh | grep idr00 | wc
      1      11     243
[root@pilot-idr0090-omeroreadwrite fd]# ls -alh | grep idr00 | wc
      1      11     243
[root@pilot-idr0090-omeroreadwrite fd]# ls -alh | grep idr00 | wc
      1      11     181
[root@pilot-idr0090-omeroreadwrite fd]# ls -alh | grep idr00 | wc
      1      11     181
[root@pilot-idr0090-omeroreadwrite fd]# 

lgtm

sbesson commented 3 years ago

See https://github.com/ome/bioformats/pull/3634#issuecomment-726028499, the automatic closing of resources in FileStitcher.setReaderClassList is causing other issues in Bio-Formats so I am holding off on this and I will only make change to the concrete ScreenReader class.

Proposing to redeploy the latest version of Bio-Formats with this change to the pilot mentioned above so that we can confirm the reproducibility of the tests reported in https://github.com/IDR/bioformats/pull/23#issuecomment-724786021. Then I will cut a new IDR/Bio-Formats patch release and propose it for inclusion in prod90.

pwalczysko commented 3 years ago

Happy to retest if you please relist on Standup

sbesson commented 3 years ago

The latest version of this PR has been redeployed on pilot-idr0100 for a final round of testing @pwalczysko

pwalczysko commented 3 years ago

on pilot-idr0100

[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
      0       0       0
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
      0       0       0
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
     14     154    3892
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
      0       0       0
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
      3      33     913
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
      2      22     630
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
      0       0       0
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
      0       0       0

The rise aobve was caused by opening of some images from idr0015, but it went down quickly.

LGTM