Closed sbesson closed 3 years ago
Initial testing in an IDR deployment, on test90 with a freshly redeployed server: looking at the open handles after opening a few wells of .screen datasets, typically idr0015 and idr0033, reveals hanging open file handles
[root@test90-omeroreadonly-1 fd]# pwd
/proc/18254/fd
[root@test90-omeroreadonly-1 fd]# ls -alh | grep formats-gpl.jar
lr-x------. 1 omero-server omero-server 64 Oct 7 13:21 62 -> /opt/omero/server/OMERO.server-5.6.0-ice36-b136/lib/server/formats-gpl.jar
[root@test90-omeroreadonly-1 fd]# ls -alh | grep idr00 | wc
155 1705 48350
With a built version of formats-gpl.jar including this fix deployed on test90-omeroreadwrite, the same testing scenario seems to indicate that file handles are closed
[root@test90-omeroreadwrite fd]# pwd
/proc/20970/fd
[root@test90-omeroreadwrite fd]# ls -alh | grep formats-gpl
lr-x------. 1 omero-server omero-server 64 Nov 4 11:41 62 -> /opt/omero/server/OMERO.server-5.6.0-ice36-b136/lib/server/formats-gpl.jar
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
0 0 0
Update: numbers are still growing. More investigation is required.
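The manual `ls -alh | grep idr00 | wc` check used above can also be scripted, which makes it easier to repeat while opening wells. The following is only a rough sketch: the class name `FdCounter` and the method `countMatching` are hypothetical, and it assumes a Linux host where `/proc/<pid>/fd` holds one symbolic link per open descriptor and the caller has permission to read it (root, or the owner of the process).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of OMERO or Bio-Formats.
class FdCounter {

  /** Counts symbolic links in fdDir whose target path contains needle. */
  static int countMatching(Path fdDir, String needle) throws IOException {
    int count = 0;
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(fdDir)) {
      for (Path entry : entries) {
        if (!Files.isSymbolicLink(entry)) {
          continue; // only descriptor links are of interest
        }
        try {
          Path target = Files.readSymbolicLink(entry);
          if (target.toString().contains(needle)) {
            count++;
          }
        } catch (IOException e) {
          // The descriptor was closed between listing and reading; skip it.
        }
      }
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    // Defaults: inspect the current JVM and look for IDR dataset paths.
    Path fdDir = Paths.get(args.length > 0 ? args[0] : "/proc/self/fd");
    String needle = args.length > 1 ? args[1] : "idr00";
    System.out.println(countMatching(fdDir, needle));
  }
}
```

Run against a server process, the first argument would be a directory such as /proc/20970/fd; a count that keeps growing while wells are opened and closed points at a leak, whereas a count that returns to zero does not.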
Deployed on pilot-idr0090
I trust @sbesson's assessment 😄
Thanks @joshmoore for the suggestions. The last few commits should add a try/finally in openBytes and a reader.close() statement at the FileStitcher level.
I have also opened the FileStitcher change upstream as https://github.com/ome/bioformats/pull/3634 to have this change tested more extensively.
I will leave @pwalczysko to finish his testing with the first implementation of this PR and then build & deploy a new version of the JARs for retesting.
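The try/finally approach mentioned above can be illustrated with a small self-contained sketch. This is not the code from the PR: `FakeReader`, its `openCount` counter and the simplified `openBytes(int, boolean)` signature are hypothetical stand-ins chosen so the example runs without Bio-Formats on the classpath. It only shows that a `finally` block releases the reader on both the normal and the failing path.

```java
import java.io.Closeable;
import java.io.IOException;

// Illustrative only; the names below are assumptions, not Bio-Formats API.
class TryFinallySketch {

  /** Hypothetical stand-in for a sub-reader that holds a file handle. */
  static class FakeReader implements Closeable {
    static int openCount = 0; // readers currently holding a "handle"

    FakeReader() {
      openCount++;
    }

    byte[] read(int length, boolean fail) throws IOException {
      if (fail) {
        throw new IOException("simulated read failure");
      }
      return new byte[length];
    }

    @Override
    public void close() {
      openCount--;
    }
  }

  /** Reads some bytes and always releases the reader afterwards. */
  static byte[] openBytes(int length, boolean fail) throws IOException {
    FakeReader reader = new FakeReader();
    try {
      return reader.read(length, fail);
    } finally {
      reader.close(); // runs whether read() returned or threw
    }
  }

  public static void main(String[] args) {
    try {
      openBytes(16, false);
      openBytes(16, true);
    } catch (IOException e) {
      System.out.println("read failed: " + e.getMessage());
    }
    System.out.println("readers still open: " + FakeReader.openCount);
  }
}
```

Without the `finally` block, the failing call would leave `openCount` at 1, which is the same kind of leftover handle that the `/proc/<pid>/fd` listings in this thread are counting.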
Repeated the undesirable behaviour on idr-testing
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
191 2105 57836
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
352 3876 108551
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
352 3876 108551
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
352 3876 108551
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
352 3876 108551
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
352 3876 108551
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
352 3876 108551
[root@test90-omeroreadwrite fd]# ls -alh | grep idr00 | wc
433 4767 134066
The same plate (one of idr0015) on pilot-idr0090 is showing
[root@pilot-idr0090-omeroreadwrite fd]# ls -alh | grep idr00 | wc
1 11 181
[root@pilot-idr0090-omeroreadwrite fd]# ls -alh | grep idr00 | wc
1 11 243
[root@pilot-idr0090-omeroreadwrite fd]# ls -alh | grep idr00 | wc
1 11 243
[root@pilot-idr0090-omeroreadwrite fd]# ls -alh | grep idr00 | wc
1 11 181
[root@pilot-idr0090-omeroreadwrite fd]# ls -alh | grep idr00 | wc
1 11 181
[root@pilot-idr0090-omeroreadwrite fd]#
lgtm
See https://github.com/ome/bioformats/pull/3634#issuecomment-726028499: the automatic closing of resources in FileStitcher.setReaderClassList is causing other issues in Bio-Formats, so I am holding off on this and will only make changes to the concrete ScreenReader class.
Proposing to redeploy the latest version of Bio-Formats with this change to the pilot mentioned above so that we can confirm the reproducibility of the tests reported in https://github.com/IDR/bioformats/pull/23#issuecomment-724786021. Then I will cut a new IDR/Bio-Formats patch release and propose it for inclusion in prod90.
Happy to retest if you relist this on Standup, please
The latest version of this PR has been redeployed on pilot-idr0100 for a final round of testing @pwalczysko
on pilot-idr0100
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
0 0 0
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
0 0 0
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
14 154 3892
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
0 0 0
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
3 33 913
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
2 22 630
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
0 0 0
[root@pilot-idr0100-omeroreadwrite fd]# ls -alh | grep idr00 | wc
0 0 0
The rise above was caused by opening some images from idr0015, but it went down quickly.
LGTM
Recent work on the Zarr HCS specification as well as recent submissions have demonstrated that we have a file leak in ScreenReader, as resources are not being closed on access, eventually leading to Too many open files exceptions.
Using the file-leak-detector, I have been able to track the issue down to the initFile step and more specifically the logic for selecting the sub-reader.
Using the following minimal screen example
Running ant gen-config generates a configuration but ant test-automated fails early due to an unclosed file handle with the current HEAD of the master branch. With fe7b208 included, the tests run and pass successfully except for testIsThisType, which is a separate issue, probably not worth addressing as this reader is specific to the IDR fork.