DigitalSlideArchive / digital_slide_archive

The official deployment of the Digital Slide Archive and HistomicsTK.
https://digitalslidearchive.github.io
Apache License 2.0

Bug with large WSI files #205

Closed ds2268 closed 2 years ago

ds2268 commented 2 years ago

When you upload WSI files larger than 4 GB (probably larger than 2**32 bytes), Girder crashes and restarts. I used the girder-client command-line tool to upload them.

girder-client --api-url ${API_URL} --api-key ${API_KEY} upload $DEST_FOLDER_ID ${case_dir}

case_dir contains WSI files, and I noticed that I only have problems with 2 WSIs that are 4.5 to 5 GB. The files are not corrupted, as they open without a problem in Hamamatsu's NDPI viewer. I guess there is a 32-bit limit somewhere in the architecture?
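To check ahead of an upload which files cross that size, a minimal sketch along these lines could be used. The function name `files_over_limit` and its `limit` parameter are hypothetical helpers for illustration, not part of girder-client.

```python
import os

FOUR_GIB = 2**32  # first size that a 32-bit offset can no longer address


def files_over_limit(case_dir, limit=FOUR_GIB):
    """Return sorted (path, size) pairs for files of at least `limit` bytes."""
    oversized = []
    for root, _dirs, names in os.walk(case_dir):
        for name in names:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size >= limit:
                oversized.append((path, size))
    return sorted(oversized)
```

Calling `files_over_limit(case_dir)` on the upload directory lists the candidates to test individually.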

manthey commented 2 years ago

We work with WSI files that are hundreds of GB. However, we do use the OpenSlide library to read NDPI files, which may have issues with some files (since it isn't Hamamatsu's own software). Can you share one of the files that isn't working? I can send you a URL to a place where you could upload it.

ds2268 commented 2 years ago

Sure. I have shared one example with you via the email address listed in your GitHub profile.

manthey commented 2 years ago

Thank you. I see that the issue has to do with NDPI files not being valid TIFF files when they exceed 4 GB, but we are trying to parse them as such to get some extra information. This exhausts memory. I'll look into which component is doing that and how to fix it.

dgutman commented 2 years ago

I probably have some additional large NDPI files we can also test with... maybe this is one of the weirder issues we've been dealing with. Most of my servers have 128GB of memory, so in my case it may take opening a couple of these files before things get flaky...

-- David A Gutman, M.D. Ph.D. Associate Professor of Neurology Emory University School of Medicine

ds2268 commented 2 years ago

The machine I tested on also restarted on one occasion, and it has 128 GB of memory. I couldn't find out from the logs why it happened (OOM usually shows up in logs in different shapes and forms), but it happened exactly when I was pushing this NDPI file. On later occasions, only Girder crashed and restarted. Girder restarted every time I tried to upload this NDPI file via the girder-client command line.

dgutman commented 2 years ago

So there is some memory leak (I think)... that I've been trying to track down for a long time. It usually happens when I try and load/parse a huge number of NDPI files, some of which are probably larger than 4GB. I have a smattering of SVS files that also likely contribute... but it tends to happen sporadically and I haven't tried to systematically nail it down. I just set up a new server that I plan to use to try and figure this out, assuming it's the same issue.

ds2268 commented 2 years ago

I have access to ~50 NDPI WSIs scanned at 40x, in case NDPIs are in short supply, and I can test on my side when/if needed. Just 2 of them are > 4 GB, and both cause the same problem. I have shared one with @manthey and can also share the other one.

dgutman commented 2 years ago

Hah, no, I have hundreds of them. We are fortunately not data poor. Tracking down random memory leaks just hasn't been the biggest fire I've been dealing with, but I've been setting aside some files that have caused weird issues or Girder restarts in the past. I am setting up a new test environment, so I can hopefully try to make Girder deal with some of these issues more gracefully. I think, as @manthey said, the bugs are largely in some of the external libraries we use, but we can probably figure out how to detect/catch/prevent them.

manthey commented 2 years ago

NDPI files are almost TIFF files. However, they always store IFD offsets as 64-bit values, even though they claim to be non-BigTIFF files. Directory data offsets are still 32-bit values, which means that the high 32 bits have to be inferred. OpenSlide does this in a particular way.
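As an illustration of what inferring the high 32 bits can look like, here is a minimal sketch of one plausible heuristic. It assumes that the data a directory points to was written before that directory, so it picks the nearest offset with matching low bits that lies below the directory. This is an assumption made for illustration and is not claimed to be exactly what OpenSlide or tifftools does; `infer_full_offset` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def infer_full_offset(dir_offset, low32):
    """Guess a 64-bit data offset from its low 32 bits.

    Assumption (for illustration only): the data lies before the directory
    at dir_offset, within the preceding 4 GiB.
    """
    # Start in the same 4 GiB window as the directory itself.
    candidate = (dir_offset & ~MASK32) | (low32 & MASK32)
    # If that lands at or past the directory, step back one window,
    # unless we are already in the first window.
    if candidate >= dir_offset and candidate > MASK32:
        candidate -= 1 << 32
    return candidate
```

For a directory at 0x120000000, a stored value of 0x10000000 resolves to 0x110000000, while a stored value of 0xF0000000 resolves to 0xF0000000 in the previous window.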

Specifically, we parse the data from TIFF files to find additional information (for instance, in an attempt to present all such data to a user for review to ensure there is no PHI). That parser is what is blowing up, since the second IFD of the file isn't where the non-BigTIFF parser expects it.
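The mismatch can be reproduced on a synthetic buffer. The sketch below is not the tifftools parser; it builds a one-entry, little-endian IFD whose next-IFD pointer is written as 64 bits and then reads that pointer back both ways. The layout is an assumption based on the description above, and `next_ifd_offset` is a hypothetical helper.

```python
import struct


def next_ifd_offset(ifd_bytes, wide):
    """Read the next-IFD pointer that follows the entries of a little-endian IFD."""
    (count,) = struct.unpack_from("<H", ifd_bytes, 0)
    pointer_pos = 2 + 12 * count  # classic TIFF entries are 12 bytes each
    fmt = "<Q" if wide else "<L"  # 64-bit versus 32-bit pointer
    (offset,) = struct.unpack_from(fmt, ifd_bytes, pointer_pos)
    return offset


# One entry: tag 256 (ImageWidth), type 4 (LONG), count 1, value 1024.
entry = struct.pack("<HHLL", 256, 4, 1, 1024)
# The next IFD lives beyond 4 GiB, so its pointer needs more than 32 bits.
ifd = struct.pack("<H", 1) + entry + struct.pack("<Q", 0x123456780)

truncated = next_ifd_offset(ifd, wide=False)  # what a 32-bit read sees
full = next_ifd_offset(ifd, wide=True)        # the offset actually stored
```

Here `truncated` is 0x23456780 while `full` is 0x123456780, so a parser using the 32-bit read would seek to the wrong place and interpret unrelated bytes as a directory.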

I'll have a fix shortly.

manthey commented 2 years ago

This should be fixed. The latest built dockers should be updated, too. The specific issue was addressed here: https://github.com/DigitalSlideArchive/tifftools/pull/64. Please reopen or create a new issue if needed.

ds2268 commented 2 years ago

I can confirm that this now works. Thanks!