DigitalSlideArchive / digital_slide_archive

The official deployment of the Digital Slide Archive and HistomicsTK.
https://digitalslidearchive.github.io
Apache License 2.0

Can't enable large_image_source_pil to view PNGs #312

Open MartinKlefas opened 6 months ago

MartinKlefas commented 6 months ago

I've changed the docker-compose.yml file to uncomment ' - /opt/large_image/sources/pil/large_image_source_pil.egg-info' and looked through girder.cfg and elsewhere to try to enable PIL. Unfortunately I'm no closer: when I try to view the images in HistomicsUI it still says: please select a "large image" item

After changing the config I've rebuilt the images, but I still have just the below plugins running:

girder_1 | Loaded plugin "hashsum_download"
girder_1 | Loaded plugin "homepage"
girder_1 | Loaded plugin "jobs"
girder_1 | Loaded plugin "worker"
girder_1 | Loaded plugin "large_image"
girder_1 | Loaded plugin "large_image_annotation"
girder_1 | Loaded plugin "ldap"
girder_1 | Loaded plugin "resource_path_tools"
girder_1 | Loaded plugin "user_quota"
girder_1 | Loaded plugin "virtual_folders"
girder_1 | Loaded plugin "girder_xtk_demo"
girder_1 | Loaded plugin "dicomweb"
girder_1 | Loaded plugin "import_tracker"
girder_1 | Loaded plugin "slicer_cli_web"
girder_1 | Loaded plugin "histomicsui"

manthey commented 6 months ago

You shouldn't have to change the docker-compose file. There is a default maximum size that it will use for PNGs. There are some settings in Admin Console -> Plugins -> gear icon next to Large Image.

MartinKlefas commented 6 months ago

Thanks, that's working now. Maybe the error message could be changed, as I thought I needed to add support for PNG files by including more plugins, rather than just increasing the max size setting.

The problem has changed slightly though: it looks like girder starts to load all of the images in a collection. I've got 700 images in that one collection, all 300 MB PNG files, so within 15-30 s it's used all my RAM, and doesn't let go!!

dgutman commented 6 months ago

Can you clarify how long girder seems to be holding on to those files... seconds or minutes? I've occasionally had memory leaks or file handles not closing, causing weird behavior, but it's been sporadic. I also don't have 700 300 MB PNGs in a single directory...

-- David A Gutman, M.D. Ph.D. Associate Professor of Pathology Emory University School of Medicine

MartinKlefas commented 6 months ago

Well, at the moment the memory usage just keeps going up until the server locks up and dies at 24 GB. It starts rising in GB-sized chunks when I go to the image selection screen in the histo plugin, and even if I just rush through and pick the first one it keeps increasing.

I'll do a bit of digging this evening: reduce the number of images in the collection and see if the usage drops off after going up. I'll also check how much RAM each image should be using, just in case it's more than is available on that mini server.
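As a rough way to check how much RAM each image should need, here is a back-of-the-envelope sketch. It assumes each PNG is decoded fully into an uncompressed pixel array; the 30000 x 30000 RGBA dimensions are made up for illustration and are not taken from this dataset.

```python
GIB = 1024 ** 3


def decoded_bytes(width, height, bands=4, bytes_per_sample=1):
    """Approximate size in bytes of an image once fully decoded in memory."""
    return width * height * bands * bytes_per_sample


def images_that_fit(ram_bytes, per_image_bytes):
    """How many decoded images of this size fit in the given amount of RAM."""
    return ram_bytes // per_image_bytes


# Hypothetical dimensions; substitute the real width/height of your files.
per_image = decoded_bytes(30000, 30000)
print(f"{per_image / GIB:.2f} GiB per decoded image")
print(images_that_fit(24 * GIB, per_image), "images fit in 24 GiB")
```

The point is only that the compressed file size (300 MB here) says little about the decoded footprint, which depends on pixel dimensions and channel count.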

manthey commented 6 months ago

You can adjust the maximum number of tile sources that can be cached. By default, it makes an estimate for how much memory different sources will use and tries to balance memory use and cache availability, but, it sounds like it is picking wrong. Does adding something like this to the girder.cfg file avoid the problem?

[large_image]
cache_tilesource_memory_portion = 16  # larger numbers use less memory
cache_tilesource_maximum = 32  # smaller numbers use less memory

This example will use the smaller of 32 sources cached or an estimate of the number of sources based on what it thinks would use 1/16th of the memory. What is a typical resolution of your PNG files? Probably if we set the maximum allowable resolution higher than the default, the estimate is wrong for the PIL source.
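To illustrate the "smaller of the two" rule described above, here is a simplified sketch. It is not large_image's actual code: the function name, the per-source estimate, and the floor of one source are all assumptions made for the example.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def cached_source_limit(total_memory, per_source_estimate,
                        memory_portion=16, maximum=32):
    """Illustrative only: pick how many tile sources to keep cached.

    by_memory is how many sources of the estimated size fit in
    1/memory_portion of total memory; the result is capped at maximum.
    """
    by_memory = int((total_memory / memory_portion) // per_source_estimate)
    return max(1, min(maximum, by_memory))  # floor of 1 is an assumption


# With a (hypothetical) 64 MiB estimate per source on a 24 GiB machine,
# the memory-based count is below the maximum of 32, so it is used.
print(cached_source_limit(24 * GIB, 64 * MIB))
# Lowering the maximum caps the count regardless of the estimate.
print(cached_source_limit(24 * GIB, 64 * MIB, maximum=2))
```

If the per-source estimate is far too small for a given format, the memory-based count is too generous, which is why a hard `cache_tilesource_maximum` gives more predictable behavior.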

dgutman commented 6 months ago

And we have not tested much, if at all, on giant PNGs... so I'm not completely shocked that our defaults for PIL reading gigantic PNGs are not optimal.

MartinKlefas commented 6 months ago

I've done some testing:

  1. I reduced the folder sizes and once the folder has had thumbnails generated, it releases the memory - but only when all the images have thumbnails generated, so if I have one more image than will fit into memory, then things start to break
  2. The second time I go into a folder browser if it's not crashed it doesn't use all that memory again - which is brilliant.
  3. Once I open an image after the folder browser has finished it releases the memory - but if it was still loading the thumbnails it continues that process until it's loaded them all.

Finally: turning those tilesource settings down dramatically really helped the process of creating the thumbnail images and allowed the folders to finally fully cache. The memory portion had less of an effect than the maximum setting, and I had to set it quite aggressively lower than the memory limit on the machine to get it to stay in bounds. In the end I found 12 (out of 23 available on launching the app) made sure it stayed under about 14 GB of use, where 20 or 16 each meant that it sometimes got uncomfortably close to 99% RAM use. Setting the max sources made it stick strictly to x images scanned at once, which was slower, but more reliable when I set it to 2, as I knew for certain the RAM use would stay low. Of course this would be less of an issue on a more powerful computer.

Having said all that, it doesn't seem to always drop the RAM when you pick which image you want to look at after scanning a folder. Here, for instance, I just loaded 50-odd images for the first time in the folder thumbnail view, and picked one at random to look at: [Screenshot 2024-01-11 222132]. Oddly, this time it's picked bash to assign the memory use to; before it said Girder. If I shut everything down, load up the same interface, and pick a new image to review, the memory usage is back to normal, as it hasn't had to make any thumbnails: [image]

dgutman commented 6 months ago

So the good news is we cache thumbnails after the initial read... but frequently there's an initial "hit" when browsing a new directory until things are cached. I've noticed similar glitches in the matrix before, but probably not to the degree you noticed, as we have primarily set our defaults based on the type of image collections we typically use internally (primarily Aperio).

I personally find it annoying / problematic in this scenario when I have to stop / restart girder in order to make the system responsive again, but it's been very sporadic when this behavior happens, so we haven't invested a lot of effort trying to replicate it.

MartinKlefas commented 6 months ago

Yes, and I think with "proper" formats the creation of the thumbnails is much more memory efficient too - as the files contain thumbnails and don't have to be fully read in?

Anyway it's kind of solved, in that I know how to side step the issue. If you want me to do any other testing let me know!

dgutman commented 6 months ago

So you are indeed correct: in most cases we do NOT have to read the entire file to generate a thumbnail. Most of the PNG data sets I have are so small that the overhead of reading the entire 1-2K images and generating a small image was not really noticeable.

MartinKlefas commented 6 months ago

Thanks - the uniqueness of this dataset is really coming to light the more I work with it. Now that I've generated most of the thumbnails the system is working really well - I don't know if it's possible to have a "generate all thumbnails now" button somewhere in the settings for other poor unfortunates with an unusual dataset?

Thanks again

dgutman commented 6 months ago

Good news: this isn't our first rodeo, as they say. Check out this endpoint; it runs in the background and can pre-cache / generate tiles. The specifications are in the description. Just copy that spec, or some variant thereof:

[{"width": 160, "height": 100}, {"width": 160, "height": 100, "imageKey": "macro"}, {"width": 160, "height": 100, "imageKey": "label"}]

into the "spec" box. For your use case, you can probably just pass [{"width": 160, "height": 100}], since your images most likely do not have macro or label images. [image: screenshot of the endpoint, not preserved]
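For anyone who would rather script this than paste into the "spec" box, a minimal sketch of building that spec and sending it follows. The endpoint is not named in the thread (the screenshot was lost); `PUT large_image/thumbnails` with a `spec` parameter is an assumption here, so check your server's API documentation page before relying on it. `gc` is expected to be an authenticated `girder_client.GirderClient`, and the helper names are hypothetical.

```python
import json


def build_thumbnail_spec(width=160, height=100, image_keys=()):
    """Build the list of thumbnail specifications quoted in the thread.

    image_keys adds one entry per associated image (e.g. "macro", "label");
    leave it empty for plain images such as PNGs.
    """
    spec = [{"width": width, "height": height}]
    for key in image_keys:
        spec.append({"width": width, "height": height, "imageKey": key})
    return spec


def request_thumbnails(gc, spec, path="large_image/thumbnails"):
    """Send the spec as a JSON string. The default path is an ASSUMPTION."""
    return gc.put(path, parameters={"spec": json.dumps(spec)})


print(json.dumps(build_thumbnail_spec()))
print(json.dumps(build_thumbnail_spec(image_keys=("macro", "label"))))
```

The second call reproduces the three-entry spec from the endpoint description; the first is the single-entry variant suggested for images without macro or label images.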

MartinKlefas commented 6 months ago

Sorry i think something was lost in updating the issue by email - it's come through with just image markup and not images

MartinKlefas commented 3 months ago

Hi - I've tried repeating this process on a new (better) server and unfortunately I can't find the setting I found before that made it work. Has the UI changed a bit since January? All I can find is the "Maximum size of regular images to use without conversion" here: [image]

And now HistomicsUI doesn't like my PNG files anymore!

manthey commented 3 months ago

There haven't been substantial changes in the last few months. The two settings are still there: if your PNGs are larger than the "Maximum size of regular images to use without conversion", then you'll still have to manually ask to use large PNGs as "large images". Did you have this set larger in your previous version? And, changing this value won't change items already imported, only newly uploaded/imported.

MartinKlefas commented 3 months ago

Sorry if this is incredibly obvious - how do I manually ask to use large PNGs as "large images" My understanding is that if I do this, then Large Image will make tiles of my large PNG files etc - is that correct?

If I set that number very high though, it'll just open them as regular images (but not "large images") and save a thumbnail of them in the explorer?

manthey commented 3 months ago

On an item page, there is a button to use a file as a large image: [image]

If the file can be read directly (e.g., is internally a tiled format or is below the setting's size threshold), it will be served as is. If not, a tiled TIFF file is generated based on the file and stored alongside it, and it is served from there.

The drawback to a high maximum size is that (a) it can use a large amount of memory, since the pngs will be decoded into memory for serving, and (b) if they exceed your browser's GPU's texture buffer memory, they become very slow to view. The benefit is that we don't have to convert them to a tiled tiff and use more memory.

dgutman commented 3 months ago

Also.. if you are familiar with the girder_client (python module), or just love using pure REST requests..

You can loop through the folder / collection/ whatever and do a POST request

curl -X 'POST' \
  'https://SERVER_API/api/v1/item/#itemid#/tiles?force=false&n' \

or, with girder_client, you can do a gc.post(f"item/{itemId}/tiles?force=false"). The force=false prevents it from generating a tiled image if the image is readable.

I normally do something like this

for img in gc.listItem(folderId):  # folderId: ID of the Girder folder to process
    if 'largeImage' not in img:
        gc.post(f'item/{img["_id"]}/tiles?force=false')

This prevents me from trying to "re" largeImage items that already are flagged properly, and is normally quite quick.
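A slightly fuller sketch of the same loop, wrapped in a function and with the authentication step spelled out in the usage comments. The function name, server URL, API key, and folder ID are placeholders; `GirderClient(apiUrl=...)`, `authenticate(apiKey=...)`, `listItem`, and `post` are the girder_client calls I believe apply here, but treat the details as assumptions and check them against your installed version.

```python
def flag_large_images(gc, folder_id):
    """Ask Girder to treat each unflagged item in a folder as a large image.

    gc is an authenticated girder_client.GirderClient (or anything with
    compatible listItem/post methods). Returns the IDs that were requested.
    """
    requested = []
    for item in gc.listItem(folder_id):
        # Items already flagged carry a 'largeImage' key; skip those.
        if 'largeImage' not in item:
            # force=false: do not generate a tiled copy if the file is readable.
            gc.post(f"item/{item['_id']}/tiles", parameters={'force': 'false'})
            requested.append(item['_id'])
    return requested


# Usage (placeholders throughout; requires `pip install girder-client`):
#
#   import girder_client
#   gc = girder_client.GirderClient(apiUrl='https://SERVER_API/api/v1')
#   gc.authenticate(apiKey='YOUR_API_KEY')
#   print(flag_large_images(gc, 'YOUR_FOLDER_ID'))
```

Passing the client in as an argument keeps the loop easy to dry-run against a stand-in object before pointing it at a real server.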

MartinKlefas commented 3 months ago

for img in gc.listItem(folderId):  # folderId: ID of the Girder folder to process
    if 'largeImage' not in img:
        gc.post(f'item/{img["_id"]}/tiles?force=false')

This worked brilliantly, thank you - though I did have to do a bit of a shallow dive on the girder client library so I could authenticate first!

dgutman commented 3 months ago

Aah yes.. gc.authenticate.. wasn't sure if you'd used the python girder client before..
