DDMAL / Rodan

:dragon_face: A web-based workflow engine.
https://rodan2.simssa.ca/

Images generated from an original non-png image do not open on production #1196

Closed: kyrieb-ekat closed this issue 1 week ago

kyrieb-ekat commented 1 month ago

Okay, this is a bit of a weird one that I'm still trying to pin down the exact context for, but so far here's what I've realised:

When a JPG or TIFF image is uploaded as a resource, Rodan automatically types it according to its file format. You can then connect it to the PNG conversion job, which converts it to PNG for jobs requiring a PNG input, such as Pixel, classification, and the Interactive Classifier. The separated layers and images meant to show up in the Pixel editor and IC are PNGs, per the typing of the generated resources. However, once a classification run is complete and the separated layers are produced as PNG outputs, they do not open: they show either a 'resource does not exist' page (as always with classification jobs in this context) or a server 500 error (as with Pixel images).

Thus far it appears that if the original image is not a PNG, then even with the conversion job, and even though a PNG is produced, the resulting outputs in the workflow can't be viewed. The separated layers can still be downloaded and open successfully outside Rodan. Resources that were uploaded as PNGs separate and display with no issue.

I don't think this is necessarily a blocker, but given that manuscript images are very often produced and uploaded as JPG or JPEG, this could be significant later on. Alternatively, we could just allow only PNGs.

TIFF images, even when typed as image/rgba+tiff, will not open when you click 'view' regardless. JPGs do open as a resource when you click 'view'; they just don't seem to load once converted.
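
For context on what the conversion step is doing, here is a minimal sketch of a JPEG/TIFF to RGB PNG conversion, assuming a Pillow-based implementation (file names are placeholders, and this is not the actual Rodan job code):

```python
from PIL import Image

# Open a JPEG or TIFF source image (placeholder path).
with Image.open("page_042.jpg") as src:
    # Flatten to plain RGB so any alpha channel or palette data is dropped,
    # which is roughly what an "RGB (PNG)" output implies.
    rgb = src.convert("RGB")
    rgb.save("page_042.png", format="PNG")
```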

kyrieb-ekat commented 4 weeks ago

I have replicated this issue on staging. It appears that if an image was not originally a PNG, it has issues displaying and separating.

fujinaga commented 4 weeks ago

What image format is it expecting?

kyrieb-ekat commented 4 weeks ago

The PNG (RGB) conversion job will take both JPG and JPEG images as inputs. It does successfully produce an RGB PNG version of the image, but that version can't be opened in the Pixel job. When I exit the job and check the resources produced by the run, the PNG version of the image exists and opens without a 500 error.

kyrieb-ekat commented 3 weeks ago

I believe I have found the root cause of part of this issue: very large images appear to consume a ton of memory and affect model performance. I speculated with @homework36, who mentioned this could possibly lead to image corruption, or some of the other weird problems we were seeing.

This doesn't address the 500 error that sometimes appears when a JPG is uploaded to the RGB (PNG) job and then connected to an image input port, but I strongly suspect that is also image-size related.

I have found that images which exceed roughly 4000x6000 pixels cause severe corruption and problems in the separation stage, and don't open at all in Pixel (or take an exceptionally long time to do so). I reduced sample images from throughout MS73 by 30% (from roughly 6000x9000 to 4000x6000) and they separated as expected: the images do open, if a little slowly, and everything looks a lot more like it should.

I'm definitely going to experiment a little more to find the exact point of no return and then add it to the documentation.
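
In the meantime, this is roughly the guard I've been applying to local copies before upload; a sketch assuming Pillow, with the ~4000x6000 figure as a provisional cap rather than a confirmed limit:

```python
from PIL import Image

# Provisional cap based on the ~4000x6000 observation above; not a confirmed limit.
MAX_SIZE = (4000, 6000)

def shrink_if_needed(in_path, out_path):
    """Downscale an image before upload if it exceeds the provisional cap."""
    with Image.open(in_path) as img:
        if img.width > MAX_SIZE[0] or img.height > MAX_SIZE[1]:
            # thumbnail() preserves aspect ratio and only ever shrinks.
            img.thumbnail(MAX_SIZE)
        img.convert("RGB").save(out_path, format="PNG")

# Placeholder file names.
shrink_if_needed("MS73_page_042.jpg", "MS73_page_042_small.png")
```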

kyrieb-ekat commented 3 weeks ago

Here are some screenshots of three layer 3s, which are supposed to be the text layer; the first two are the corrupted/incorrect layers, the third is what we're expecting. Significantly, the corrupted files are over 20 MB, when they are usually 1 MB or less.

[Four screenshots attached, 2024-08-15.]

fujinaga commented 3 weeks ago

What are the actual physical dimensions of this manuscript?

kyrieb-ekat commented 3 weeks ago

The page dimensions range, per Cantus Database, from 465 x 335 mm to 455 x 320 mm, with the cover slightly larger.

fujinaga commented 3 weeks ago

What was the typical size, in pixels, of the Salzinnes and Einsie images?

kyrieb-ekat commented 3 weeks ago

Images in the manifest I found look to average around 7500x9500 for Salzinnes, though they're also only around ~5 MB each. As for Einsie, what I could find looks like the images run around 4872 × 6496 (@JoyfulGen, you probably have more details on this!).

I do know that both Salzinnes and Einsie needed to go through a conversion process (originally JPEG, I think?). I would be curious whether the original Salzinnes images processed alright at that size; if so, then maybe it's a file-size thing more than a pixel-dimension thing.

JoyfulGen commented 3 weeks ago

Ok, somewhere in the very back of my mind a memory has awakened that when we ran the Salzinnes workflow we had an Image Resizing job? And then for Einsie I was told that I didn't need it because the images were already the right size? I remember Martha being the one to organize/explain this to me; is she still around?

ahankinson commented 3 weeks ago

JPEG or JPEG2000? The latter is much more resource-intensive to work with, and most libraries have really poor support for it.

kyrieb-ekat commented 3 weeks ago

I went and peeked in the repository and there is a resize job and conversion job for Salzinnes; Marta is around via her new email, I think!

https://github.com/DDMAL/Rodan/blob/e5f620dcfac55721a858ddbec81d85f73bc22dbe/rodan-main/code/helper_scripts/convert_image.py#L30
https://github.com/DDMAL/Rodan/blob/e5f620dcfac55721a858ddbec81d85f73bc22dbe/rodan-main/code/rodan/jobs/pil_rodan/resize.py#L6

Interestingly, it looks like they grappled with huge image sizes too! I have been using ImageMagick's resize ability, though it can only affect pictures in a directory on my computer, or on an external drive, probably?
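
For the record, here's roughly how I've been driving it; a sketch that calls ImageMagick's mogrify over a local directory from Python (paths are placeholders, and the 4000x6000 cap is still provisional):

```python
import subprocess
from pathlib import Path

# Local directory of page images to shrink (placeholder path).
src_dir = Path("~/MS73/pages").expanduser()
out_dir = src_dir / "resized"
out_dir.mkdir(exist_ok=True)

# "4000x6000>" is ImageMagick's shrink-only geometry: images larger than the
# cap are scaled down (preserving aspect ratio); smaller ones are left alone.
jpgs = sorted(str(p) for p in src_dir.glob("*.jpg"))
subprocess.run(
    ["mogrify", "-path", str(out_dir), "-resize", "4000x6000>", *jpgs],
    check=True,
)
```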

kyrieb-ekat commented 3 weeks ago

@ahankinson it appears Einsie was just JPEG, and not JPEG2000.

kyrieb-ekat commented 2 weeks ago

I have further figured out some of the issues I encountered in subsequent trainings. When the training inputs are too large, but not large enough to blow past the memory limit and kill the job outright, the output image is corrupted. I reran trainings, removing and adding ZIPs and changing the combinations of ZIP (sample) inputs and groups of models used as inputs:

- A ZIP sample produced from a resized image plus a full group of previous models (e.g., models 1-3 + background model) produced the expected layer separation.
- Only two ZIP samples and no model inputs also produced the expected results.
- All previously failed runs either used large images within the ZIPs, or two samples plus a group of models. In one case, two ZIPs and a group of models worked as expected.

Further testing is needed to figure out at what ZIP sizes this issue replicates.

However, the artifacting and 'mixing' of layers in these bizarre separations, where images were sized 'correctly' per my previous findings, look very similar to the memory-related issues above, so I strongly suspect this is still a memory-related issue. I'm going to continue experimenting, but given that I've been able to get an expected separation from these reduced-size trainings, I'm cautiously optimistic.

A thought for the future/my future self: if I download Pixel ZIPs and model files to check their sizes, are the reported sizes accurate? (If that makes sense / isn't too ridiculous sounding... the Rodan paranoia is possibly getting to me.) If so, maybe we can start honing in on an exact threshold of total training input size before we start seeing corruption/weirdness.
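
Concretely, the kind of bookkeeping I have in mind is just downloading the Pixel ZIPs and model files and totalling their on-disk sizes against a guessed threshold; a sketch (file names and the 100 MB figure are placeholders, not measured limits):

```python
from pathlib import Path

# Downloaded training inputs (placeholder file names): Pixel ZIP samples plus model files.
inputs = [
    Path("downloads/sample_resized.zip"),
    Path("downloads/sample_2.zip"),
    Path("downloads/background_model.hdf5"),
]

# Guessed threshold to experiment against; not a confirmed Rodan limit.
THRESHOLD_MB = 100

total_mb = sum(p.stat().st_size for p in inputs) / (1024 * 1024)
print(f"Total training input size: {total_mb:.1f} MB")
if total_mb > THRESHOLD_MB:
    print("Warning: inputs exceed the provisional threshold; watch for corrupted layers.")
```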

kyrieb-ekat commented 1 week ago

The original thing this issue was opened to address has been figured out. I'm going to close this issue and open a more specific one for further experimentation and documentation.