Quasars / orange-spectroscopy

Other
51 stars 58 forks source link

[FIX] Less memory use with visible images (reimplemented deepcopy) #726

Closed markotoplak closed 1 month ago

markotoplak commented 1 month ago

@clsandt reported running out of memory when performing PCA on Opus data with large visual images.

That was caused by the .attributes dictionary on a table that contained actual visual images, and that dict was deepcopied for each table transformation. Thus, memory got filled up quickly.

This PR now uses a subclass of VisibleImage that does not actually deepcopy the image data on copying and just reuses the same references.

This is my preferred alternative to #725, because then saving data in .tab or .pickle formats keeps the images working; images actually get saved into .metadata file (or directly into .pickle).

markotoplak commented 1 month ago

@clsandt, could you quickly look at this one if I did not break something? Thanks!

borondics commented 1 month ago

I tried it, the visual image is there. Not sure if it broke anything else.

markotoplak commented 1 month ago

Thanks Feri! Oh, I meant to ask @stuart-cls to look at the code. :)

stuart-cls commented 1 month ago

@markotoplak Thanks, I like this approach better for the same reasons as you. This implementation looks great.

I was thinking about this interface when I was working on the Orange HDF5 format, and have some thoughts:

  1. This is a great opportunity to standardize the visible_images interface. I was thinking to move the agilent implementation to store bytes instead of file references so that visible images could be stored in saved files. We should do that in this PR, I can add that part if you would like.
  2. As those are the only 2 producers of visible_images, the only remaining requirement for keeping the old code if statements is previously saved .metadata sidecar files. Not sure how to handle those.
  3. The VisibleImages class is nice for defining behaviour, is there a way we could structure it so that it can be serialized without pickle? The current Orange.data.io_base.write_table_metadata will only do that if the values are strings, but there is also the literal implementation for .ows files which is a bit more permissive.
markotoplak commented 1 month ago

@stuart-cls, thanks for the review.

I forgot about the fact that Agilent also reads visual images. Yes, that one should be fixed. If you have any time for that, please, do it.

For now, I would core about serializing it. And I'd keep the backward compatibility in HyperSpectra for at least a while - I think this is nice to do in case someone has custom readers.

stuart-cls commented 1 month ago

@markotoplak Done. As a bonus, visible images from agilent files now are saved in the .metadata sidecar files.

markotoplak commented 1 month ago

@stuart-cls, thanks!