Clinical-Genomics / scout

VCF visualization interface
https://clinical-genomics.github.io/scout
BSD 3-Clause "New" or "Revised" License
152 stars 46 forks source link

Do not save custom images' binary data to database #4136

Closed northwestwitch closed 11 months ago

northwestwitch commented 1 year ago

When loading a custom image (both case images and variants images) the image itself gets saved as binary data, and it's a lot of data:

image

There is no need to do so, since also the path to the image on disk is saved in the database, and the image could be loaded on the pages on the fly perhaps - I don't think it would slow down a lot -

alkc commented 1 year ago

When working on #4100 I did a quick test to see how much of a difference it would make to omit custom_images from returned case objects and found that it made a considerable difference.

I don't have those numbers anymore, but I ran a rough test just now on how long it takes for our scout container (same instance as in #4100) to pull ~3600 cases with and without custom_images (via projection={'custom_images' : 0})

It takes about 17-15s to iterate over the cursor that returns all cases with custom images, and 11-12s without. That's a reduction by 28%, which is pretty good I think.

northwestwitch commented 1 year ago

When working on #4100 I did a quick test to see how much of a difference it would make to omit custom_images from returned case objects and found that it made a considerable difference.

I don't have those numbers anymore, but I ran a rough test just now on how long it takes for our scout container (same instance as in #4100) to pull ~3600 cases with and without custom_images (via projection={'custom_images' : 0})

It takes about 17-15s to iterate over the cursor that returns all cases with custom images, and 11-12s without. That's a reduction by 28%, which is pretty good I think.

That's a nice starting point for a speedup refactoring! I would actually do both things (omitting images when fetching the cases on case pages and not saving binary images into case docs)

dnil commented 1 year ago

We don't use the field in Solna as of yet. Your predecessor was a big fan of storing data on the db directly, which partly explains the design. I guess to avoid having to keep certain results on disk. If you are comfy with using paths instead, feel very free to suggest a change! Maybe we could do something backwards compatible, like fetch from a path for a str or try to show directly from BinData?

alkc commented 1 year ago

Interesting. I'm not in charge of scout here in Lund, but I can investigate if we absolutely need to store the images in the db.

northwestwitch commented 1 year ago

It could perhaps be serverd by a static endpoint by providing its path?

dnil commented 1 year ago

Sure, but you can also just do a dynamic one, or reuse an existing image serving dynamic endpoint. Compare to chromograph images, ideograms etc. scout.cases.views.host_image_aux(), scout.cases.views.host_chr_image(). Needs the filename passed along from loading and replacing the base64 encoding mechanism with a url_for('scout.cases.views.that_new_image_endpoint').

mhkc commented 1 year ago

We are ok with this change. I can recall why I stored the images in the database but we can solve any issues this is going to cause in our setup.