Kitware / UPennContrast

https://upenn-contrast.netlify.com/
Apache License 2.0

Download annotations seems a bit slow #685

Open arjunrajlab opened 3 months ago

arjunrajlab commented 3 months ago

Downloading annotations seems to take longer than one would expect. As I understand it, the current implementation sorts the annotations ahead of time and then downloads them in batches of 100K. Perhaps indexing on annotation ID would help here? Or some other method that doesn't require sorting? Worth profiling.
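To make the indexing idea concrete, here is a minimal sketch of paging through records by an indexed ID instead of sorting the whole set up front. It uses Python's built-in sqlite3 purely as a stand-in for the real database; the table name `annotations`, the `payload` column, and the helpers `make_demo_db` and `iter_batches` are all hypothetical and not taken from the UPennContrast code.

```python
import sqlite3


def make_demo_db(n):
    """Build an in-memory table of n fake annotations (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # INTEGER PRIMARY KEY gives an index on id, so range scans come back in id order.
    conn.execute("CREATE TABLE annotations (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO annotations (id, payload) VALUES (?, ?)",
        ((i, f"annotation-{i}") for i in range(n)),
    )
    return conn


def iter_batches(conn, batch_size):
    """Yield batches of (id, payload) rows, resuming after the last id seen."""
    last_id = -1  # assumes ids are non-negative
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM annotations "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Remember where this batch ended so the next query starts there.
        last_id = rows[-1][0]


if __name__ == "__main__":
    conn = make_demo_db(250)
    for batch in iter_batches(conn, 100):
        print(len(batch), batch[0][0], batch[-1][0])
```

Each batch here is a bounded range scan over the index, so the cost of one batch should not depend on how many batches came before it. Whether the same pattern helps in this project depends on how the annotation collection is actually indexed and queried, which is what the profiling would need to show.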

bruyeret commented 3 months ago

I did some profiling: downloading 100k annotations takes 4 seconds on my machine.

With the profiler attached, it takes 5 s, and most of that time is spent in the decode_all function of the bson package. @manthey can correct me if I am wrong, but this function is called by MongoDB, which stores the annotations as BSON and needs to decode them into Python structures (dict, string, number...). I don't see a way of speeding up this function, and didn't find anything on the internet about it either.

The other thing that takes some time is the conversion from a Python dict to JSON, but it is faster than the other conversion.
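For anyone who wants to repeat this kind of measurement, here is a rough sketch of profiling a decode-then-encode path with the standard library's cProfile. It is not the actual download code: `json.loads` stands in for the BSON decoding step so the example runs without extra packages, and `make_annotations`, `decode_stage`, `encode_stage`, and `download` are hypothetical names with a made-up annotation shape.

```python
import cProfile
import io
import json
import pstats


def make_annotations(n):
    """Build n fake annotation dicts (hypothetical fields, for illustration only)."""
    return [
        {
            "id": i,
            "shape": "point",
            "coordinates": [{"x": i * 0.5, "y": i * 0.25}],
            "tags": ["demo"],
        }
        for i in range(n)
    ]


def decode_stage(raw):
    # Stand-in for the database-side decoding (bson in the real code path).
    return json.loads(raw)


def encode_stage(docs):
    # Conversion from Python structures to the JSON sent to the client.
    return json.dumps(docs)


def download(raw):
    return encode_stage(decode_stage(raw))


def profile_download(raw, limit=20):
    """Run download() under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = download(raw)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, out.getvalue()


if __name__ == "__main__":
    raw = json.dumps(make_annotations(100_000))
    _, report = profile_download(raw)
    print(report)
```

The report lists `decode_stage` and `encode_stage` with their cumulative times, which is enough to compare the two conversions. The absolute numbers will differ from the real ones, since this sketch does not exercise bson or the database at all.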