bihealth / sodar-server

SODAR: System for Omics Data Access and Retrieval
https://github.com/bihealth/sodar-server
MIT License

Refactor redundant queries in IrodsAPI.get_objects() #1883

Closed: mikkonie closed this issue 8 months ago

mikkonie commented 8 months ago

While looking into #1872 and #1882, I've stumbled across another rabbit hole of yak shaving.

In IrodsAPI.get_objects(), we unnecessarily perform multiple queries if .md5 files and/or collections are to be included in the return data. With collections containing many data objects or subcollections, these redundant queries can slow things down. I have observed this causing problems in real life with a large project.

Alas, refactoring this also involves changing the signature of get_objs_recursively(). I'll have to look into uses of these API methods and figure out the optimal solution without making things too convoluted.

mikkonie commented 8 months ago

I have refactored away the redundant queries for .md5 files. However, the collection query remains separate, because we run it through python-irodsclient (PRC) instead of direct admin SQL. For the reasons behind that, see #1440.
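
To illustrate the general idea only (this is not the actual SODAR code): instead of issuing one query for data objects and another for .md5 files, a single result set can be filtered in memory. The function name `filter_objects`, the row layout with `name` and `path` keys, and the example paths are all hypothetical assumptions made for this sketch.

```python
from typing import Iterable, List


def filter_objects(rows: Iterable[dict], include_md5: bool = False) -> List[dict]:
    """
    Filter the rows of ONE query result in memory.

    Hypothetical sketch: each row is assumed to be a dict with a 'name' key.
    Rows for .md5 files are dropped unless include_md5 is True, so no
    second query is needed to include or exclude them.
    """
    return [
        row
        for row in rows
        if include_md5 or not row['name'].endswith('.md5')
    ]


# Hypothetical rows, standing in for the result of a single query
rows = [
    {'name': 'sample1.bam', 'path': '/example/coll/sample1.bam'},
    {'name': 'sample1.bam.md5', 'path': '/example/coll/sample1.bam.md5'},
    {'name': 'sample2.vcf.gz', 'path': '/example/coll/sample2.vcf.gz'},
]

data_only = filter_objects(rows)
with_md5 = filter_objects(rows, include_md5=True)
```

The trade-off in a sketch like this is fetching slightly more rows than strictly needed in exchange for fewer round trips to the database.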

mikkonie commented 8 months ago

Done, although the separate query for collections remains (see above).