bihealth / sodar-server

SODAR: System for Omics Data Access and Retrieval
https://github.com/bihealth/sodar-server
MIT License

Refactor usage of get_subcoll_obj_paths() #1882

Closed: mikkonie closed 5 months ago

mikkonie commented 5 months ago

While looking into #1872, I noticed that landing_zone_move still uses the get_subcoll_obj_paths() helper, which was originally added (IIRC) in the old sodar_taskflow service.

Iterating through subcollections this way is very inefficient, and it is the reason why we use admin SQL queries instead of walk() in IrodsAPI.

The use of this should be replaced by appropriate calls to IrodsAPI.get_objects(). The helper can then be removed for good.
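To illustrate why the recursive approach scales poorly, here is a minimal, self-contained sketch. It does not use the real iRODS client or SODAR's `IrodsAPI`: `FakeColl`, `FakeCatalog`, `walk_paths()` and `query_objects()` are hypothetical stand-ins, and the "round trip" counter is only an assumed model of one server request per collection visited versus one request for a flat query.

```python
from dataclasses import dataclass, field


@dataclass
class FakeColl:
    """Hypothetical stand-in for an iRODS collection (not the real client class)."""

    path: str
    data_objects: list = field(default_factory=list)  # Object names
    subcollections: list = field(default_factory=list)  # Child FakeColl objects


class FakeCatalog:
    """Toy catalog counting round trips, standing in for the iRODS server."""

    def __init__(self, root):
        self.round_trips = 0
        self._rows = []  # Flat (collection path, object name) rows
        self._index(root)

    def _index(self, coll):
        for name in coll.data_objects:
            self._rows.append((coll.path, name))
        for sub in coll.subcollections:
            self._index(sub)

    def list_children(self, coll):
        """One round trip per collection, as in recursive traversal."""
        self.round_trips += 1
        return coll.data_objects, coll.subcollections

    def query_objects(self, root_path):
        """One round trip in total: match every row under root_path."""
        self.round_trips += 1
        prefix = root_path.rstrip('/') + '/'
        return sorted(
            f'{c}/{n}'
            for c, n in self._rows
            if c == root_path or c.startswith(prefix)
        )


def walk_paths(catalog, coll):
    """Recursive approach: visit each subcollection separately."""
    objs, subs = catalog.list_children(coll)
    paths = [f'{coll.path}/{n}' for n in objs]
    for sub in subs:
        paths.extend(walk_paths(catalog, sub))
    return paths


root = FakeColl('/zone/lz', ['a.txt'], [
    FakeColl('/zone/lz/raw', ['b.bam'], [FakeColl('/zone/lz/raw/x', ['c.bai'])]),
    FakeColl('/zone/lz/misc'),
])
catalog = FakeCatalog(root)

walked = sorted(walk_paths(catalog, root))
walk_trips = catalog.round_trips  # One per collection, including empty ones

catalog.round_trips = 0
queried = catalog.query_objects(root.path)
query_trips = catalog.round_trips  # Single flat query

print(walked == queried, walk_trips, query_trips)
```

Both approaches return the same paths, but in this toy model the recursive walk costs one request per collection (including empty ones), while the flat query costs one request regardless of how deep the tree is.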

I'm not sure whether this is the root cause of #1872. That seems to be caused by an iRODS timeout, but could such a timeout occur if we spend a long time traversing Collection.subcollections? In any case, this change should at the very least speed up landing zone jobs for zones with many subcollections and rid us of an inefficient helper.

mikkonie commented 5 months ago

Done. The redundant helper methods were removed.