bihealth / sodar-server

SODAR: System for Omics Data Access and Retrieval
https://github.com/bihealth/sodar-server
MIT License

Optimize landing zone creation with restrict collections enabled #1969

Open mikkonie opened 4 months ago

mikkonie commented 4 months ago

It is no secret that creating landing zones with both collection creation and collection restriction enabled can be very slow for large assays (500+ samples).

The reason is that we create each collection and apply its ACLs individually. The cost scales linearly with the number of collections, so zones with many collections can take a long time to create.

I would like to think there is a way to do this without setting individual ACLs on every collection, while still denying the user write access to the parent collection (that's the "restrict" part). However, I don't know how to implement it off the top of my head. Maybe by disabling inheritance from the parent collection after the operation?
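To make the inheritance idea a bit more concrete, here is a toy model in Python. It is a hypothetical in-memory stand-in, not python-irodsclient or SODAR taskflow code. It counts catalog calls for two strategies: (A) setting an ACL on every collection, and (B) granting write access with the inherit flag on the zone root, creating the sub-collections, then restricting the root non-recursively. It assumes iRODS-style semantics where a collection with inheritance enabled passes its ACLs and the inherit flag on to newly created sub-collections. The access level names, paths and user name are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Coll:
    """Hypothetical stand-in for an iRODS collection (not a real client class)."""
    path: str
    acls: dict = field(default_factory=dict)  # user -> access level
    inherit: bool = False


class ToyCatalog:
    """Counts calls and mimics ACL inheritance on collection creation (assumed semantics)."""

    def __init__(self):
        self.colls = {}
        self.calls = 0

    def create(self, path):
        self.calls += 1
        parent = self.colls.get(path.rsplit("/", 1)[0])
        coll = Coll(path)
        if parent is not None and parent.inherit:
            # New sub-collection copies the parent's ACLs and inherit flag
            coll.acls = dict(parent.acls)
            coll.inherit = True
        self.colls[path] = coll
        return coll

    def set_acl(self, path, user, level):
        # Non-recursive: only this collection is changed
        self.calls += 1
        self.colls[path].acls[user] = level

    def set_inherit(self, path, value):
        self.calls += 1
        self.colls[path].inherit = value


def create_zone_individual(cat, root, subpaths, user):
    """Strategy A: one ACL call per collection (2 calls per sub-collection)."""
    cat.create(root)
    cat.set_acl(root, user, "read")
    for sub in subpaths:
        cat.create(f"{root}/{sub}")
        cat.set_acl(f"{root}/{sub}", user, "write")


def create_zone_inherit(cat, root, subpaths, user):
    """Strategy B: rely on inheritance, then restrict the root afterwards."""
    cat.create(root)
    cat.set_acl(root, user, "write")
    cat.set_inherit(root, True)
    for sub in subpaths:
        cat.create(f"{root}/{sub}")  # inherits "write" from the root
    # Restrict the root only; existing children keep their inherited ACL
    cat.set_acl(root, user, "read")
    cat.set_inherit(root, False)


if __name__ == "__main__":
    subs = [f"sample{i}" for i in range(500)]
    for strategy in (create_zone_individual, create_zone_inherit):
        cat = ToyCatalog()
        strategy(cat, "/zone/lz", subs, "alice")
        print(strategy.__name__, cat.calls)
```

In this sketch collection creation itself remains linear; only the per-collection ACL calls go away, roughly halving the number of calls. The sub-collections also keep the inherit flag, which may or may not be acceptable and would need to be checked against the existing tests.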

Also, I need to double-check the current code for setting the ACLs. I may have accidentally introduced bottlenecks in the most recent changes to it, which I believe were made when updating the taskflows for iRODS 4.3 API compatibility.

I will need to experiment. Pull requests from iRODS experts are of course welcome. The current tests should ensure the desired end result.

Workaround: Disable restrict collections and be careful about where you upload :)