Closed: dvstans closed this issue 4 years ago.
Investigation of deleting large collections found that most of the time is spent loading each record to obtain its data size, which is needed to correct the per-allocation statistics. Reading appears to be slower than writing because reads must be synchronous, whereas writes need not be.
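The cost follows from how the statistics correction works: a record's size cannot be subtracted from its allocation's totals until the record has been read. Below is a minimal Python sketch. It assumes a hypothetical `Record` type that carries an allocation ID and a size, and a hypothetical `AllocStats` counter. The sketch aggregates deltas so each allocation is updated only once. It still needs one read per record, which is the synchronous step the investigation identified.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record shape: only the fields the correction needs.
    record_id: str
    alloc_id: str
    size: int


@dataclass
class AllocStats:
    # Hypothetical per-allocation usage counters.
    data_size: int = 0
    rec_count: int = 0


def delete_records(records, stats):
    """Delete records and correct per-allocation statistics.

    Each record's size must be read before it can be subtracted, so the
    cost grows with the number of records (one read per record).
    """
    size_delta = defaultdict(int)
    count_delta = defaultdict(int)
    for rec in records:  # one (synchronous) read per record
        size_delta[rec.alloc_id] += rec.size
        count_delta[rec.alloc_id] += 1
    # Apply one aggregated update per allocation rather than one per record.
    for alloc_id, size in size_delta.items():
        s = stats[alloc_id]
        s.data_size -= size
        s.rec_count -= count_delta[alloc_id]
    return stats
```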
The solution requires refactoring how background tasks are initialized. Currently, the init process performs potentially expensive processing that can cause client timeouts and block other tasks from running. Instead, the init stage will simply record the client's request in a new task (with no processing or blocking) and immediately return the task ID to the client. These new tasks will then run in the background, where the init processing (such as permission checks and concurrency analysis) is performed. The task then either fails or proceeds to the next stage, where it is either blocked or ready depending on concurrency with other tasks.
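The proposed lifecycle can be sketched as follows. This is an illustrative Python sketch, not the project's task code. `TaskManager`, `TaskState`, `submit`, `process_init`, and the `can_access` permission callback are all hypothetical names. `submit` only records the request and returns an ID. `process_init` runs later in the background and moves the task to `FAILED`, `BLOCKED`, or `READY`.

```python
import itertools
from dataclasses import dataclass
from enum import Enum


class TaskState(Enum):
    INIT = "init"        # request recorded, no processing yet
    BLOCKED = "blocked"  # waiting on a conflicting task
    READY = "ready"      # free to run
    FAILED = "failed"    # init checks rejected the request


@dataclass
class Task:
    task_id: int
    user: str
    items: frozenset
    state: TaskState = TaskState.INIT
    error: str = ""


class TaskManager:
    def __init__(self, can_access):
        self._ids = itertools.count(1)
        self.tasks = {}
        self._can_access = can_access  # hypothetical permission check

    def submit(self, user, items):
        """Client-facing call: record the request and return at once."""
        task = Task(next(self._ids), user, frozenset(items))
        self.tasks[task.task_id] = task
        return task.task_id

    def process_init(self, task_id):
        """Background step: run the checks that used to happen in init."""
        task = self.tasks[task_id]
        if not all(self._can_access(task.user, i) for i in task.items):
            task.state = TaskState.FAILED
            task.error = "permission denied"
            return task.state
        # Concurrency analysis: block if another active task shares items.
        conflict = any(
            other.task_id != task.task_id
            and other.state in (TaskState.BLOCKED, TaskState.READY)
            and other.items & task.items
            for other in self.tasks.values()
        )
        task.state = TaskState.BLOCKED if conflict else TaskState.READY
        return task.state
```

Because `submit` does no checking, the client always gets an ID back quickly. A rejected request shows up later as a `FAILED` task rather than as a slow or timed-out call.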
Update: There was no need (yet) to refactor the task code, since all expensive operations can already be placed in the "run" function rather than the "init" function. For deletions, however, this leaves a window in which the items can be accessed between init and task completion. Deletes were intentionally placed in init, under exclusive locks, to ensure the operation was atomic. A possible work-around is to isolate all affected items, for example by removing ACLs and owner/creator fields or by marking the items to prevent access. However, this may also be expensive (in the init function).
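The marking work-around might look roughly like the sketch below. `Store`, `init_delete`, `read`, and `run_delete` are illustrative names, not an existing API. In this hypothetical sketch, init only flags the items as pending deletion, ordinary reads reject flagged items, and the slow removal happens in the run step. Flagging is still one write per item, so it carries the init cost mentioned above.

```python
class Store:
    def __init__(self, records):
        self.records = dict(records)   # record_id -> data
        self.pending_delete = set()    # ids flagged by an in-progress delete

    def init_delete(self, ids):
        """Init step: flag items so other users can no longer access them.

        Still one write per item, so not free for very large collections.
        """
        ids = set(ids)
        missing = ids - self.records.keys()
        if missing:
            raise KeyError(f"unknown records: {sorted(missing)}")
        busy = ids & self.pending_delete
        if busy:
            raise RuntimeError(f"already being deleted: {sorted(busy)}")
        self.pending_delete |= ids
        return ids

    def read(self, record_id):
        """Normal access path: reject items flagged for deletion."""
        if record_id in self.pending_delete:
            raise PermissionError(f"{record_id} is being deleted")
        return self.records[record_id]

    def run_delete(self, ids):
        """Run step (background): actually remove the flagged items."""
        for rid in ids:
            del self.records[rid]
            self.pending_delete.discard(rid)
```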
This works as is, but there is a small window of opportunity for negative interaction from other users (resulting in a task failure). Because this is improbable and of low impact (the user can simply retry the operation), I am closing this issue.
Some operations, such as deleting a collection containing thousands of records, take a very long time to run, which causes the operation to time out on the client side. We need to identify potentially long-running commands and split them into an initial confirmation part and a background task part to avoid these client timeouts.
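One way to split a long-running command is sketched below. It uses Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in for the real task system. `BackgroundRunner` and `delete_collection` are hypothetical. `start` is the confirmation part that returns a task ID immediately. The client then polls `status` rather than holding a request open until the work finishes.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor


class BackgroundRunner:
    def __init__(self, workers=2):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._ids = itertools.count(1)
        self._futures = {}

    def start(self, fn, *args):
        """Confirmation part: queue the work and return a task ID at once."""
        task_id = next(self._ids)
        self._futures[task_id] = self._pool.submit(fn, *args)
        return task_id

    def status(self, task_id):
        """Polled by the client instead of waiting on one long request."""
        fut = self._futures[task_id]
        if not fut.done():
            return "running"
        return "failed" if fut.exception() is not None else "done"

    def shutdown(self):
        self._pool.shutdown(wait=True)


def delete_collection(record_ids):
    # Hypothetical slow operation standing in for a large delete.
    for _ in record_ids:
        time.sleep(0.001)
    return len(record_ids)
```

For example, `runner.start(delete_collection, ids)` returns at once, and a later `runner.status(task_id)` reports `"running"`, `"done"`, or `"failed"`. A thread pool keeps the sketch self-contained. A real service would likely persist tasks so they survive a restart.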