Closed by prioux 2 years ago
@prioux should it support cbcsv lists?
cbcsv files are replaced by their individual file components at that point, so there is no need to even implement anything particular.
I'm increasing the priority on this one because we'll need it for the UK BioBank processing.
I'm adding a requirement. To prevent a task from erasing an input that is also used by another task, check the timestamp of the input's SyncStatus object.
After the setup() method, record a timestamp in the metadata of the task:
task.meta[:setup_time] = Time.now
Then, just before attempting the cleanup of the input file, fetch its SyncStatus object and compare its accessed_at with the recorded timestamp:
last_access = inputfile.local_sync_status&.accessed_at || Time.now
if last_access <= task.meta[:setup_time]
  # nobody has synced the file since our setup, so erase here
  inputfile.cache_erase
end
The accessed_at attribute is the one that gets updated whenever any process invokes sync_to_cache().
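For illustration only, here is a minimal self-contained sketch of the check described above. The classes FakeSyncStatus, FakeInputFile and FakeTask and the helper erase_if_unshared are hypothetical stand-ins invented for this example; they are not the real CBRAIN classes, and the real cache_erase and SyncStatus behaviour may differ.

```ruby
# Hypothetical stand-ins, NOT the real CBRAIN models.
FakeSyncStatus = Struct.new(:accessed_at)

class FakeInputFile
  attr_reader :local_sync_status, :erased

  def initialize(sync_status)
    @local_sync_status = sync_status # may be nil if the file was never synced
    @erased            = false
  end

  # Stand-in for cache_erase: only records that an erase was requested.
  def cache_erase
    @erased = true
  end
end

class FakeTask
  attr_reader :meta

  def initialize
    @meta = {}
  end
end

# Erase the cached input only if no process has accessed it after the
# task recorded its setup time. Returns true when the erase happened.
def erase_if_unshared(task, inputfile)
  # With no SyncStatus we fall back to Time.now, which keeps the file.
  last_access = inputfile.local_sync_status&.accessed_at || Time.now
  return false unless last_access <= task.meta[:setup_time]

  inputfile.cache_erase
  true
end
```

With this shape, a file whose accessed_at is later than task.meta[:setup_time] (because another task synced it in the meantime) is left in the cache, as is a file with no SyncStatus at all.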
With some effort I managed to get an exception (I guess by restarting many tasks one after another).
This module would invoke 'cache_erase' on input files, once a task is complete, using a new specification in the descriptor:
That way, when processing large datasets in parallel where the dataset is split into chunks (e.g. BidsSubjects of the UKBB), we can erase the subject's data from the cache and free some disk space.