MonashBioinformaticsPlatform / laxy

Laxy Genomics Pipelines
Apache License 2.0

File indexing setting some existing File locations to None/null #226

Open pansapiens opened 2 years ago

pansapiens commented 2 years ago

Observed behaviour

When files are registered in bulk via laxy_backend.views.JobFileBulkRegistration.post (e.g. from a manifest.csv file) and two calls are made to this function for the same job, the files registered by the first call end up with their location set to None, while the files from the second call have valid locations set.

When a subset of files has no location set, running the indexing task job_tasks.index_remote_files (e.g. manually triggered via the admin interface) seems to result in a 'swap': the files without a location have their location set, and those that had a location end up with None.

This has tended to occur only when registered files have been 'amended' after job completion, but it could also affect scenarios where you want to register files at different stages of the pipeline while the job is running (e.g. making the QC reports available in the early stages of a pipeline, or continuous file registration by polling / inotify).

Expected behaviour

Indexing files, either via laxy_backend.views.JobFileBulkRegistration.post or job_tasks.index_remote_files, should never result in Files with a valid location being modified to None.
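
The invariant above could be checked in a regression test along these lines. This is only a sketch under assumptions: the snapshots are plain dicts keyed by (job_id, path, name) rather than real File records, and lost_locations is a hypothetical helper name, not something that exists in laxy_backend.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical key for a registered file: (job_id, path, name).
FileKey = Tuple[str, str, str]


def lost_locations(
    before: Dict[FileKey, Optional[str]],
    after: Dict[FileKey, Optional[str]],
) -> List[FileKey]:
    """Return the files that had a location before indexing but have none after.

    An empty list means the invariant holds: no valid location became None.
    """
    return sorted(
        key
        for key, location in before.items()
        if location is not None and after.get(key) is None
    )


# The 'swap' described under "Observed behaviour", with made-up example URLs.
before = {
    ("job1", "output", "report.html"): "sftp://example.org/output/report.html",
    ("job1", "output", "counts.tsv"): None,
}
after = {
    ("job1", "output", "report.html"): None,
    ("job1", "output", "counts.tsv"): "sftp://example.org/output/counts.tsv",
}
regressed = lost_locations(before, after)
```

Taking a snapshot before and after each call to the bulk registration view or the indexing task, and asserting that the returned list is empty, would catch both failure modes described in this issue.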

If a File already exists (matched by job+path+name) during laxy_backend.views.JobFileBulkRegistration.post and the request contains a location field for that File, the location should be added to the File.locations list (and made the default location, based on a query string option, default=true ?).

If a File already exists (matched by job+path+name) during a job_tasks.index_remote_files task, the location should be added to the File.locations list (and made the default location, depending on the arguments provided to index_remote_files).
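
A minimal sketch of the 'add a location, never clear one' behaviour described above. It is not the Laxy implementation: FileRecord is a hypothetical in-memory stand-in for the File model, the dict index stands in for a lookup by job+path+name, and the set_default argument is an assumed name for whatever the query string option or task argument ends up being.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

FileKey = Tuple[str, str, str]  # hypothetical key: (job_id, path, name)


@dataclass
class FileRecord:
    """Hypothetical stand-in for a File record (not the real Laxy model)."""
    job_id: str
    path: str
    name: str
    locations: List[str] = field(default_factory=list)
    default_location: Optional[str] = None


def register(
    index: Dict[FileKey, FileRecord],
    job_id: str,
    path: str,
    name: str,
    url: Optional[str],
    set_default: bool = True,
) -> FileRecord:
    """Create the record if missing, then add url without discarding old locations."""
    key = (job_id, path, name)
    record = index.get(key)
    if record is None:
        record = FileRecord(job_id, path, name)
        index[key] = record
    if not url:
        # A missing location in the request must never clear an existing one.
        return record
    if url not in record.locations:
        record.locations.append(url)
    # The first location always becomes the default; later ones only on request.
    if set_default or record.default_location is None:
        record.default_location = url
    return record


# Example with made-up URLs: a second registration adds a location to the list.
index: Dict[FileKey, FileRecord] = {}
register(index, "job1", "output", "a.bam", "sftp://example.org/a.bam")
register(index, "job1", "output", "a.bam", "https://example.org/a.bam", set_default=False)
```

With this shape, both the bulk registration view and the indexing task could share one code path for locations, so repeated calls for the same job are additive rather than destructive.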

job_tasks.index_remote_files should preserve existing type_tags on a File. laxy_backend.views.JobFileBulkRegistration.post should replace type_tags if a File record with the same path and name already exists for that Job.
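
The two type_tags policies could be captured in one small function, sketched below. The function name and the source labels "index_remote_files" and "bulk_registration" are illustrative assumptions, as is the choice to keep existing tags when a bulk registration request supplies no tags at all.

```python
from typing import Iterable, List, Optional


def merge_type_tags(
    existing: Iterable[str],
    incoming: Optional[Iterable[str]],
    source: str,
) -> List[str]:
    """Decide the type_tags for a File record that already exists.

    source is a hypothetical label for the caller:
      "index_remote_files" keeps the existing tags untouched;
      "bulk_registration" replaces them with the tags from the request.
    """
    if source == "index_remote_files":
        return list(existing)
    if source == "bulk_registration":
        # Assumption: a request with no type_tags field leaves the tags alone.
        if incoming is None:
            return list(existing)
        return list(incoming)
    raise ValueError(f"unknown source: {source!r}")


# Example with made-up tags.
kept = merge_type_tags(["bam", "alignment"], ["html"], "index_remote_files")
replaced = merge_type_tags(["bam", "alignment"], ["html", "report"], "bulk_registration")
```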