Closed hepcat72 closed 3 months ago
As I was working on this, I realized that heading off ArchiveFile creation by front-loading the code with every check that could fail, so that no files get created on the file system, is not a scalable strategy. Any exception raised after an ArchiveFile record is created will trigger a rollback, and the files will be left behind.
I did improve the msrun_sequence code, but the best solution IMO is to use a ~post_delete~ transaction.on_commit signal/method in the ArchiveFile model class.
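One possible reading of the on_commit idea, as a framework-free sketch: defer the file copy until the surrounding transaction succeeds, so a rollback leaves nothing behind. `FakeTransaction`, `load_file`, and `demo` are hypothetical names used only for illustration. The class mimics the semantics of Django's `transaction.on_commit` (callbacks run only after a successful commit) and is not the ArchiveFile implementation.

```python
import shutil
import tempfile
from pathlib import Path


class FakeTransaction:
    """Toy stand-in for an atomic block (hypothetical, not Django's API)."""

    def __init__(self):
        self._on_commit = []

    def on_commit(self, func):
        # Queue work to run only after a successful commit.
        self._on_commit.append(func)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # "Commit": run the deferred callbacks in registration order.
            for func in self._on_commit:
                func()
        # On error the queued callbacks are dropped, mimicking a rollback.
        self._on_commit.clear()
        return False  # never swallow the exception


def load_file(txn, source, archive_dir, sample_exists):
    # Defer the copy instead of writing to the archive immediately.
    txn.on_commit(lambda: shutil.copy(source, archive_dir / source.name))
    if not sample_exists:
        raise LookupError(f"Sample record for [{source.stem}] does not exist.")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        archive_dir = tmp / "archive"
        archive_dir.mkdir()
        source = tmp / "blank1.mzXML"
        source.write_text("<mzXML/>")

        # Failing load: the exception aborts the block, nothing is copied.
        try:
            with FakeTransaction() as txn:
                load_file(txn, source, archive_dir, sample_exists=False)
        except LookupError:
            pass
        after_rollback = sorted(p.name for p in archive_dir.iterdir())

        # Successful load: the copy happens when the block exits cleanly.
        with FakeTransaction() as txn:
            load_file(txn, source, archive_dir, sample_exists=True)
        after_commit = sorted(p.name for p in archive_dir.iterdir())

        return after_rollback, after_commit


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the file system is only touched once nothing else can fail, so no list of up-front checks has to be kept in sync with the loader.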
Just noting a specific case. If an mzXML file is specified on the command line and the sample record does not exist, the archive file is copied over to the archive directory.
AggregatedErrors Summary (2 errors / 0 warnings):
EXCEPTION1(ERROR): RecordDoesNotExist: Sample record matching the mzXML file's basename [blank1] does not exist. Please identify the associated sample and add a row with it, the matching mzXML file name(s), and the Sequence Name to the Peak Annotation Details sheet/file.
EXCEPTION2(ERROR): RecordDoesNotExist: Sample record matching the mzXML file's basename [blank2] does not exist. Please identify the associated sample and add a row with it, the matching mzXML file name(s), and the Sequence Name to the Peak Annotation Details sheet/file.
Traceback (most recent call last):
File "/var/www/tracebase/manage.py", line 22, in <module>
main()
File "/var/www/tracebase/manage.py", line 18, in main
execute_from_command_line(sys.argv)
File "/usr/local/tracebase/lib/python3.9/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
utility.execute()
File "/usr/local/tracebase/lib/python3.9/site-packages/django/core/management/__init__.py", line 436, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/local/tracebase/lib/python3.9/site-packages/django/core/management/base.py", line 412, in run_from_argv
self.execute(*args, **cmd_options)
File "/usr/local/tracebase/lib/python3.9/site-packages/django/core/management/base.py", line 458, in execute
output = self.handle(*args, **options)
File "/var/www/tracebase/DataRepo/management/commands/load_table.py", line 398, in handle_wrapper
raise self.saved_aes
File "/var/www/tracebase/DataRepo/management/commands/load_table.py", line 382, in handle_wrapper
retval = fn(self, *args, **options)
File "/var/www/tracebase/DataRepo/management/commands/load_msruns.py", line 136, in handle
self.load_data()
File "/var/www/tracebase/DataRepo/management/commands/load_table.py", line 423, in load_data
return self.loader.load_data(*args, **kwargs)
File "/var/www/tracebase/DataRepo/loaders/table_loader.py", line 2015, in load_wrapper
raise self.aggregated_errors_object
DataRepo.utils.exceptions.AggregatedErrors: 2 exceptions occurred, including type(s): [RecordDoesNotExist].
AggregatedErrors Summary (2 errors / 0 warnings):
EXCEPTION1(ERROR): RecordDoesNotExist: Sample record matching the mzXML file's basename [blank1] does not exist. Please identify the associated sample and add a row with it, the matching mzXML file name(s), and the Sequence Name to the Peak Annotation Details sheet/file.
EXCEPTION2(ERROR): RecordDoesNotExist: Sample record matching the mzXML file's basename [blank2] does not exist. Please identify the associated sample and add a row with it, the matching mzXML file name(s), and the Sequence Name to the Peak Annotation Details sheet/file.
Scroll up to see tracebacks for these exceptions printed as they were encountered.
$ ll /tracebasedev-archive/archive/archive_files/2024-06/ms_data/
total 172800
-rw-r--r--. 1 tracebase tracebase 75844569 Jun 4 16:21 blank1.mzXML
-rw-r--r--. 1 tracebase tracebase 77162873 Jun 4 16:21 blank2.mzXML
-rw-r--r--. 1 tracebase tracebase 729247 Jun 4 16:18 exp027f3_07.mzXML
-rw-r--r--. 1 tracebase tracebase 727344 Jun 4 16:18 exp027f3_08.mzXML
-rw-r--r--. 1 tracebase tracebase 712000 Jun 4 16:18 exp027f3_09.mzXML
-rw-r--r--. 1 tracebase tracebase 709946 Jun 4 16:18 exp027f3_10.mzXML
> Just noting a specific case. If an mzXML file is specified on the command line and the sample record does not exist, the archive file is copied over to the archive directory.
This is taken care of in #983, which has not yet been merged. You could work off that branch? Branch: msruns_loader_rollback_cleanup.
FEATURE REQUEST
Inspiration
While fixing a bug in the MSRunsLoader, we ended up with archive files that were created and never linked to existing records. The theory is that these were created when there was a typo in the sequence information provided. The script failed when it tried to retrieve the sequence record and everything was rolled back. However, as we learned, when records are deleted (or an atomic transaction is rolled back), files created by FileField/FieldFile are not deleted.
Description
We should check that the sequence records exist before creating any ArchiveFile records and skip their creation if the sequence records don't exist.
Alternatives
Another option we discussed, which I personally think would be a good additional step to take (because a downstream exception unrelated to sequence records could still happen, cause a rollback, and leave vestigial files), would be to override the delete() method to clean up.
Dependencies
None
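The delete() override mentioned under Alternatives could look roughly like the sketch below. It is a plain-Python stand-in, not the ArchiveFile model: `ArchiveRecord` and `demo_delete` are hypothetical names, and in Django the same idea would live in the model's own `delete()`. Note that a rollback does not go through `delete()`, so this covers explicit deletions only.

```python
import tempfile
from pathlib import Path


class ArchiveRecord:
    """Hypothetical stand-in for a record that owns a file on disk."""

    def __init__(self, path):
        self.path = Path(path)
        self.deleted = False

    def delete(self):
        # Remove the file along with the record.
        # missing_ok (Python 3.8+) avoids an error if the file is already gone.
        self.path.unlink(missing_ok=True)
        self.deleted = True


def demo_delete():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "blank2.mzXML"
        path.write_text("<mzXML/>")
        record = ArchiveRecord(path)
        record.delete()
        # Report whether the record is marked deleted and whether the file remains.
        return record.deleted, path.exists()


if __name__ == "__main__":
    print(demo_delete())
```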
Comment
Some notes from various shell commands when handling this issue:
console:
ISSUE OWNER SECTION
Assumptions
Limitations
Affected Components
Requirements
DESIGN
Interface Change description
None provided
Code Change Description
None provided
Tests