Princeton-LSI-ResearchComputing / tracebase

Mouse Metabolite Tracing Data Repository for the Rabinowitz Lab
MIT License

`StudyLoader` for submission refactor #1013

Closed: hepcat72 closed 2 months ago

hepcat72 commented 3 months ago

Summary Change Description

Implemented a study loader that is the loader class equivalent of the old load_study.py, but it runs all the new loaders instead.

I haven't implemented a command line script yet, and I'm not sure I will. Right now, I'm focused on the user experience of the submission process, which involves everything in the study sheet and all the peak annotation files. The loader doesn't support mzXML files, nor should it: users cannot upload those. The msruns loader can be run at any time to load them, so that's not a big deal. Once users can easily create their submissions, I can focus on the differences we want for ourselves during the load process (e.g. adding new compounds or whatever to consolidated files). So the next effort is to update the Excel doc that's returned by the "build a submission" web interface to flesh out all the associated data that can be derived from a submitted peak annotations file.

I overloaded TableLoader here, which is why I removed the model requirement in the class attributes of TableLoader. Instead of "headers", it sets sheet names. I didn't go to the trouble of renaming the variables used by TableLoader, so just be aware during review that references to headers are actually sheets. I try to make that clearer with comments and wrapper methods, but the attributes will look confusing.

TableLoader simply offered too much benefit not to reuse it in this loader.
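As a rough illustration of what "sheet names instead of headers" means, here is a minimal sketch. It does not use the real TableLoader: `MiniTableLoader`, `MiniStudyLoader`, `missing_headers`, `missing_sheets`, and the sheet names are all hypothetical stand-ins, and the workbook is modeled as a plain dict keyed by sheet name.

```python
from collections import namedtuple


class MiniTableLoader:
    """Hypothetical stand-in for a TableLoader-like base class (NOT the real one)."""

    DataTableHeaders = None  # namedtuple type listing the header keys
    DataHeaders = None       # namedtuple instance holding the default header names

    def __init__(self, user_headers=None):
        # Start from the class defaults and let the caller override individual names.
        names = dict(self.DataHeaders._asdict())
        names.update(user_headers or {})
        self.headers = self.DataTableHeaders(**names)

    def missing_headers(self, present):
        # Generic check: which expected "headers" are absent from the input?
        return [name for name in self.headers if name not in present]


class MiniStudyLoader(MiniTableLoader):
    """Reuses the header machinery, but each "header" is really a sheet name."""

    # Sheet keys and names are made up for this example.
    DataTableHeaders = namedtuple("DataTableHeaders", ["STUDY", "ANIMALS", "SAMPLES"])
    DataHeaders = DataTableHeaders(STUDY="Study", ANIMALS="Animals", SAMPLES="Samples")

    def missing_sheets(self, workbook):
        # Wrapper method so callers can talk about sheets rather than headers.
        return self.missing_headers(workbook.keys())


loader = MiniStudyLoader()
print(loader.missing_sheets({"Study": [], "Animals": []}))
```

The wrapper method is the only sheet-specific piece; everything else is inherited, which is the trade-off described above: less new code, at the cost of attribute names that say "headers" where they mean "sheets".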

One thing to note: load_study.py allowed MaintainedModel's coordinator to be overridden by intercepting the kwargs and looking for argument names that change the mode of the MaintainedModelCoordinator. For example, you don't want to run buffered autoupdates if you're in dry run or validate mode, so it used a wrapper to inspect the incoming arguments and dynamically change its mode based on them. That worked fine for decorating the handle method of a command line script, but the load_data method inside TableLoader doesn't take any arguments other than self. So I added a way for the wrapper to check whether self has an attribute with one of the configurable names (set where the decorator is applied, using the same mechanism), read its value, and thereby affect the coordinator mode (e.g. disable deferred updates in dry run or validate mode).
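The idea can be sketched roughly as follows. This is not the actual MaintainedModel code: `with_coordinator_mode`, `pick_mode`, `ACTIVE_MODES`, the mode strings, and the flag names `dry_run` and `validate` are all assumptions made for illustration, and the "coordinator" is reduced to a stack of mode names.

```python
import functools

# Hypothetical stand-in for a coordinator stack; the real coordinator is more involved.
ACTIVE_MODES = []


def pick_mode(args, kwargs, flag_names, default_mode="deferred", override_mode="disabled"):
    """Return override_mode if any flag is truthy in kwargs or on the first positional arg."""
    owner = args[0] if args else None  # "self" when decorating a method
    for name in flag_names:
        if kwargs.get(name) or getattr(owner, name, False):
            return override_mode
    return default_mode


def with_coordinator_mode(flag_names=("dry_run", "validate")):
    """Decorator factory: flag_names are configured where the decorator is applied."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Push the chosen mode for the duration of the call, then restore.
            ACTIVE_MODES.append(pick_mode(args, kwargs, flag_names))
            try:
                return fn(*args, **kwargs)
            finally:
                ACTIVE_MODES.pop()
        return wrapper
    return decorator


class DemoLoader:
    """Hypothetical loader whose load_data takes no arguments other than self."""

    def __init__(self, dry_run=False, validate=False):
        self.dry_run = dry_run
        self.validate = validate

    @with_coordinator_mode()
    def load_data(self):
        return ACTIVE_MODES[-1]  # report the mode in effect during the load


@with_coordinator_mode()
def handle(**options):
    """Hypothetical script entry point that receives the flags as kwargs."""
    return ACTIVE_MODES[-1]


print(DemoLoader().load_data(), DemoLoader(dry_run=True).load_data(), handle(validate=True))
```

The same decorator serves both call styles: a command line `handle` method that receives the flags as keyword arguments, and a `load_data` method that only has `self` to inspect.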

PS - There also needs to be a refactor that removes all the old loaders and scripts once the process migrates to the new pipeline.

Affected Issues/Pull Requests

Review Notes

See comments in-line.

Checklist

This pull request will be merged once the following requirements are met. The author and/or reviewers should uncheck any unmet requirements:

hepcat72 commented 2 months ago

Merged via #1076