Closed lparsons closed 3 years ago
Note: this will likely require and database flush for these migrations, if PeakGroupSet is required
@lparsons @hepcat72 @fkang-pu Regarding filename
, are there any constraints to this table? ( i.e. is filename unique ?). Asking because it is easy to add a single/cached PeakGroupSet for a single run of load_accucor_msruns.py
, but if you run it multiple distinct times with different peak sets but potentially the same filenames, things might be a little tricky to interpret afterwards. Just asking for a first-pass specification, because load_accucor_msruns.py
wasn't really coded with the "add new peak groups to previously loaded/inferred msruns" mindset (#65).
I guess I will just insert 1, no matter the name, and hope for the best
I don't see a big concern right now about constraints. I think ultimately, we'd want to store that file somewhere for provenance.
As for load_accucor_msruns.py
, I would probably make the assumption that the filename was unique and do a get_or_create
using the filename as a lookup. At least, to me, that would work for a while. Please let me know if you have an alternate suggestion though, I think you've probably thought about this more deeply than I at this point.
It is just that with the imported_timestamp
presumably defaulting to "now" in the model (e.g. this load_accucor_msruns.py
execution), then we should proabably never "get"/retrieve a previous one, because that was a different time and potentially a different file (albeit with a shared name).
Good point @jcmatese. Would it make you more comfortable to add the unique constraint on the filename? That seems reasonable to me.
Nah, that is ok, as long as everyone is ok with the interpretation (always report as "this file at this date time")
It's a good idea to add unique constraint on the filename.
OK, I can add it if you all want.
@lparsons @fkang-pu @hepcat72 regarding foreign-key/attribute naming conventions, do you have a strong preference going forward on how to name columns/methods such as the existing
PeakGroup.ms_run
referencing MsRun
class (i.e. added underscore),
https://github.com/Princeton-LSI-ResearchComputing/tracebase/blob/e9be74d48cc5b66eb5d112193fe177f7f18070f9/DataRepo/models.py#L216-L222
or the presumptive addition in this issue of
either PeakGroup.peakgroupset
or PeakGroup.peak_group_set
for the foreign key from PeakGroup
to PeakGroupSet
May want to add a note on the coding/naming styles going forward, if a consensus decision is made.
Generally, camelcase to underscore seems to be the more popular representation.
Good question John. I would prefer we be consistent with how other fields and relationships are named. I believe that would be using underscores (_
), so probably PeakGroup.peak_group_set
. I'm fine with updating those notes, but I'd prefer they live in the repository in git and not on nplcadmin docs (e.g. CONTRIBUTING.md
or README.md
).
Using understore is fine with me - easy to read. So far, I can retrieve the value(s) for foreign keys as long as I use the field names defined for that model class.
Just noting that to add this non-nullable foreign key to PeakGroup, I will probably need to revert to an earlier version of the database/model, and make migrations from there...
$``` python manage.py migrate DataRepo 0003 Operations to perform: Target specific migration: 0003_auto_20210402_1105, from DataRepo Running migrations: Rendering model states... DONE Unapplying DataRepo.0011_peakdata_help_test... OK Unapplying DataRepo.0010_merge_20210504_1613... OK Unapplying DataRepo.0009_auto_20210429_1433... OK Unapplying DataRepo.0008_changed_unique_constraints_coding... OK Unapplying DataRepo.0007_peakdata_validator... OK Unapplying DataRepo.0006_msrun_researcher... OK Unapplying DataRepo.0006_auto_20210503_1855... OK Unapplying DataRepo.0005_auto_20210421_1054... OK Unapplying DataRepo.0004_PeakGroup_Data... OK
Had to go back even farther.
$ python manage.py makemigrations --name PeakGroupSet DataRepo
Migrations for 'DataRepo':
DataRepo/migrations/0002_PeakGroupSet.py
- Create model Animal
- Create model MSRun
- Create model PeakGroupSet
- Create model Protocol
- Create model Study
- Create model Tissue
- Alter field hmdb_id on compound
- Create model Sample
- Create model PeakGroup
- Create model PeakData
- Add field protocol to msrun
- Add field sample to msrun
- Add field studies to animal
- Add field tracer_compound to animal
- Create constraint unique_peakgroup on model peakgroup
- Create constraint unique_peakdata on model peakdata
- Create constraint unique_msrun on model msrun
Adding any non-nullable foreign key can be an issue, migration wise. Not certain of the best approach, and this may certainly come up again with Animal.treatment
Any reason not to regenerate the migrations completely at this point? Going forward, if we have to do this with real data, we there needs to be a method to set the field (either default or by lookup).
We could regenerate completely, I suppose. The issue with these non-nullable foreign key additions is that you might first have to create the model (PeakGroupSet
, migration no. 1 ). Then insert the initial default value (say filename=NA
, possibly with migration no. 1). Then add the default to the dependent model (PeakGroup
with that id=1 default, migration no. 2). At least, I think that is how you might have to do it, and that would see like a custom edited migration file, I think (because of the insert, and stepwise additions). I am just reverting back (above), because it doesn't really matter now, but if the database every had values we wanted to keep, we would have to figure out the appropriate protocol to pull this off.
Good point, I do think they would require some custom migrations in order to be applied cleanly on top of existing records.
I ended up in kind of a mess. Had to drop the database and re-init, here. Wasn't what I intended, but that is what happened...
Very good issue description and issue owner documentation! Clear, concise, and comprehensive.
FEATURE DESCRIPTION
Feature Inspiration
We need a way to track the provenance of imported PeakGroup data, both for error checking and later lookup. Also, there is potentially some value in the grouping of PeakGroup data provided by a single AccuCor file (data analysis session). For example, if a researcher performs a second analysis of the same MSRun data, adding new compounds to the data, we currently have no way to tell that some PeakGroups were added separately from the original PeakGroup import.
Feature Description
Add a PeakGroupSet class that tracks the filename and import time of a given AccuCor file, and associate the PeakGroups with that PeakGroupSet record.
Alternatives Considered
We had discussed tracking simply the filename, though tracking the time of import seems simple enough to add.
We also discussed tracking the researcher that uploaded the data, but since we have not incorporated the user model yet and are not providing uploads via the web interface, this will be implemented later if needed.
Comment
ISSUE OWNER SECTION
Assumptions
--accucor-file
Requirements
--accucor-file
is already a required argumentLimitations
Affected/Changed Components
--accucor-file
, and passing it alongDESIGN
GUI Change description
Input APIs and usage remain unchanged. Views and queries would presumably adjust to use this data.
Code Change Description (Pseudocode optional)
use os.path.basename
Tests
A test should be planned for each requirement (above), where possible.