Open hepcat72 opened 3 months ago
Here's a relevant vignette from testing phase 1 of submission process that might highlight a way for users to update compound synonyms:
I tried uploading data containing compound names that do not exist as synonyms in tracebase (but the primary compound does exist). The validate page caught this. I then checked the Compounds tab in the study doc and found these problematic compounds: they were missing an HMDB ID and the synonyms field was empty:
I decided to modify the Columns page in my Study Doc: for each of these new synonyms, I added the correct HMDB ID:
From the user's perspective this is an easy way to identify the compound unambiguously. But I can see that this would create problems for the database (unless something is created that specifically handles this). The validator found these and threw an error:
PASSED: Samples Check
PASSED: Peak Annotation Files Check
PASSED: Tissues Check
PASSED: Treatments Check
PASSED: Compounds Check
FAILED: col013b_study doc_240812.xlsx
ConflictingValueErrors
Conflicting values encountered during loading:
During the processing of sheet [Compounds] in col013b_study doc_240812.xlsx...
Creation of the following Compound record(s) encountered conflicts:
File record: {'name': 'Glucose 6-phosphate', 'formula': 'C6H13O9P', 'hmdb_id': 'HMDB0001401'} (on row(s): 3)
Database record: {'name': 'glucose-6-phosphate', 'formula': 'C6H13O9P', 'hmdb_id': 'HMDB0001401'}
[name] values differ:
- database: [glucose-6-phosphate]
- file: [Glucose 6-phosphate]
File record: {'name': 'L-Lactic acid', 'formula': 'C3H6O3', 'hmdb_id': 'HMDB0000190'} (on row(s): 4)
Database record: {'name': 'lactate', 'formula': 'C3H6O3', 'hmdb_id': 'HMDB0000190'}
[name] values differ:
- database: [lactate]
- file: [L-Lactic acid]
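Both conflicts above have the same shape: the file row and the database record share an HMDB ID but differ in name. As a rough sketch only (the function name and the use of plain dicts in place of Compound records are assumptions for illustration, not TraceBase code), such rows could be flagged as likely synonyms rather than as new compounds:

```python
def find_synonym_candidates(file_rows, db_rows):
    """Return (file_name, database_name) pairs that share an HMDB ID but differ in name.

    Hypothetical helper: rows are plain dicts with 'name' and 'hmdb_id' keys.
    """
    db_by_hmdb = {row["hmdb_id"]: row for row in db_rows}
    candidates = []
    for row in file_rows:
        match = db_by_hmdb.get(row["hmdb_id"])
        if match is not None and match["name"] != row["name"]:
            candidates.append((row["name"], match["name"]))
    return candidates


# Records taken from the error report above
file_rows = [
    {"name": "Glucose 6-phosphate", "formula": "C6H13O9P", "hmdb_id": "HMDB0001401"},
    {"name": "L-Lactic acid", "formula": "C3H6O3", "hmdb_id": "HMDB0000190"},
]
db_rows = [
    {"name": "glucose-6-phosphate", "formula": "C6H13O9P", "hmdb_id": "HMDB0001401"},
    {"name": "lactate", "formula": "C3H6O3", "hmdb_id": "HMDB0000190"},
]
candidates = find_synonym_candidates(file_rows, db_rows)
```

Each pair names a file-side compound and the existing record it probably belongs to as a synonym.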
BTW, I think your comment should probably be in a different issue, but ignoring that for now...
@mneinast - So I think I can accurately guess what's going on here. Here's what I expect happened.
I think that the solution here is to auto-populate all existing compounds and hope that the user will see that an autofilled row is essentially a compound synonym of another row, delete that row, and add the synonym to the other record.
I'll create an issue for this.
Yes, I agree that's what happened. Your proposed solution sounds good to me.
FEATURE REQUEST
Inspiration
The new submission interface allows researchers (if they wish) to add compounds, tissues, treatments, LC Protocols, tracers, and infusates. Some or all of these require manual curation by the curators who are loading the data. These data should be extracted from a submission and a consolidated doc should be updated. Curators should be alerted to this new data and scrutinize it. They should reach out to the researcher who submitted it if there are any problems, to coordinate resolutions.
Description
Data that is common to and shared across all studies should be monitored for changes and manually scrutinized by curators. We should create a script that:
Things to note:
Model.objects.get() call with all available fields supplied
A doc describing the load process from a curator's perspective should accompany this effort.
Alternatives
None
Dependencies
None
Comment
None
ISSUE OWNER SECTION
Assumptions
None
Limitations
None
Affected Components
study_loader.py
load_study.py
curate_study.py
study_curater.py
load_curated.py
load_table.py
table_loader.py
Requirements
1. A study doc is output that contains only consolidated data
2. A study doc is output containing only study-specific data
3. New consolidated data is highlighted in the consolidated data study doc
4. Erroneous or warning data is highlighted in the consolidated study doc
5. The consolidated doc contains ALL consolidated data (i.e. if a user removed rows, they are added back in)
6. When splitting consolidated from study-specific data, nothing is loaded, despite no errors or warnings
DESIGN
Interface Change description
curate_study.py
new script
Purpose
The intent of this script is to split the data into study-specific data and study-common/consolidated data, and in doing so, it highlights new/changed data to prompt the curator to scrutinize that data even if it has no errors.
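The split and the "new data" check could look roughly like the sketch below. It is illustrative only: plain dicts stand in for the Excel sheets, the set of consolidated sheet names is an assumption based on the data types listed under Inspiration, and both function names are hypothetical.

```python
# Assumption: sheets holding cross-study (consolidated) data, named after the
# data types listed under Inspiration. The real sheet names may differ.
CONSOLIDATED_SHEETS = {
    "Compounds", "Tissues", "Treatments", "LC Protocols", "Tracers", "Infusates",
}


def split_study_doc(sheets):
    """Split {sheet name: rows} into (study_specific, consolidated) dicts."""
    study_specific, consolidated = {}, {}
    for name, rows in sheets.items():
        target = consolidated if name in CONSOLIDATED_SHEETS else study_specific
        target[name] = rows
    return study_specific, consolidated


def find_new_rows(rows, existing_names):
    """Return the rows whose 'name' is not already known, for a curator to review."""
    return [row for row in rows if row["name"] not in existing_names]


sheets = {
    "Samples": [{"name": "s1"}],
    "Compounds": [{"name": "lactate"}, {"name": "L-Lactic acid"}],
    "Tissues": [{"name": "liver"}],
}
study_specific, consolidated = split_study_doc(sheets)
new_compounds = find_new_rows(consolidated["Compounds"], existing_names={"lactate"})
```

Here `new_compounds` holds only the "L-Lactic acid" row, which is the kind of row that would be highlighted for the curator even though it raised no error.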
Inputs
--infile study.xlsx (excel file): A submitted study
--outfile-study study-specific-data.xlsx (excel file) [<infile name>-study-specific-data.xlsx]: Where the study-specific data goes (i.e. no consolidated data is in this file)
--outfile-consolidated consolidated.xlsx (excel file) [consolidated-data-<timestamp>.xlsx]: Where the cross-study common data goes (i.e. all the consolidated data is in this file - no study-specific data)
Outputs
load_curated.py
new script
Purpose
This script will only load consolidated data from a consolidated-data-<timestamp>.xlsx file produced by curate_study.py. If the file contains study-specific data, it will be ignored.
Inputs
--infile consolidated-data-<timestamp>.xlsx (excel): A file produced by curate_study.py
Outputs
None (loads the database)
load_study.py
changed script
Purpose
This script will now only load study-specific data from a <infile name>-study-specific-data.xlsx file produced by either curate_study.py or by the build-a-submission interface. If the file contains consolidated ("study-common") data, it will be ignored.
Note that it uses study_loader.py, which can process all data in validate mode. It just doesn't set the validate flag. That is supplied by the DataValidationView when it runs the StudyLoader.
Inputs
--infile <infile name>-study-specific-data.xlsx (excel): A file containing study-specific data (e.g. samples)
Outputs
None (loads the database)
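The behavior described for load_study.py (study-specific data only, unless the loader runs in validate mode) amounts to a filter over the individual loaders. A minimal sketch follows, assuming each loader class carries a hypothetical `consolidated` flag. The stand-in classes and the `select_loaders` function are illustrative and are not the actual StudyLoader code.

```python
class TableLoader:
    # Hypothetical flag: True for loaders of consolidated (cross-study) data
    consolidated = False


class SamplesLoader(TableLoader):
    pass


class CompoundsLoader(TableLoader):
    consolidated = True


class TissuesLoader(TableLoader):
    consolidated = True


def select_loaders(loader_classes, validate=False, load_curated=False):
    """Pick which loaders to run.

    validate=True: every loader runs (all data is processed).
    Otherwise: only consolidated loaders when load_curated is True,
    only study-specific loaders when it is False (the default).
    """
    if validate:
        return list(loader_classes)
    return [cls for cls in loader_classes if cls.consolidated == load_curated]


ALL_LOADERS = [SamplesLoader, CompoundsLoader, TissuesLoader]
default_run = select_loaders(ALL_LOADERS)
curated_run = select_loaders(ALL_LOADERS, load_curated=True)
validate_run = select_loaders(ALL_LOADERS, validate=True)
```

With the defaults, only `SamplesLoader` is selected, which matches the intent that load_study.py never loads consolidated data.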
Code Change Description
study_loader.py
StudyLoader class will have a new class attribute that identifies loaders as either curated(/consolidated) or not (default: not curated/consolidated)
StudyLoader constructor will take new keyword arguments:
curate, which causes it to pass the curate option to specific loaders, e.g.: CompoundsLoader
load_curated (defaulted to False), which causes it to ONLY call loaders that are identified as curated/consolidated (default will be to only load study-specific data)
load_all option to load both curated/consolidated and study-specific data, but that's optional in this design.
table_loader.py
TableLoader constructor will take a new keyword argument:
curate, which causes it to
created.
CurationStatus exception (despite success/failure) that always triggers a rollback in the load_data wrapper
study_curater.py
Any common code/methods with DataValidationView may be pulled out into a separate class for re-use by this class.
DataValidationView, in that it will
load_study.py
curate and validate are both False (the StudyLoader default), it will never load the consolidated data.
--all option to load both curated/consolidated and study-specific data, but that's optional in this design.
curate_study.py
load_study.py, except it will set the curate option to the derived StudyLoader constructor to True.
load_curated.py
load_study.py, except it will set the curate option to the derived StudyLoader constructor to True.
load_table.py
curate and validate are both False (the StudyLoader default), it will never load the consolidated data.
Tests
Unit test every new class/method
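As one example of what such a test could check, the sketch below exercises the CurationStatus idea from the table_loader.py section: in curate mode the exception is always raised, so the transaction is always rolled back and nothing is loaded. It is an assumption-laden stand-in, not the real loader. It uses the standard-library sqlite3 module instead of the Django ORM, and `make_db`, `load_data`, and the `created` attribute on the exception are hypothetical.

```python
import sqlite3


class CurationStatus(Exception):
    """Hypothetical: raised at the end of a curate-mode run, success or failure."""

    def __init__(self, created):
        super().__init__(f"{len(created)} record(s) would be created")
        self.created = created


def make_db():
    """Build an in-memory database holding one existing compound."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE compound (name TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO compound (name) VALUES ('lactate')")
    conn.commit()
    return conn


def load_data(conn, names, curate=False):
    """Stand-in for the load_data wrapper; returns the names that were (or would be) created."""
    created = []
    try:
        # The connection context manager commits on success and rolls back
        # if an exception leaves the block.
        with conn:
            for name in names:
                found = conn.execute(
                    "SELECT 1 FROM compound WHERE name = ?", (name,)
                ).fetchone()
                if found is None:
                    conn.execute("INSERT INTO compound (name) VALUES (?)", (name,))
                    created.append(name)
            if curate:
                # Always raised in curate mode, so the inserts above are undone
                raise CurationStatus(created)
    except CurationStatus as status:
        return status.created
    return created


def count_compounds(conn):
    return conn.execute("SELECT COUNT(*) FROM compound").fetchone()[0]


conn = make_db()
would_create = load_data(conn, ["lactate", "citrate"], curate=True)
count_after_curate = count_compounds(conn)
did_create = load_data(conn, ["lactate", "citrate"])
count_after_load = count_compounds(conn)
```

The curate-mode run reports "citrate" as new while leaving the table unchanged. The normal run then commits it.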