Princeton-LSI-ResearchComputing / tracebase

Mouse Metabolite Tracing Data Repository for the Rabinowitz Lab
MIT License

Double check compound naming and HMDB for Sedoheptulose 7-phosphate #847

Open jcmatese opened 9 months ago

jcmatese commented 9 months ago

The name and HMDB ID differ between the consolidated list https://github.com/Princeton-LSI-ResearchComputing/tracebase/blob/4bb4583c9898868cdb5800b9d0a2dd6bf3228339/DataRepo/data/examples/compounds/consolidated_tracebase_compound_list.tsv#L51

and the Rabinowitz data repo

https://github.com/PrincetonUniversity/tracebase-rabinowitz-data/blob/9d44914f73de1491f2cd44dcf75f115f3ce45762/compounds/compounds.tsv#L164

Also, regarding differences, there are 52 lines in the former and 184 lines in the latter, so I think the former ("consolidated") might just be for example/test data, and is not comprehensive (consolidated, but perhaps in a different context?).

hepcat72 commented 9 months ago

Sorry, it appears that it's the tracebase example data that should be updated. I'll re-create this issue there. That said, we should probably document somewhere that for actual production data loads, we should be using the consolidated compounds file in the Rabinowitz repo, not the one from the example data? Unless I'm misunderstanding or not precisely aware of the plan for housing this "basal" data? @lparsons - care to weigh in?

jcmatese commented 9 months ago

Yes, sorry, this was "discovered" because I was using the tracebase CONTRIBUTING doc to cover for the out-of-date tracebase-rabinowitz-data docs/wiki. Not a big deal, just thought I would report it.

lparsons commented 9 months ago

Example data in the tracebase repository can be whatever we like. It should be one (or a few) studies that new users and new developers can use to test their installation and use for development, etc. Testing data in tracebase can be various types of broken, edge cases, etc. that are used only for testing.

Data for our production system should be entirely separate, and is currently housed in the tracebase-rabinowitz-data repository (except mzXML files). For that data, all compounds, tissues, instruments, lc_methods, etc. should be kept separate from each individual study and loaded first. See https://github.com/PrincetonUniversity/tracebase-rabinowitz-data/issues/92.

Hopefully this helps clear things up, but let me know if I can help clarify anything else.

hepcat72 commented 9 months ago

Mea culpa. John had noted that the instructions in the admin docs were stale and I pointed him to the CONTRIB doc.

hepcat72 commented 2 months ago

@lparsons - I think this issue can be closed, but I'd like to see what you think. This issue stems from loading example data during the work on loading production data, so there were bound to be consistency issues. If anything, this could be supplanted by a documentation issue (if there's still a problem - I haven't checked).

lparsons commented 2 months ago

While the example data and production data are separate, it seems reasonable that we would use the same HMDB ID for the same named compound in both, at least for consistency.

@mneinast Can you help us determine which of the following records is "preferred"? The main thing to consider is whether HMDB0258206 or HMDB0001068 is the preferable HMDB record for this compound.

Production data: sedoheptulose 7-phosphate C7H15O10P HMDB0258206

Example data: https://github.com/Princeton-LSI-ResearchComputing/tracebase/blob/4bb4583c9898868cdb5800b9d0a2dd6bf3228339/DataRepo/data/examples/compounds/consolidated_tracebase_compound_list.tsv#L51
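For anyone wanting to check for other compounds with the same kind of mismatch, here is a minimal sketch of a comparison between two compound TSVs. It is not part of tracebase: the function names are hypothetical, and the column names `name` and `hmdb_id` are assumptions that should be changed to match the actual headers in each file. Names are normalized (lowercased, dashes treated as spaces) so that spelling variants of the same compound still line up.

```python
import csv


def normalize(name):
    """Lowercase and treat dashes/underscores as spaces so spelling variants match."""
    return " ".join(name.lower().replace("-", " ").replace("_", " ").split())


def read_compounds(handle, name_col="name", hmdb_col="hmdb_id"):
    """Map normalized compound name -> HMDB ID from a tab-separated file with a header row.

    name_col and hmdb_col are assumed header names, not confirmed ones.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {normalize(row[name_col]): row[hmdb_col].strip() for row in reader}


def hmdb_mismatches(example, production):
    """Return {name: (example_id, production_id)} for shared names whose IDs differ."""
    return {
        name: (example[name], production[name])
        for name in sorted(example.keys() & production.keys())
        if example[name] != production[name]
    }
```

Each file would be opened with `open(path, newline="")` and passed to `read_compounds`; the two resulting dicts then go to `hmdb_mismatches`. Compounds present in only one file are ignored, which seems reasonable given that the example list is much shorter than the production list.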

hepcat72 commented 2 months ago

It looks like the name may have originated from study obob_fasted_ace_glycerol_3hb_citrate_eaa_fa by Xianfeng Zeng in 2022.

It was transferred to the consolidated list here on Feb 10th of 2023:

https://github.com/PrincetonUniversity/tracebase-rabinowitz-data/blame/5a251b11b1a3ba3db818c446dcb734d298318bd5/compounds/compounds.tsv#L614

One later study had the dash version changed to match the pre-existing study, as documented here:

https://github.com/PrincetonUniversity/tracebase-rabinowitz-data/blob/5a251b11b1a3ba3db818c446dcb734d298318bd5/obob_fasted_ace_glycerol_3hb_citrate_eaa_fa/CHANGES.md?plain=1#L21-L22