Princeton-LSI-ResearchComputing / tracebase

Mouse Metabolite Tracing Data Repository for the Rabinowitz Lab
MIT License

Add median retention time to Compounds table #563

Open mneinast opened 2 years ago

mneinast commented 2 years ago

FEATURE REQUEST

Inspiration

TraceBase provides a unique opportunity to generate a Knowns list that a researcher could export and use when curating Peak Groups in Maven or El-Maven.

Description

Add Retention Time to the Compounds table. A researcher could export this table as CSV and use it as a Knowns list in Maven. The value for Retention Time could be the median retention time across all observations of that compound's PeakGroups in TraceBase.
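A minimal sketch of the proposed value, assuming each observation can be reduced to a (compound name, retention time in minutes) pair. The `median_rt_by_compound` helper and the example data are hypothetical, not part of TraceBase's schema or real measurements:

```python
from collections import defaultdict
from statistics import median


def median_rt_by_compound(observations):
    """Return {compound_name: median RT in minutes} from (name, rt) pairs.

    Hypothetical helper: `observations` is any iterable of
    (compound_name, retention_time_minutes) tuples, e.g. one per PeakGroup.
    """
    rts = defaultdict(list)
    for name, rt in observations:
        if rt is not None:  # skip observations with no recorded RT
            rts[name].append(rt)
    return {name: median(values) for name, values in rts.items()}


# Illustrative data only, not real TraceBase observations.
observations = [
    ("glucose", 11.2),
    ("glucose", 11.4),
    ("glucose", 11.3),
    ("lactate", 5.0),
    ("lactate", 5.5),
]
print(median_rt_by_compound(observations))
```

The median is used rather than the mean so that a few misassigned or outlying peaks do not drag the exported RT.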

Alternatives

Dependencies

This issue cannot be started until the completion of the following issue(s):

Comment

The format for a Knowns list in Maven has four columns:

1. Name: name of the compound
2. Formula
3. m/z (not required)
4. RT: retention time in minutes (not required)

Additional columns are ignored.
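To illustrate that column layout, here is a sketch that writes such a list as CSV with Python's standard `csv` module. The `write_knowns_csv` helper and its input dictionaries are hypothetical, and the header spellings (`Name`, `Formula`, `m/z`, `RT`) simply follow the comment above; they should be checked against what the installed Maven/El-Maven version actually expects. The RT value shown is made up.

```python
import csv
import io

# Column order of a Maven Knowns list, as described in the comment above.
KNOWNS_COLUMNS = ["Name", "Formula", "m/z", "RT"]


def write_knowns_csv(compounds, stream):
    """Write compounds as a Maven-style Knowns list (hypothetical helper).

    Each compound is a dict with "name" and "formula"; "mz" and "rt"
    (minutes) are optional and are written as empty cells when missing.
    """
    writer = csv.writer(stream)
    writer.writerow(KNOWNS_COLUMNS)
    for c in compounds:
        mz = c.get("mz")
        rt = c.get("rt")
        writer.writerow([
            c["name"],
            c["formula"],
            "" if mz is None else mz,
            "" if rt is None else rt,
        ])


buf = io.StringIO()
write_knowns_csv(
    [
        {"name": "glucose", "formula": "C6H12O6", "rt": 11.3},
        {"name": "lactate", "formula": "C3H6O3"},  # no RT known yet
    ],
    buf,
)
print(buf.getvalue())
```

Leaving m/z and RT blank when unknown matches the "not required" note, so compounds without enough observations can still be exported.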



ISSUE OWNER SECTION

Assumptions

Requirements

Limitations

The retention time of a compound in an MS run may change dramatically if a different liquid chromatography method is used. The vast majority of data in TraceBase is currently based on a common method, "hilic25 minutes", but the appropriate RT for other methods could be quite different.
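Because of this, pooling observations from different methods into one median would mix RTs that are not comparable. A minimal sketch that keys the median by (compound, LC method) instead; the `median_rt_by_method` helper, the `"hilic25"` and `"other_method"` labels, and the RT values are all hypothetical:

```python
from collections import defaultdict
from statistics import median


def median_rt_by_method(observations):
    """Return {(compound, lc_method): median RT} from (compound, method, rt) triples.

    Hypothetical helper: keeping the LC method in the grouping key prevents
    RTs from different chromatography methods being pooled into one median.
    """
    groups = defaultdict(list)
    for compound, method, rt in observations:
        groups[(compound, method)].append(rt)
    return {key: median(rts) for key, rts in groups.items()}


# Illustrative data only.
observations = [
    ("glutamine", "hilic25", 9.5),
    ("glutamine", "hilic25", 9.75),
    ("glutamine", "other_method", 3.0),  # hypothetical second LC method
]
print(median_rt_by_method(observations))
```

An exported Knowns list would then be filtered to the single method the researcher is curating against.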

Affected Components

A tentative list of anticipated repository items that will be changed, labeled with "add", "delete", or "change". One item per line. (Mostly, this will be a list of files.)

DESIGN

Interface Change description

Describe changes to usage. E.g. GUI/command-line changes

Code Change Description

Describe code changes planned for the feature. (Pseudocode encouraged)

Tests

jcmatese commented 2 years ago

Re: a direct relationship between retention time and a compound. In the old PUMP database, "compound_detection" and "chromatography" concepts linked a trio: a compound, its MS detection window, and its retention time under a given chromatography protocol (only the chromatography linking table is shown here). But sure, you could just store a "current lab default RT" in the compound table, if that were useful.
[Screenshot, 2022-10-13: PUMP chromatography linking table]
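A rough sketch contrasting the two options: a per-protocol detection record, loosely modeled on the PUMP "compound_detection"/"chromatography" idea as described here, versus a single default RT on the compound. The PUMP schema itself is not shown in this thread, so every class and field name below (`Chromatography`, `CompoundDetection`, `rt_for`, `mz_low`, etc.) is a hypothetical guess:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chromatography:
    """An LC protocol, e.g. a HILIC method (hypothetical fields)."""
    name: str
    run_length_min: float


@dataclass(frozen=True)
class CompoundDetection:
    """Links a compound, an MS detection window, and its RT under one LC protocol."""
    compound: str
    mz_low: float
    mz_high: float
    chromatography: Chromatography
    rt_min: float


@dataclass
class Compound:
    """Simpler alternative: one "current lab default RT" stored on the compound."""
    name: str
    formula: str
    default_rt_min: Optional[float] = None


def rt_for(detections, compound, chromatography_name):
    """Return the RT recorded for `compound` under the named protocol, or None."""
    for d in detections:
        if d.compound == compound and d.chromatography.name == chromatography_name:
            return d.rt_min
    return None
```

The linking record can answer "what is the RT on this column?" for several protocols at once; the single default field is simpler but implicitly assumes one lab-wide method.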

jcmatese commented 2 years ago

Regarding historical protocol definition and storage: we used to need an entire web form and spreadsheet just to define LC-MS compound detection.
[Screenshot, 2022-10-13]

hepcat72 commented 1 year ago

@lparsons - I was just reviewing issues and noted that this seems a bit relevant to the last mzXML-related schema changes (#664) and our recent lunch discussion with Michael. It refers to that "magic knowledge" that the researchers know when looking for unanalyzed data. I just thought I'd mention this since those schema changes are still outstanding...

hepcat72 commented 3 months ago

I was just reviewing a bunch of issues and noted that this could be implemented as a view (with a Bootstrap Table export). I think the reference to the "Compounds table" in this issue may have been misinterpreted as a reference to the database table (judging from changes I observed in old testing data, which loaded these values), when it actually refers to a view/template. The compounds view could use property fields (cached, or backed by a maintained field) that compute the median m/z and RT.
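A framework-agnostic sketch of the "cached property" variant, using `functools.cached_property` on a plain class as a stand-in for a Django model property. The `CompoundRow` class and its `peak_group_rts` attribute are hypothetical, not TraceBase code:

```python
from functools import cached_property
from statistics import median
from typing import List, Optional


class CompoundRow:
    """Stand-in for a compound shown in the compounds view (hypothetical)."""

    def __init__(self, name: str, peak_group_rts: List[float]):
        self.name = name
        # RTs (minutes) of every PeakGroup observed for this compound.
        self.peak_group_rts = peak_group_rts

    @cached_property
    def median_rt(self) -> Optional[float]:
        """Median observed RT, computed on first access and then cached."""
        if not self.peak_group_rts:
            return None
        return median(self.peak_group_rts)


row = CompoundRow("glucose", [11.2, 11.4, 11.3])
print(row.median_rt)
```

The cached value is not recomputed when new PeakGroups are loaded into the same object; that staleness is the trade-off against a maintained field that is updated at load time.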

lparsons commented 3 months ago

The initial intent of this issue was to track the retention time as determined by researchers from running known standards on a specific type of column. This could be used to generate Knowns lists for future analysis, or to track the stability of the retention time across various column conditions. Something like this was implemented in https://metaboldb.princeton.edu/methods (see Compounds). Code: https://github.com/lparsons/metabolite_database

An RT would need to be tracked for a specific standard run for a compound using a given column composition (method).
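A sketch of that tracking, assuming each standard run records the compound, the column composition/method, a run date, and the observed RT. `StandardRun` and `current_rt` are hypothetical names, and "use the most recent run" is only one possible policy:

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class StandardRun:
    """One injection of a known standard (hypothetical record)."""
    compound: str
    method: str  # column composition / LC method
    run_date: date
    rt_min: float


def current_rt(runs: Iterable[StandardRun], compound: str, method: str) -> Optional[float]:
    """RT from the most recent standard run for this compound and method, or None."""
    matching = [r for r in runs if r.compound == compound and r.method == method]
    if not matching:
        return None
    return max(matching, key=lambda r: r.run_date).rt_min
```

Keeping every run, rather than overwriting a single value, also preserves the history needed to check RT stability over time.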

It may also be useful to track the measured retention times for a given peak and use that to calculate statistics across experiments. That would be a separate endeavor, however.
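For that separate endeavor, a minimal sketch of one such statistic: the median measured RT of a single compound per experiment, plus the spread of those medians across experiments as a rough drift indicator. The `rt_drift` helper and its input format are hypothetical:

```python
from collections import defaultdict
from statistics import median


def rt_drift(measurements):
    """Per-experiment median RT for one compound, plus the spread across experiments.

    `measurements` is an iterable of (experiment_id, rt_minutes) pairs for a
    single compound (hypothetical format). Returns (medians, spread), where
    `medians` maps experiment_id -> median RT and `spread` is the difference
    between the largest and smallest per-experiment median.
    """
    by_experiment = defaultdict(list)
    for experiment_id, rt in measurements:
        by_experiment[experiment_id].append(rt)
    medians = {exp: median(rts) for exp, rts in by_experiment.items()}
    spread = max(medians.values()) - min(medians.values()) if medians else 0.0
    return medians, spread
```

A large spread for a compound whose experiments all used the same method would suggest column drift or misassigned peaks worth reviewing.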