Add median retention time to Compounds table

mneinast commented 2 years ago

FEATURE REQUEST

Inspiration

TraceBase provides a unique opportunity to generate a Knowns list that a researcher could export and use when curating Peak Groups in Maven or El-Maven.

Description

Add Retention Time to the Compounds table. A researcher could export this table as csv and use it as a Knowns list in Maven. The value for Retention Time could be the median Retention Time for all observations of that PeakGroup in TraceBase.

Alternatives

Dependencies

This issue cannot be started until the completion of the following issue(s):

<issue number 1>
<issue number 2>

Comment

The format for a Knowns list in Maven is four columns:

1) Name - name of compound
2) Formula
3) m/z (not required)
4) RT - retention time in minutes (not required)

Additional columns are ignored.
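A minimal sketch of the export described above: group observed RTs by compound, take the median, and write the four columns in the order given in the comment. The observation tuples and the compound values are hypothetical stand-ins for PeakGroup data, and the header spellings are taken from this issue rather than from Maven's documentation, so check them against what Maven actually accepts.

```python
import csv
import io
from statistics import median

# Hypothetical observations: (compound name, formula, observed RT in minutes).
# In TraceBase these would come from PeakGroup records; this layout is an assumption.
observations = [
    ("alanine", "C3H7NO2", 11.2),
    ("alanine", "C3H7NO2", 11.4),
    ("alanine", "C3H7NO2", 11.3),
    ("glucose", "C6H12O6", 9.8),
    ("glucose", "C6H12O6", 10.0),
]


def knowns_rows(obs):
    """Group observations by compound and return one row per compound with its median RT."""
    by_compound = {}
    for name, formula, rt in obs:
        by_compound.setdefault((name, formula), []).append(rt)
    rows = []
    for (name, formula), rts in sorted(by_compound.items()):
        # m/z is optional in a Knowns list, so it is left blank here.
        rows.append({"Name": name, "Formula": formula, "m/z": "", "RT": round(median(rts), 2)})
    return rows


def write_knowns_csv(rows, stream):
    """Write rows in the column order from the issue: Name, Formula, m/z, RT."""
    writer = csv.DictWriter(stream, fieldnames=["Name", "Formula", "m/z", "RT"])
    writer.writeheader()
    writer.writerows(rows)


buf = io.StringIO()
write_knowns_csv(knowns_rows(observations), buf)
print(buf.getvalue())
```

The median is used rather than the mean so that a few mis-integrated peaks do not pull the exported RT far from where the compound usually elutes.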

ISSUE OWNER SECTION

Assumptions

List of assumptions that the code will not explicitly address/check
E.g. We will assume input is correct (explaining why there is no validation)

Requirements

[ ] 1. List of numbered conditions to be met for the feature
[ ] 2. E.g. Every column/row must display a value, i.e. cannot be empty
[ ] 3. Numbers for reference & checkboxes for progress tracking

Limitations

The retention time of a compound in an LC-MS run may change dramatically if a different liquid chromatography method is used. The vast majority of data in TraceBase is currently based on a common method, "hilic25 minutes", but the expected RT of a compound under other methods could be quite different.
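Given this limitation, one way to keep a median RT meaningful is to compute it per (compound, LC method) and never pool across methods. This is a sketch under that assumption; the record layout and the second method name "reversed_phase_example" are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (compound, LC method name, observed RT in minutes).
records = [
    ("alanine", "hilic25 minutes", 11.2),
    ("alanine", "hilic25 minutes", 11.4),
    ("alanine", "reversed_phase_example", 1.1),  # hypothetical second method
]


def median_rt_by_method(recs):
    """Return {(compound, method): median RT}, keeping each LC method separate."""
    grouped = defaultdict(list)
    for compound, method, rt in recs:
        grouped[(compound, method)].append(rt)
    return {key: median(rts) for key, rts in grouped.items()}


medians = median_rt_by_method(records)

# For contrast: pooling all methods mixes RTs that are not comparable.
pooled = median(rt for _, _, rt in records)
```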

Affected Components

A tentative list of anticipated repository items that will be changed, labeled with "add", "delete", or "change". One item per line. (Mostly, this will be a list of files.)

change: File path or DB table ...
add: Environment variable or server setting
delete: External executable or cron job

DESIGN

Interface Change description

Describe changes to usage. E.g. GUI/command-line changes

Code Change Description

Describe code changes planned for the feature. (Pseudocode encouraged)

Tests

[ ] 1. A description of at least one test for each requirement above.
[ ] 2. E.g. Test for req 2 that there's an exception when display value is ''
[ ] 3. Numbers for reference & checkboxes for progress tracking

jcmatese commented 2 years ago

Re: a direct relationship between retention time and a compound: in the old PUMP database iteration, there was the concept of a "compound_detection" and "chromatography" regime that linked together a trio: the compound, an MS detection window, and a chromatography protocol's retention time for that compound (only the chromatography linking table is shown here). But sure, you could just store a "current lab default RT" in the compound table, if it were useful.

[Screenshot, 2022-10-13: PUMP chromatography linking table]

jcmatese commented 2 years ago

Regarding historical protocol definition and storage, it used to take an entire web form and spreadsheet just to define LC-MS compound detection.

[Screenshot, 2022-10-13: historical LC-MS compound detection definition]

hepcat72 commented 1 year ago

@lparsons - I was just reviewing issues and noted that this seems relevant to the latest mzXML-related schema changes (#664) and our recent lunch discussion with Michael. It relates to the "magic knowledge" that researchers rely on when looking for unanalyzed data. I just thought I'd mention it, since those schema changes are still outstanding...

hepcat72 commented 3 months ago

I was just reviewing a bunch of issues and noted that this could be implemented as a view (with a bootstrap table export). I think the reference to "Compounds table" in the issue may have been misinterpreted as a reference to the database table (given changes I observed in old testing data that loaded these values), when it is actually a reference to a view/template. The compounds view could use property fields (cached, or backed by a maintained field) that compute the median m/z and RT.
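A minimal sketch of the "cached property" option, using plain Python so it runs outside Django. `CompoundRow` is a hypothetical view-layer wrapper, not an existing TraceBase class, and `observed_rts` stands in for whatever query would collect a compound's PeakGroup RTs. In a Django view the same pattern could use `functools.cached_property` or Django's `django.utils.functional.cached_property`.

```python
from functools import cached_property
from statistics import median


class CompoundRow:
    """Hypothetical view-layer wrapper around a compound (not an actual TraceBase class)."""

    def __init__(self, name, observed_rts):
        self.name = name
        # Stand-in for a query over this compound's PeakGroups (an assumption).
        self._observed_rts = list(observed_rts)
        self.compute_count = 0  # counts how often the median is actually computed

    @cached_property
    def median_rt(self):
        """Median observed RT in minutes, computed once per instance; None if never observed."""
        self.compute_count += 1
        if not self._observed_rts:
            return None
        return median(self._observed_rts)


row = CompoundRow("alanine", [11.2, 11.3, 11.4])
first = row.median_rt
second = row.median_rt  # served from the cache, not recomputed
```

A cached property is recomputed for every new instance (i.e. every page load), whereas a maintained database field trades that cost for the need to keep it updated when new PeakGroups are loaded.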

lparsons commented 3 months ago

The initial intent of this issue was to track the retention time as determined by the researchers from running known standards on a specific type of column. This could be used to generate a Knowns list for future analyses, or to track the stability of the retention time across various column conditions, etc. Something like this was implemented in https://metaboldb.princeton.edu/methods (see compounds). Code: https://github.com/lparsons/metabolite_database

An RT would need to be tracked for a specific standard run for a compound using a given column composition (method).
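One way to model this requirement is a record keyed by compound, method, and standard run. This sketch assumes hypothetical names (`StandardRunRT`, `reference_rt`) and assumes the most recent standard run is the one to use as the reference; a real design might instead average several runs or let researchers pick one.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StandardRunRT:
    """Hypothetical record: RT measured for one compound in one standard run on one LC method."""
    compound: str
    method: str  # column composition / LC method, e.g. "hilic25 minutes"
    run_date: date
    rt_minutes: float


def reference_rt(records, compound, method):
    """Return the RT from the most recent standard run for (compound, method), or None."""
    matches = [r for r in records if r.compound == compound and r.method == method]
    if not matches:
        return None
    return max(matches, key=lambda r: r.run_date).rt_minutes
```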

It may also be useful to track the measured retention times for a given peak and use that to calculate statistics across experiments. That would be a separate endeavor, however.
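For that separate endeavor, a sketch of the kind of statistics that could be computed from measured RTs across experiments, plus a comparison against a standard-run reference RT. The function names are hypothetical, and the choice of median, sample standard deviation, and range is an assumption about which statistics would be useful.

```python
from statistics import median, stdev


def rt_summary(rts):
    """Summarize measured RTs (minutes) for one compound across experiments."""
    if len(rts) < 2:
        raise ValueError("need at least two measurements for a spread estimate")
    return {
        "n": len(rts),
        "median": median(rts),
        "stdev": stdev(rts),  # sample standard deviation
        "range": max(rts) - min(rts),
    }


def deviation_from_reference(rts, reference_rt):
    """Median measured RT minus a standard-run reference RT; positive means later elution."""
    return median(rts) - reference_rt
```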

Princeton-LSI-ResearchComputing / tracebase