Princeton-LSI-ResearchComputing / tracebase

Mouse Metabolite Tracing Data Repository for the Rabinowitz Lab
MIT License

Generate consolidated table #22

Closed: lparsons closed this issue 3 years ago

lparsons commented 3 years ago

Generate a "consolidated table" for the peaks in a given Experiment. Many of these values should be part of the models (see issues #41, #42, #43). An example table provided by the lab is here: tissueDataProcessed033120_withSoleus. The format of the table should be as follows:

| Column | Source | Description |
| --- | --- | --- |
| Compound | Compound.name | compound name |
| C_Label | Measurement.labeled_atom, Measurement.labeled_count | specific labeled form of this compound. 0 = unlabeled, 1 = one extra 13C |
| Sample | Sample.name | unique sample identification |
| fraction | Measurement.normalized_abundance | data value from AccuCor: the fraction of this compound that has this label in this sample |
| tissue | Tissue.name | descriptive type of sample source |
| mouse | Animal.name | the mouse ID (for this experiment). This might need to become a unique identifier, but here it is not unique. |
| ion count | Measurement.corrected_abundance | the ion count (for this labeled form of this compound) |
| seruminfusate | Measurement.serum_infusate_abundance | serum_infusate_abundance - the 'normalized_abundance' for the infusate compound (tracer) in the serum sample from this mouse. If 13C-lactate was infused, this is the value of 'normalized_abundance' for serum lactate. |
| normfraction | Measurement.normalized_fraction | normalized_fraction - normalized_abundance / serum_infusate_abundance; the lab usually calls this 'normalized labeling' |
| infusate | Animal.infusate | the specific metabolite that was infused in this mouse and how it is labeled, also called the 'tracer'. For example, "u13c-lactate" is a lactate with universal labeling of every carbon with 13C. |
| study | Study.name | identifier for the "study", which is typically a collection of mice with similar infusion parameters |
| state | Animal.state | feeding status: have the mice been fasted or fed? |
| infusionrate | Animal.infusion_rate | volume of infusate solution infused (ul) per minute, per gram of mouse body weight |
| infusateconc | Animal.infusate_concentration | concentration of infusate in this infusion solution |

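
To make the derived `normfraction` column concrete, here is a minimal Python sketch of how one row of the table could be assembled. It is an illustration only: the flat `Measurement` dataclass, the `consolidated_row` helper, and the choice to return `None` when no serum infusate value is available are all assumptions, not the project's actual models. Only the measurement-derived columns are shown; the animal- and study-level columns would be filled in the same way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    # Hypothetical flat record; field names loosely follow the "Source" column.
    compound: str
    labeled_count: int
    sample: str
    tissue: str
    animal: str
    normalized_abundance: float
    corrected_abundance: float
    serum_infusate_abundance: Optional[float]


def normalized_fraction(m: Measurement) -> Optional[float]:
    """normalized_abundance / serum_infusate_abundance ('normalized labeling')."""
    # Assumption: a missing or zero serum value yields None rather than an error.
    if not m.serum_infusate_abundance:
        return None
    return m.normalized_abundance / m.serum_infusate_abundance


def consolidated_row(m: Measurement) -> dict:
    """Build the measurement-derived columns of one consolidated table row."""
    return {
        "Compound": m.compound,
        "C_Label": m.labeled_count,
        "Sample": m.sample,
        "fraction": m.normalized_abundance,
        "tissue": m.tissue,
        "mouse": m.animal,
        "ion count": m.corrected_abundance,
        "seruminfusate": m.serum_infusate_abundance,
        "normfraction": normalized_fraction(m),
    }
```

For example, a measurement with `normalized_abundance=0.25` and `serum_infusate_abundance=0.5` would get a `normfraction` of `0.5`.
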
jcmatese commented 3 years ago

I think 'experiment' is 'study', in terms of current model mapping.

hepcat72 commented 3 years ago

I have some questions and concerns related to this item. Let me start with questions related to this implementation and then I will address my over-arching concern.

First, let me state my assumptions. Correct me if any of these are wrong:

  1. This will be a page/display on the tracebase website (as opposed to some sort of cached database table)
  2. There is not yet a determination on where on the site a link to this table page will be presented
  3. The input to this page is an experiment ID
  4. The table above is a description of a 14-column table to be created as opposed to an example of a 3-column table where you describe columns and locations of interest in the database. (Call me too literal, but it did take me a moment to settle on what was desired.)

If the above assumptions are correct:

  1. Broadly speaking, what is the intended purpose of this page? (I think each change/feature request should have a defined purpose.)
  2. Should the 'study' column instead be the table title (since there will only be 1 value displayed on every row)?
  3. Should any of the values be grouped via average, max, min, range, etc.? Or will all combinations have their own rows - like a left-join?

The following are my broad-picture concerns.

I'm somewhat concerned here about code redundancy and complexity. My overall design idea, based on the project docs thus far, is to use a fully featured search interface to combine and filter data, which could then be used to generate pages such as the one described here (which includes joins). The output would be a table with specifically selected fields to display. As I described on Slack, the search output could be used to generate tables or columns of data that feed into analyses, visualizations, or, as in this case, a page displaying data associated with a particular study.

That's not to say we couldn't have a separate script to generate a page like this. It's just that it could use the search functionality to collect the data for display and fill in any other metadata to customize the page. What's more, functionality built for a search results page, such as letting users exclude outliers or tweak the contents before sending the data into an analysis or visualization, could also be employed here. In the future, that could include selecting rows, rearranging rows/columns, or any other search results interface improvements we later decide to add. But if we develop table/data pages independently, such improvements would require custom edits for every page we develop.

My point is that I suggest we develop the search and search results pages first, and that we re-use those internal mechanisms when developing pages like the one described in this issue.

hepcat72 commented 3 years ago

I threw together a data flow design to conceptualize how I envision a search "method" (separate from web pages) would interact with pages in the web interface. It would execute searches, collect and organize the data gathered (via joins), and spit back the data as a JSON object. It might take some doing to get it to work quickly, but I have done similar stuff with Perl and MySQL by specifying searchable fields from various tables, limiting joins, indexing database fields, and implementing various caching methods. I think it is viable functionality that will make Tracebase very versatile and the code much simpler. The JSON that's passed can include the search specifics, sorting options, filtering options, result ranges, fields to display/include, the destination page that the data is going to, whether the user wants the option to confirm/filter the data, etc.

[Figure: search_lib_method (data flow design diagram)]
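
As a rough illustration of the search "method" idea described above, here is a minimal Python sketch that takes a JSON request (filters, fields, sorting, result range, destination page) and returns a JSON result. Everything in it is hypothetical: the `search` function, the request keys, and the in-memory `ROWS` list standing in for joined database rows are assumptions made for the example, not Tracebase code.

```python
import json

# Hypothetical in-memory stand-in for rows produced by database joins.
ROWS = [
    {"study": "study1", "tissue": "serum", "compound": "lactate", "fraction": 0.42},
    {"study": "study1", "tissue": "soleus", "compound": "lactate", "fraction": 0.17},
    {"study": "study2", "tissue": "serum", "compound": "glucose", "fraction": 0.30},
]


def search(request_json, rows=ROWS):
    """Run a search described by a JSON request and return a JSON response."""
    request = json.loads(request_json)

    # Keep only rows whose values match every requested filter exactly.
    filters = request.get("filters", {})
    matches = [r for r in rows if all(r.get(k) == v for k, v in filters.items())]

    # Optional sorting by a single field.
    sort_by = request.get("sort_by")
    if sort_by:
        matches.sort(key=lambda r: r[sort_by])

    # Optional result range (start inclusive, stop exclusive).
    start, stop = request.get("range", [0, len(matches)])
    page = matches[start:stop]

    # Optional projection onto the fields the destination page displays.
    fields = request.get("fields")
    if fields:
        page = [{f: r[f] for f in fields} for r in page]

    return json.dumps({
        "destination": request.get("destination"),
        "total": len(matches),
        "results": page,
    })
```

A page like the consolidated table could then call `search` with a request such as `{"filters": {"study": "study1"}, "fields": ["tissue", "fraction"], "sort_by": "fraction", "destination": "consolidated_table"}` and render whatever comes back, so improvements to the search results handling would carry over to every page that uses it.
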

hepcat72 commented 3 years ago

Merged