METADATA FORMAT - Githubissues

bigbio / quantms.io

The proteomics quantification format, extending mzTab for large scale datasets.

METADATA FORMAT #2

Closed: ypriverol closed this issue 1 year ago

ypriverol commented 1 year ago

This is the first section of the quantms files, covering the project.json and the .sdrf.tsv. The main idea of the format and its use cases are discussed in issue #1.

daichengxin commented 1 year ago

Are the tolerance parameters in the comment? We may need to filter projects by resolution in some use cases (AI training). What do you think?

timosachsenberg commented 1 year ago

Is there any order associated with these sets, or are they just sorted lexicographically?

ypriverol commented 1 year ago

> Are the tolerance parameters in the comment? We may need to filter projects by resolution in some use cases (AI training). What do you think?

If we add more comment attributes, we should add them as key-value pairs, including the fragmentation types.

Something like:

    "acquisition_properties": [
        {"precursor tolerance": "0.05 Da"},
        {"dissociation method": "HCD"}
    ]

I would still like to keep the instrument and enzyme as two other properties, as a summary. What do you think, @daichengxin?
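A minimal sketch of how a consumer might read the key-value layout and use it for the resolution filtering mentioned above. It assumes the field is spelled `acquisition_properties`, that each entry is a single-key object, and that tolerances are written as a number followed by `Da` or `ppm`; `flatten_properties` and `parse_tolerance` are hypothetical helpers, not part of quantms.io.

```python
import re

# Hypothetical project.json fragment using the proposed key-value layout.
project = {
    "acquisition_properties": [
        {"precursor tolerance": "0.05 Da"},
        {"dissociation method": "HCD"},
    ]
}


def flatten_properties(props):
    """Merge a list of single-key dicts into one dict (later keys win)."""
    merged = {}
    for entry in props:
        merged.update(entry)
    return merged


def parse_tolerance(text):
    """Split a tolerance string such as '0.05 Da' or '10 ppm' into (value, unit)."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(Da|ppm)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised tolerance: {text!r}")
    return float(match.group(1)), match.group(2)


props = flatten_properties(project["acquisition_properties"])
value, unit = parse_tolerance(props["precursor tolerance"])
print(value, unit, props["dissociation method"])  # 0.05 Da HCD
```

Keeping the value and unit separate after parsing makes it straightforward to select, for example, only projects with a precursor tolerance in ppm below some threshold for AI training sets.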

ypriverol commented 1 year ago

> Is there any order associated with these sets, or are they just sorted lexicographically?

This is a good question. I think JSON does not enforce any order?
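For reference, the JSON specification (RFC 8259) treats arrays as ordered and object members as unordered, so a set-like list keeps whatever order the writer chose. If a deterministic order is wanted, one option is to sort set-like fields before writing. This is a sketch assuming hypothetical field names `instruments` and `enzymes`; Python's `sorted` compares by code point, so the order is case-sensitive.

```python
import json

# Hypothetical project description with set-like list fields.
project = {
    "instruments": ["Q Exactive HF", "Orbitrap Fusion Lumos"],
    "enzymes": ["Trypsin", "Lys-C"],
}


def canonical_json(doc, set_fields):
    """Serialize doc with the given list fields sorted and object keys sorted."""
    normalized = dict(doc)  # shallow copy so the input is not modified
    for field in set_fields:
        normalized[field] = sorted(normalized[field])
    return json.dumps(normalized, sort_keys=True, indent=2)


text = canonical_json(project, ["instruments", "enzymes"])
# Arrays round-trip in the order they were written.
assert json.loads(text)["enzymes"] == ["Lys-C", "Trypsin"]
print(text)
```

Writing files this way means two exports of the same project produce identical text, which keeps diffs and checksums stable.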

timosachsenberg commented 1 year ago

My concern is that without the full mapping of metadata to quants (file + channel), it becomes less useful for AI. Or maybe I did not properly understand the scope of this JSON…

ypriverol commented 1 year ago

> My concern is that without the full mapping of metadata to quants (file + channel), it becomes less useful for AI. Or maybe I did not properly understand the scope of this JSON…

This JSON is the project description and serves the following use cases:

- Quickly loading the reanalysis description into web pages for searching and visualization.
- Filtering projects by tissue, instrument, fragmentation mode, etc.
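The second use case could look roughly like this on the consumer side. The project summaries and field names (`tissues`, `instruments`, `dissociation_methods`) are invented for illustration and are not the agreed project.json schema; `filter_projects` is a hypothetical helper.

```python
# Hypothetical project summaries; field names are illustrative only.
projects = [
    {"accession": "PROJECT-A", "tissues": ["liver"],
     "instruments": ["Q Exactive HF"], "dissociation_methods": ["HCD"]},
    {"accession": "PROJECT-B", "tissues": ["brain"],
     "instruments": ["Orbitrap Fusion Lumos"], "dissociation_methods": ["HCD", "ETD"]},
    {"accession": "PROJECT-C", "tissues": ["liver", "kidney"],
     "instruments": ["LTQ Orbitrap Velos"], "dissociation_methods": ["CID"]},
]


def filter_projects(projects, **criteria):
    """Keep projects whose list fields contain every requested value."""
    return [
        project
        for project in projects
        if all(value in project.get(field, []) for field, value in criteria.items())
    ]


matches = filter_projects(projects, tissues="liver", dissociation_methods="HCD")
print([p["accession"] for p in matches])  # ['PROJECT-A']
```

Storing these properties as lists lets a single project match several filters, for instance when it was acquired with both HCD and ETD.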

We will need either a separate file for the mapping between file and channel, or that mapping should be included in every file.
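One way to build the file + channel mapping is to derive it from the .sdrf.tsv. This sketch assumes the SDRF-Proteomics column names `source name`, `comment[data file]` and `comment[label]`. The rows are invented, and `file_channel_map` is a hypothetical helper.

```python
import csv
import io

# Hypothetical SDRF excerpt; the rows are invented for illustration.
SDRF = (
    "source name\tcomment[data file]\tcomment[label]\n"
    "sample_1\trun1.raw\tTMT126\n"
    "sample_2\trun1.raw\tTMT127\n"
    "sample_3\trun2.raw\tTMT126\n"
)


def file_channel_map(sdrf_text):
    """Build a {(data file, label): source name} lookup from SDRF text."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    mapping = {}
    for row in reader:
        key = (row["comment[data file]"], row["comment[label]"])
        if key in mapping:
            raise ValueError(f"duplicate file/channel pair: {key}")
        mapping[key] = row["source name"]
    return mapping


mapping = file_channel_map(SDRF)
print(mapping[("run1.raw", "TMT127")])  # sample_2
```

Rejecting duplicate (file, channel) pairs catches SDRF rows that would otherwise silently overwrite each other when quant values are joined back to samples.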

daichengxin commented 1 year ago

> I would still like to keep the instrument and enzyme as two other properties, as a summary. What do you think, @daichengxin?

I like the key-value pairs. The maximum missed cleavages setting (PRIDE:0000074) would be a key property for the enzyme.

ypriverol commented 1 year ago

@daichengxin @timosachsenberg @jpfeuffer I updated the PR with how I think we should update the MSstats output. ☝️ See DE.md and AE.md.

The major change is that I suggest adding a valid header section that explains to our users what each column means. Header lines start with #, which keeps this version compatible with MSstats. Please feel free to add more information to the header. This will help us load the MSstats output into a web page and visualize the DE values.
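A web page or script could then split such a file into column descriptions and table rows. The `#name: description` header syntax and the sample rows below are assumptions for illustration; the column names follow typical MSstats group-comparison output, and `read_commented_table` is a hypothetical helper.

```python
import csv

# Hypothetical DE file: '#' lines document the columns, then a tab-separated table.
DE_TEXT = (
    "#Protein: protein accession or protein group\n"
    "#Label: the compared conditions\n"
    "#log2FC: log2 fold change between the compared conditions\n"
    "#adj.pvalue: adjusted p-value\n"
    "Protein\tLabel\tlog2FC\tadj.pvalue\n"
    "PROT1\tB-A\t1.5\t0.01\n"
    "PROT2\tB-A\t-0.8\t0.20\n"
)


def read_commented_table(text):
    """Return (column descriptions, rows) from a '#'-commented TSV."""
    descriptions = {}
    body = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Split on the first ':' only, so descriptions may contain colons.
            name, _, desc = line[1:].partition(":")
            descriptions[name.strip()] = desc.strip()
        else:
            body.append(line)
    rows = list(csv.DictReader(body, delimiter="\t"))
    return descriptions, rows


descriptions, rows = read_commented_table(DE_TEXT)
print(descriptions["log2FC"], rows[0]["log2FC"])
```

Tools that only need the table can skip the header instead, for example with pandas `read_csv(..., sep="\t", comment="#")`.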

daichengxin commented 1 year ago

Does SampleID or Sample_ID look better? The current output of ibaqpy is the former. @ypriverol

ypriverol commented 1 year ago

> Does SampleID or Sample_ID look better? The current output of ibaqpy is the former. @ypriverol

This has been solved.