HUPO-PSI / mzTab

mzTab Reporting MS-based Proteomics and Metabolomics Results
https://hupo-psi.github.io/mzTab

MS Run Identifiers (MRIs) #220

Open YasinEl opened 2 weeks ago

YasinEl commented 2 weeks ago

Hello, and thank you for maintaining this tool!

We would like to propose the ability to reference paths to spectral raw files in the public domain using MS Run Identifiers (MRIs), as outlined in section 3.4.1 of the USI specification.

To support this, we suggest adding an "ms_run[x]-public_location" parameter to the MTD section. Because MRIs are not available until files have been uploaded to a public repository, many users will not be able to provide them upfront. We therefore propose populating the MRIs after upload by matching the file names in "ms_run[x]-location" against the repository's file listing. The repository would do this, not the user, when mzTab-M files are uploaded together with their mzML files.
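The filename matching described above could look roughly like the following sketch. It is only an illustration under assumptions: the function names, the input shapes (a dict of ms_run index to location, and a list of repository-relative paths), and the rule of skipping ambiguous matches are all hypothetical and not part of mzTab-M or any repository API. "ms_run[x]-public_location" is the parameter name proposed in this issue, not an existing one.

```python
def file_name(location: str) -> str:
    """Last path segment of a location (file:// URI, POSIX path or Windows path)."""
    return location.replace("\\", "/").rsplit("/", 1)[-1]


def assign_public_locations(ms_run_locations, collection, repository_paths):
    """Map ms_run index -> MRI by matching file names (hypothetical helper).

    ms_run_locations: {1: "file:///C:/data/S_N1.mzML", ...} taken from ms_run[x]-location
    collection:       repository accession, e.g. "MSV000086206"
    repository_paths: paths of the uploaded files, relative to the collection
    """
    # Index repository files by name so each lookup is a dict access.
    by_name = {}
    for repo_path in repository_paths:
        by_name.setdefault(file_name(repo_path), []).append(repo_path)

    public_locations = {}
    for index, location in ms_run_locations.items():
        matches = by_name.get(file_name(location), [])
        # Assumption: only an unambiguous match is filled in; others are left empty.
        if len(matches) == 1:
            public_locations[index] = f"mzspec:{collection}:{matches[0]}"
    return public_locations


def to_mtd_lines(public_locations):
    """Render tab-separated MTD lines using the parameter name proposed in this issue."""
    return [
        f"MTD\tms_run[{index}]-public_location\t{mri}"
        for index, mri in sorted(public_locations.items())
    ]
```

Matching on the file name alone breaks down when two uploaded files share a name in different folders, which is why the sketch leaves such runs unassigned instead of guessing.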

nilshoffmann commented 6 days ago
3.4.1 The MS Run Identifier
Implied within the USI is an MS run identifier. Every deposited MS run can be referenced with a shortened form:
mzspec:<collection>:<msRun>
such as this example:
mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09
Since an MS run may be represented in several formats (with potentially slightly different data associated with the spectrum), a format suffix MAY be specified:
mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09.RAW
to signify that the .RAW file is meant. Although each resource must recognize this extension for what it is, how a resource must handle this extension is not prescribed, as discussed above in subsection 3.3.4.

I would suggest keeping the suffix aligned with the current naming and using ms_run[1]-usi_identifier to indicate what type of identifier this is. Alternatively, to allow for other identifier types, we could use ms_run[1]-identifier and specify that the value is application specific. A value with the prefix mzspec: would then be an MS Run USI Identifier. This resembles the way CURIEs encode namespaces, so it would also align with other schemes in the future.
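As a rough illustration of the prefix idea, a reader of a generic ms_run[1]-identifier value could split off the namespace prefix and decide how to handle the rest. This is a sketch only: the function names are hypothetical, and treating everything before the first colon as the prefix is an assumption modelled on CURIE syntax, not something the mzTab-M specification defines.

```python
def split_identifier(value: str) -> tuple[str, str]:
    """Split 'prefix:reference' at the first colon, CURIE style (hypothetical helper)."""
    prefix, separator, reference = value.partition(":")
    if not separator or not prefix or not reference:
        raise ValueError(f"not a prefixed identifier: {value!r}")
    return prefix, reference


def is_ms_run_usi(value: str) -> bool:
    """True if the value carries the mzspec prefix used by USIs and MRIs."""
    try:
        prefix, _ = split_identifier(value)
    except ValueError:
        return False
    return prefix == "mzspec"
```

For example, `split_identifier("mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09")` yields the prefix `mzspec` and leaves `PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09` for a USI-aware consumer to interpret.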

YasinEl commented 5 days ago

I agree with keeping the suffix, and I also like the option of declaring which identifier type is used. In case it's relevant, we typically give the entire file path in MRIs, for example mzspec:MSV000086206:peak/mzml/S_N1.mzML
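An MRI that carries a full file path still fits the mzspec:<collection>:<msRun> shape quoted earlier in this thread, because the path contains slashes but the first two colons still delimit the prefix and the collection. A minimal parsing sketch, assuming the value is a bare MRI with no further USI components after the run; the class and function names are hypothetical:

```python
from typing import NamedTuple


class MsRunIdentifier(NamedTuple):
    collection: str  # e.g. a ProteomeXchange or MassIVE accession
    ms_run: str      # run name, possibly a repository-relative path


def parse_mri(mri: str) -> MsRunIdentifier:
    """Parse 'mzspec:<collection>:<msRun>' (hypothetical helper, bare MRIs only)."""
    # Split on the first two colons only, so the run part is kept intact.
    parts = mri.split(":", 2)
    if len(parts) != 3 or parts[0] != "mzspec" or not parts[1] or not parts[2]:
        raise ValueError(f"not an MS run identifier: {mri!r}")
    return MsRunIdentifier(collection=parts[1], ms_run=parts[2])


parsed = parse_mri("mzspec:MSV000086206:peak/mzml/S_N1.mzML")
# File name within the collection, useful when matching against ms_run[x]-location.
run_file = parsed.ms_run.rsplit("/", 1)[-1]
```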