HUPO-PSI / mzTab

mzTab Reporting MS-based Proteomics and Metabolomics Results
https://hupo-psi.github.io/mzTab

encoding DIA results in mzTab 1.0 #182

Open ypriverol opened 4 years ago

ypriverol commented 4 years ago

@andrewrobertjones :

We have some users that want to export DIA results into mzTab 1.0. The proposal is to use optional columns to highlight the information from the spectral library, including:

This will be encoded as optional columns in the format (one column for each of the values), and the values will be arrays. I have opened an issue in the spectral library format repository to find out how they are planning to encode ion annotations. This representation can also be used for spectral library search.

https://github.com/HUPO-PSI/SpectralLibraryFormat/issues/20

timosachsenberg commented 4 years ago

do you have some details and examples? e.g. for the PRT and PEP section?

ypriverol commented 4 years ago

These changes are mostly for the PSM section. The other two sections remain the same.

ypriverol commented 4 years ago

@bittremieux can you give your input here? I remember that a long time ago we had a discussion about how to encode spectral library results in mzTab.

ypriverol commented 4 years ago

Another option is to add a reference to a spectral library result file that contains this information. Then you would have only one CV param, reference_to_spectrum_library, and it would contain the index (or id) of the spectrum in the library. We can start by accepting MSP for now, but in the future we can require mzSpecLib. What do you think @edeutsch @andrewrobertjones ?

edeutsch commented 4 years ago

I think this is the better option myself. But note that MSP doesn't really have a spectrum id. It only has a spectrum name, which might be quite long and is not guaranteed to be unique.

ypriverol commented 4 years ago

We will use the file name for now as an index.

bittremieux commented 4 years ago

When was that? I don't recall the actual discussion, but in general using mzTab for spectral library results shouldn't be too hard. I'm already doing it with the ANN-SoLo output. The main thing is how to refer to the spectra in the library, so in the accession column I'm storing numeric indexes of the library spectra.

ypriverol commented 4 years ago

@bittremieux by numeric index, do you mean the index of the spectrum in the file?

timosachsenberg commented 4 years ago

Would it be more convenient to store the protein accession in the accession column and have another column for the reference to the spec. lib?

andrewrobertjones commented 4 years ago

I would keep everything about the PSM line the same as for a sequence database search. The accession column is needed for linking to the protein table, so you can't re-use that. Just add an opt_global CV param with a reference to the external spectral library ID, using different CV terms if there are different external format ID types to cover.
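To make this suggestion concrete, here is a minimal Python sketch of appending such an optional column to PSM rows. It follows the mzTab 1.0 naming pattern for CV-param optional columns (`opt_global_cv_{accession}_{parameter_name}`, with spaces replaced by underscores). The accession `MS:0000000` and the term name `spectral library spectrum index` are hypothetical placeholders, since no actual CV term is agreed on in this thread, and the header shown is a reduced subset of the mandatory PSM columns.

```python
# Hypothetical CV term: placeholder accession and name, NOT a real PSI-MS entry.
LIB_REF_ACCESSION = "MS:0000000"
LIB_REF_NAME = "spectral library spectrum index"


def opt_global_cv_column(accession, name):
    """Build an mzTab 1.0 optional column name for a CV parameter."""
    return f"opt_global_cv_{accession}_{name.replace(' ', '_')}"


def add_library_reference(header, rows, library_indexes):
    """Append the library reference column, leaving all other columns as-is."""
    column = opt_global_cv_column(LIB_REF_ACCESSION, LIB_REF_NAME)
    new_header = header + [column]
    new_rows = [row + [str(idx)] for row, idx in zip(rows, library_indexes)]
    return new_header, new_rows


# Reduced PSM section for illustration only; a valid file needs every
# mandatory PSM column. The accession column still holds the protein
# accession (or null), not the library index.
header = ["PSH", "sequence", "PSM_ID", "accession"]
rows = [
    ["PSM", "PEPTIDEK", "1", "P12345"],
    ["PSM", "ELVISK", "2", "null"],
]

new_header, new_rows = add_library_reference(header, rows, [1532, 87])
for line in [new_header] + new_rows:
    print("\t".join(line))
```

If different library formats need different ID types, each would get its own CV term and therefore its own `opt_global_cv_...` column.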

bittremieux commented 4 years ago

@ypriverol Yes, the index of the spectra in the spectral library.

@timosachsenberg Often there's no protein accession information because spectral libraries are inherently spectrum-based, in contrast to FASTA files which start from the whole protein sequences.

It makes sense not to misuse this column though, as Andy says. I'll have to change it in my ANN-SoLo mzTab export.