bigbio / quantms

Quantitative mass spectrometry workflow. Currently supports proteomics experiments with complex experimental designs for DDA-LFQ, DDA-Isobaric and DIA-LFQ quantification.
https://quantms.org
MIT License

mzTab for DIA-NN #119

Closed ypriverol closed 2 years ago

ypriverol commented 2 years ago

Description of feature

DIA-NN results should not only be exported to MSstats; we also need to be able to export them to mzTab.

WangHong007 commented 2 years ago

When converting the DIA-NN results to mzTab, some columns are missing in the following sections.

  1. MTD
  2. PRH

    • protein_coverage
  3. PEH

    • retention_time_window
    • spectra_ref
    • opt_global_feature_id
  4. PSH

    • calc_mass_to_charge
    • pre
    • post
    • start
    • end
    • spectra_ref
    • opt_global_spectrum_reference
    • opt_global_feature_id
    • opt_global_map_index

ypriverol commented 2 years ago

When converting the results of DIANN to mzTab, some columns are missing in three levels.

  1. MTD
  • protein_search_engine_score[1]
  • peptide_search_engine_score[1]
  • psm_search_engine_score[1]
  • software[1]

This is related to the following issue: https://github.com/vdemichev/DiaNN/issues/362. We first need to decide which DIA-NN scores are the best ones to report, add them to PSI-MS, and then use them in the mzTab export.

  1. PRH
  • protein_coverage

For now, this value can be null. @timosachsenberg, is null allowed here? Another option, @WangHong007, is to take the protein database as input and compute the protein coverage from the identified peptides and the protein sequence.
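The coverage computation suggested above could be sketched as follows. This is a minimal illustration, not the quantms implementation: the function name `protein_coverage` is hypothetical, peptides are assumed to be plain amino-acid strings with modifications already stripped, and matching is exact substring matching (no I/L ambiguity handling). Coverage is returned as a fraction of covered residues.

```python
from typing import Iterable


def protein_coverage(protein_seq: str, peptides: Iterable[str]) -> float:
    """Return the fraction of residues covered by at least one peptide.

    Hypothetical helper: assumes unmodified peptide strings and exact
    substring matching against the protein sequence.
    """
    if not protein_seq:
        return 0.0
    covered = [False] * len(protein_seq)
    for pep in set(peptides):
        if not pep:
            continue
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein_seq.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein_seq.find(pep, start + 1)
    return sum(covered) / len(protein_seq)


# Two non-overlapping peptides cover 7 of 10 residues.
print(protein_coverage("MKTAYIAKQR", ["MKT", "AKQR"]))  # 0.7
```

Tracking covered positions in a boolean list means overlapping peptides are not counted twice.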

  1. PEH
  • retention_time_window

@timosachsenberg, how do you pick this number in proteomicsLFQ?

  • spectra_ref

  • opt_global_feature_id

This is not needed @WangHong007.

  1. PSH
  • calc_mass_to_charge

  • pre

  • post

  • start

  • end

Again, all of them can be null.

  • spectra_ref

Spectra reference is the combination of the file index of the mzML in the mzTab and the scan reference for the spectrum in the mzML. For example, in one of the label-free experiments:

ms_run[8]:controllerType=0 controllerNumber=1 scan=17

The first part, ms_run[8], corresponds to the file in the metadata that contains the spectrum. You can have multiple mzMLs, so the index [8] means that file 8 contains the spectrum that was used to identify the PSM.

The second part, controllerType=0 controllerNumber=1 scan=17, corresponds to the id, in the mzML, of the spectrum used for the identification. I guess DIA-NN also keeps track of the scan corresponding to the peptide. Probably @vdemichev can help us find out which field that is in the output.
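As a rough sketch of the two-part structure described above, the snippet below builds a spectra_ref from an ms_run index and a spectrum id, and splits it back again. The helper names `make_spectra_ref` and `split_spectra_ref` are hypothetical and only illustrate the string layout shown in the example.

```python
import re
from typing import Tuple

# ms_run[<index>]:<spectrum id as written in the mzML>
SPECTRA_REF_RE = re.compile(r"^ms_run\[(\d+)\]:(.+)$")


def make_spectra_ref(ms_run_index: int, spectrum_id: str) -> str:
    """Combine the ms_run index and the mzML spectrum id."""
    return f"ms_run[{ms_run_index}]:{spectrum_id}"


def split_spectra_ref(spectra_ref: str) -> Tuple[int, str]:
    """Split a spectra_ref into (ms_run index, spectrum id)."""
    match = SPECTRA_REF_RE.match(spectra_ref)
    if match is None:
        raise ValueError(f"not a valid spectra_ref: {spectra_ref!r}")
    return int(match.group(1)), match.group(2)


ref = make_spectra_ref(8, "controllerType=0 controllerNumber=1 scan=17")
print(ref)  # ms_run[8]:controllerType=0 controllerNumber=1 scan=17
run_index, spectrum_id = split_spectra_ref(ref)
```

Here `spectrum_id` is only the second part of the reference, i.e. the spectrum id without the ms_run prefix.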

  • opt_global_spectrum_reference

This one is only the second part of the id, as described before.

  • opt_global_feature_id
  • opt_global_map_index

These two are not needed.

timosachsenberg commented 2 years ago

I think null is fine here

WangHong007 commented 2 years ago

Remaining issues

  1. MTD: wait for the following data to be added in OLS (see vdemichev/DiaNN#362):
    • protein_search_engine_score[1]
    • peptide_search_engine_score[1]
    • psm_search_engine_score[1]
    • software[1]

  2. PEH: this is related to the following issue, vdemichev/DiaNN#350:
    • retention_time_window
    • spectra_ref

  3. PSH:
    • spectra_ref
    • opt_global_spectrum_reference

vdemichev commented 2 years ago

DIA-NN stores the scan number for each precursor, and these numbers count MS2 scans only (note that DIA-NN counts MS1 and MS2 scans separately, and both counts start at 0). The respective output column is MS2.Scan.

ypriverol commented 2 years ago

Thanks for your quick response @vdemichev :

Do you mean that MS2.Scan is basically an index corresponding to the order of the scan in the RAW/mzML file?

vdemichev commented 2 years ago

Yes, but it indexes MS2 scans only. If there's an MS2 scan with index, say, 1000, then an MS1 scan, and then another MS2 scan, this other MS2 scan will have index 1001, not 1002.

WangHong007 commented 2 years ago

So DIA-NN counts the scanned precursors and fragments separately, and MS2.Scan does not refer to the scan number or index in the mzML file? E.g., MS2.Scan=73091 in the main report doesn't refer to <spectrum id="controllerType=0 controllerNumber=1 scan=73091" index="73090" defaultArrayLength="441"> in the mzML file.

vdemichev commented 2 years ago

Yes, it does not refer to the mzML scan number. Is that a significant problem here? If there's access to the mzML, one can just map one ID to the other by counting only the MS2 scans.

Vadim
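The mapping described in this comment could look roughly like the sketch below. It assumes the mzML has already been read into a list of (spectrum id, MS level) pairs in file order (how that list is obtained is left open), and that MS2.Scan is a 0-based index over MS2 spectra only, as stated in this thread. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def ms2_ids_in_order(spectra: Iterable[Tuple[str, int]]) -> List[str]:
    """Keep only the ids of MS2 spectra, preserving file order."""
    return [spec_id for spec_id, ms_level in spectra if ms_level == 2]


def spectrum_id_for_ms2_scan(ms2_ids: List[str], ms2_scan: int) -> str:
    """Look up the mzML spectrum id for a 0-based MS2-only index."""
    if not 0 <= ms2_scan < len(ms2_ids):
        raise IndexError(f"MS2.Scan {ms2_scan} is out of range")
    return ms2_ids[ms2_scan]


# Toy run: MS1, MS2, MS2, MS1, MS2 (ids are made up for illustration).
spectra = [
    ("scan=1", 1),
    ("scan=2", 2),
    ("scan=3", 2),
    ("scan=4", 1),
    ("scan=5", 2),
]
ms2_ids = ms2_ids_in_order(spectra)
# The MS1 spectrum at scan=4 is skipped, so MS2 index 2 maps to scan=5.
print(spectrum_id_for_ms2_scan(ms2_ids, 2))  # scan=5
```

Building the MS2-only list once per mzML file keeps each lookup a plain list access, which matters when a report contains many precursors per run.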

ypriverol commented 2 years ago

For the 1.1 release we will export only the protein and peptide sections, as agreed with @WangHong007. Then, once DIA-NN exports the original scan from the mzML, we will export the PSM table.