Closed ypriverol closed 2 years ago
When converting DIA-NN results to mzTab, some columns are missing at three levels.
- MTD
protein_search_engine_score[1]
peptide_search_engine_score[1]
psm_search_engine_score[1]
software[1]
This is related to the following issue: https://github.com/vdemichev/DiaNN/issues/362. We first need to decide which DIA-NN scores are the best ones, add them to PSI-MS, and then use them in the mzTab export.
- RPH
protein_coverage
For now, this value can be null. @timosachsenberg, is null allowed here? Another option, @WangHong007, is to take the protein database as input and compute the protein coverage from the identified peptides and the protein sequence.
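The coverage computation suggested above could be sketched roughly as follows. This is only an illustration, not the converter's actual code: the function name `protein_coverage` is hypothetical, peptides are assumed to be plain amino-acid strings with modifications already stripped, and the result is assumed to be reported as a fraction between 0 and 1.

```python
from typing import List


def protein_coverage(protein_sequence: str, peptides: List[str]) -> float:
    """Return the fraction of residues covered by at least one peptide.

    Hypothetical helper: peptides are assumed to be unmodified sequences.
    """
    if not protein_sequence:
        return 0.0
    covered = [False] * len(protein_sequence)
    for peptide in peptides:
        if not peptide:
            continue
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein_sequence.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein_sequence.find(peptide, start + 1)
    return sum(covered) / len(protein_sequence)
```

Peptides that do not occur in the sequence simply contribute nothing, so a protein with no matching peptides gets a coverage of 0.0 rather than null.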
- PEH
retention_time_window
@timosachsenberg, how do you pick this number in ProteomicsLFQ?
spectra_ref
opt_global_feature_id
This is not needed, @WangHong007.
- PSH
calc_mass_to_charge
pre
post
start
end
Again, all of them can be null
spectra_ref
The spectra reference is the combination of the index of the mzML file in the mzTab and the scan reference of the spectrum in that mzML. For example, in one of the label-free experiments:
ms_run[8]:controllerType=0 controllerNumber=1 scan=17
The first part, ms_run[8], corresponds to the file in the metadata that contains the spectrum. There can be multiple mzMLs; the index [8] means that file 8 contains the spectrum that was used to identify the PSM.
The second part, controllerType=0 controllerNumber=1 scan=17, corresponds to the id of the spectrum used for the identification in the mzML. I guess DIA-NN also keeps track of the scan corresponding to the peptide. @vdemichev can probably help us find out which field that is in the output.
opt_global_spectrum_reference
This one is only the second part of the id, as described before.
opt_global_feature_id
opt_global_map_index
These two are not needed.
I think null is fine here
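The spectra_ref layout described above (file index, a colon, then the spectrum id from the mzML) can be illustrated with a small sketch. The helper names `make_spectra_ref` and `split_spectra_ref` are hypothetical and are not part of any existing converter.

```python
from typing import Tuple


def make_spectra_ref(ms_run_index: int, native_id: str) -> str:
    """Combine the ms_run index and the mzML spectrum id into a spectra_ref."""
    return f"ms_run[{ms_run_index}]:{native_id}"


def split_spectra_ref(spectra_ref: str) -> Tuple[int, str]:
    """Split a spectra_ref into (ms_run index, spectrum id).

    The second element corresponds to what opt_global_spectrum_reference holds.
    """
    run_part, native_id = spectra_ref.split(":", 1)
    if not (run_part.startswith("ms_run[") and run_part.endswith("]")):
        raise ValueError(f"Unexpected spectra_ref prefix: {run_part!r}")
    return int(run_part[len("ms_run["):-1]), native_id
```

With the example from this thread, `split_spectra_ref("ms_run[8]:controllerType=0 controllerNumber=1 scan=17")` gives the file index 8 and the spectrum id that would go into opt_global_spectrum_reference.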
Remaining issues
MTD
Wait for the following to be added to OLS (see vdemichev/DiaNN#362):
protein_search_engine_score[1]
peptide_search_engine_score[1]
psm_search_engine_score[1]
software[1]
PEH
This is related to the following issue: vdemichev/DiaNN#350
retention_time_window
spectra_ref
PSH
spectra_ref
opt_global_spectrum_reference
DIA-NN stores the scan numbers for each precursor, and these are specific to MS2 (note that DIA-NN counts MS1 and MS2 scans separately, and each count starts at 0). The respective output column is MS2.Scan.
Thanks for your quick response, @vdemichev.
Do you mean that MS2.Scan is basically an index corresponding to the order of the scans in the RAW/mzML file?
Yes, but it indexes MS2 scans only. If there's an MS2 scan with index, say, 1000, then an MS1 scan, and then another MS2 scan, that other MS2 scan will have index 1001, not 1002.
So DIA-NN counts the scanned precursors and fragments separately, and MS2.Scan does not refer to the scan or index in the mzML file? E.g., MS2.Scan=73091 in the main report doesn't refer to <spectrum id="controllerType=0 controllerNumber=1 scan=73091" index="73090" defaultArrayLength="441"> in the mzML file.
Yes, it does not refer to the mzML scan number. Is it a significant problem here? If there's access to the mzML, one can just map one ID to the other by counting only the MS2 scans.
Vadim
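The mapping suggested above (counting only MS2 scans to translate MS2.Scan into an mzML spectrum id) could look roughly like the sketch below. It assumes the spectra have already been read from the mzML in file order as (spectrum id, MS level) pairs and that the MS2 count starts at 0, as stated earlier in the thread; the function name `map_ms2_index_to_native_id` is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def map_ms2_index_to_native_id(
    spectra: Iterable[Tuple[str, int]]
) -> Dict[int, str]:
    """Map a 0-based MS2-only scan index to the mzML spectrum id.

    `spectra` must yield (spectrum id, MS level) pairs in file order.
    """
    mapping: Dict[int, str] = {}
    ms2_counter = 0
    for native_id, ms_level in spectra:
        if ms_level == 2:
            # Only MS2 spectra advance the counter; MS1 spectra are skipped.
            mapping[ms2_counter] = native_id
            ms2_counter += 1
    return mapping
```

Looking up a value from the MS2.Scan column in the returned dictionary would then give the second part of spectra_ref, provided the counting assumption holds for the file in question.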
For the 1.1 release, we will export only the protein and peptide sections, as agreed with @WangHong007. Then, once DIA-NN exports the original scan from the mzML, we will export the PSM table.
Description of feature
DIA-NN results should not only be exported to MSstats; we also need to be able to export them to mzTab.