First DIA dataset annotated.

ypriverol commented 3 years ago

Hi @bigbio/collaborators,

The following dataset https://github.com/bigbio/proteomics-metadata-standard/blob/b9b3463bef54e8dd1410d1cac3c9961e67a569af/annotated-projects/PXD003539/PXD003539.sdrf.tsv contains the sample annotations for dataset PXD003539, as requested by @MeenaChoi in #309. The sample annotations are done, but the Comment section (the information about the data) is NOT done yet.

Some thoughts on the differences from DDA:

[x] In DIA data, PTMs are normally captured during DDA library generation. Should we also add them here, in the DIA section?
[x] The precursor and fragment mass tolerances make little sense here; we can remove that information.

Additional information that should be captured:

For this paper we have the following Methods description:

The SWATH-MS data acquisition in a Sciex TripleTOF 5600 mass spectrometer was performed as described before (Gillet et al., 2012), using 32 windows of 25 Da effective isolation width (with an additional 1 Da overlap on the left side of the window) and with a dwell time of 100 ms to cover the mass range of 400 - 1200 m/z in 3.3 s. The collision energy for each window was set using the collision energy of a 2+ ion centered in the middle of the window (equation: 0.0625 x m/z - 3.5) with a spread of 15 eV. The sequential precursor isolation window setup was as follows: [400-425], [424-450], [449-475], …, [1174-1200].
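To make the parameters in that description concrete, here is a minimal sketch that regenerates the window list and the per-window collision energy from just four numbers (range start, range end, effective width, overlap) plus the CE equation. It assumes the CE is computed at the centre of the listed isolation range including the 1 Da overlap; the paper may instead mean the centre of the effective 25 Da window. The function names are hypothetical.

```python
# Sketch: rebuild the SWATH isolation scheme from the Methods text.
# 32 windows of 25 Da effective width over 400-1200 m/z; every window
# after the first is extended 1 Da on its left side (the overlap).


def swath_windows(start=400.0, end=1200.0, width=25.0, overlap=1.0):
    """Return (lower, upper) isolation bounds for each window."""
    n = int(round((end - start) / width))
    windows = []
    for i in range(n):
        lower = start + i * width
        upper = lower + width
        # The first window starts exactly at the range start; later windows
        # are shifted left by `overlap` Da, e.g. [424-450].
        if i > 0:
            lower -= overlap
        windows.append((lower, upper))
    return windows


def collision_energy(lower, upper, slope=0.0625, intercept=-3.5):
    """CE (eV) for a 2+ ion at the window centre: 0.0625 * m/z - 3.5.

    Assumption: 'centre' means the midpoint of the listed isolation range.
    """
    centre = (lower + upper) / 2.0
    return slope * centre + intercept


if __name__ == "__main__":
    wins = swath_windows()
    print(len(wins))  # 32
    for lo, hi in wins[:3] + wins[-1:]:
        print(f"[{lo:g}-{hi:g}]  CE = {collision_energy(lo, hi):.2f} eV")
```

Capturing start, end, width, overlap and the CE formula (rather than the full list) is enough to reproduce all 32 windows. As a sanity check on the timing, 32 windows × 100 ms gives 3.2 s of SWATH dwell time within the reported 3.3 s cycle.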

How do we capture these parameters? How common are they across data analysis pipelines? Feedback needed.

@jgriss @levitsky @timosachsenberg @mwalzer

StSchulze commented 3 years ago

PTMs: all modifications that result from the way the sample is obtained/prepped (e.g. Carbamidomethyl, TMT, SILAC labels, ...) should still be captured. In my opinion this is the same as for DDA, since the format captures the sample, not the search/data analysis.

Mass tolerances: this is still relevant, in my opinion. As far as I know, DIA methods can still acquire MS1 spectra, so the accuracy of both MS1 and MS2 still matters. This also brings us back to a discussion we had early on about whether mass tolerances should be used, or rather something like mass resolution. As Magnus Palmblad had phrased it, "Resolving power and mass measurement uncertainty are properties of (metadata if you will) the MS data, the mass measurement error tolerance is metadata for the peptide IDs". So again, since the format is supposed to capture the relationship between the sample and the data files, including resolving power/mass measurement uncertainty would have been the better choice, and it would apply to DIA the same way as to DDA.
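The distinction in that quote can be illustrated with three small, hypothetical helpers: two describe the data (the measured mass error of an observation, and the peak width implied by a resolving power R = m/Δm at FWHM), while the third converts a search tolerance setting from ppm into an absolute window. This is only a sketch of the standard definitions, not part of any SDRF tooling.

```python
# Hypothetical helpers contrasting properties of the MS data with a
# setting of the downstream analysis.


def mass_error_ppm(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass measurement error of one observation, in ppm (data property)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def peak_fwhm(mz: float, resolving_power: float) -> float:
    """Peak width (FWHM, in Th) implied by resolving power R = m / delta_m (data property)."""
    return mz / resolving_power


def tolerance_window_da(mz: float, tolerance_ppm: float) -> float:
    """Half-width in Th of a +/- ppm search tolerance at this m/z (analysis setting)."""
    return mz * tolerance_ppm / 1e6


if __name__ == "__main__":
    print(mass_error_ppm(500.0025, 500.0))  # about 5 ppm measured error
    print(peak_fwhm(500.0, 30000))          # about 0.0167 Th peak width
    print(tolerance_window_da(500.0, 10))   # about 0.005 Th either side
```

The first two values can be derived from the raw files alone, which is why they fit a sample-to-data-file format; the third only exists once someone chooses search parameters.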

fcyu commented 3 years ago

Hi @ypriverol,

Attached please find the default parameters for DIA-Umpire.

Best,

Fengchao

umpire-se_default.zip

fcyu commented 3 years ago

The key parameters in DIA-Umpire are SE.MS1PPM, SE.MS2PPM, SE.NoMissedScan, and SE.EstimateBG. Setting SE.NoMissedScan = 2 increases run time but extracts slightly more precursors, and therefore yields slightly more IDs. For small datasets, SE.NoMissedScan can be 2; when there are many files, it can be set to 1.
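A minimal sketch of pulling those four parameters out of a DIA-Umpire signal-extraction params file (for example the one in umpire-se_default.zip) so they could be recorded as metadata. It assumes the file uses simple `key = value` lines with `#` comments; the helper names and the `many_files` cut-off of 20 are hypothetical, since no threshold was given above.

```python
from typing import Dict

# The four parameters singled out above as key for DIA-Umpire signal extraction.
KEY_PARAMS = ("SE.MS1PPM", "SE.MS2PPM", "SE.NoMissedScan", "SE.EstimateBG")


def parse_params(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blanks and '#' comments (assumed layout)."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def key_params(params: Dict[str, str]) -> Dict[str, str]:
    """Keep only the key SE.* parameters that are present."""
    return {k: params[k] for k in KEY_PARAMS if k in params}


def suggest_no_missed_scan(n_files: int, many_files: int = 20) -> int:
    """Return 2 for small datasets (more IDs, longer run time), 1 for many files.

    `many_files` is a HYPOTHETICAL cut-off chosen for illustration only.
    """
    return 1 if n_files >= many_files else 2
```

Reading the values from the params file actually used, rather than retyping them, keeps the annotation consistent with the run.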

tiannanguo commented 3 years ago

hi guys, thanks for your interest in this NCI60 SWATH dataset, and thanks to @ypriverol for reminding me of this effort.

1. PTMs: I agree with @StSchulze that, in terms of acquiring data on potential PTM-bearing peptides or other peptide variants, DIA is no different from DDA. We do not need prior information about PTMs before we can re-analyze them.

2. Precursor/fragment tolerance: again, this is important information for DIA and should be included when authors report DIA data acquisition. The MS1-level mass tolerance is less critical in DIA than in DDA; however, the MS2-level mass tolerance is as crucial in DIA as in DDA.

In addition, retention time information is crucial and should be annotated: LC conditions, including gradient length, and the presence of spike-in peptides (e.g. iRT) for RT calibration.
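To show why the presence of spike-in peptides matters, here is a minimal sketch of the usual linear iRT calibration: fit observed RT = slope × iRT + intercept on the spiked reference peptides by ordinary least squares, then map any observed RT onto the iRT scale. Some tools use non-linear fits instead; the function names and the example values in the test are hypothetical, not reference values from any iRT kit.

```python
from typing import List, Tuple


def fit_irt(irt: List[float], rt: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of observed RT = slope * iRT + intercept."""
    n = len(irt)
    if n < 2 or n != len(rt):
        raise ValueError("need at least two paired (iRT, RT) points")
    mean_x = sum(irt) / n
    mean_y = sum(rt) / n
    sxx = sum((x - mean_x) ** 2 for x in irt)
    if sxx == 0:
        raise ValueError("iRT reference values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(irt, rt))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rt_to_irt(rt_min: float, slope: float, intercept: float) -> float:
    """Map an observed retention time (minutes) back onto the iRT scale."""
    return (rt_min - intercept) / slope
```

Because the fit depends on the gradient, the gradient length and the identity of the spike-in standard are exactly the pieces needed to reproduce it.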

Tiannan

Leon-Bichmann commented 3 years ago

Hi, yes, I agree: annotating PTMs the same way for DDA and DIA would make sense, I think.

In addition, the following parameters might be useful:
- rt_extraction_window: e.g. 10 min
- ion_mobility_window: e.g. 10 ms
- min_upper_edge_dist: e.g. 1 Th (overlap between SWATH windows)
- iRT standard: e.g. Biognosys iRT kit
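One possible answer to "how do we capture these parameters?" is sketched below: a small record holding the four proposed settings, serialized as SDRF-style tab-separated comment columns. The class name and all `comment[...]` column names are HYPOTHETICAL and not part of the SDRF-Proteomics format; the units are taken from the examples in the comment above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DiaExtractionSettings:
    """Hypothetical container for the DIA parameters proposed in this thread."""
    rt_extraction_window_min: float
    ion_mobility_window_ms: float
    min_upper_edge_dist_th: float
    irt_standard: str

    def to_sdrf_comments(self) -> Dict[str, str]:
        # HYPOTHETICAL column names, chosen only for illustration.
        return {
            "comment[rt extraction window]": f"{self.rt_extraction_window_min:g} min",
            "comment[ion mobility window]": f"{self.ion_mobility_window_ms:g} ms",
            "comment[min upper edge distance]": f"{self.min_upper_edge_dist_th:g} Th",
            "comment[irt standard]": self.irt_standard,
        }


def to_tsv(columns: Dict[str, str]) -> str:
    """Render one header row and one value row, tab-separated like an SDRF file."""
    return "\t".join(columns) + "\n" + "\t".join(columns.values())
```

For example, `to_tsv(DiaExtractionSettings(10, 10, 1, "Biognosys iRT kit").to_sdrf_comments())` yields a header row and a value row that could sit alongside the existing Comment columns; storing units inside the value keeps each cell self-describing.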

First DIA dataset annotated. #522