Support for timsTOF data

daichengxin commented 2 days ago

Description of feature

Last week I tested the identification workflow on a timsTOF dataset. The first step is to run the tdf2mzml module, and then the same analyses are run as for other data (Comet and MSGF+). I then compared the results with MaxQuant, using a PSM FDR of 0.01 and taking the MaxQuant results from evidence.txt. The overlap of identified peptides is above 90% of those from MaxQuant. But the PSM overlap is 0: comparing precursor m/z shows that the scan numbers (indices) differ between quantms and MQ.

identified_pept compare_results.csv compare_results_pep.zip

Some questions:

For example, this peptide is identified in three scans whose numbers differ from the one MQ reports. How can I compare the results and check the differences when the scan numbers (or indices) differ? I manually checked the identifications and they all look like they match well.

sequence	exp_mass_to_charge	quantms scan_number	MaxQuant MS/MS scan number
AAAAAAMAEQESAR	695.3256725319	356222	65719
AAAAAAMAEQESAR	695.3256725319	356647	65719
AAAAAAMAEQESAR	695.3256725319	356088	65719

I was surprised that quantms identified so many peptides! I also manually checked the identifications found only by quantms, and they all look reasonable. Further assessment is needed here.
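For the scan-number question above, one way to compare PSMs without relying on scan numbers is to pair them by peptide sequence and precursor m/z within a ppm tolerance. The following is only a minimal sketch: `match_psms`, `ppm_diff`, and the 10 ppm default are hypothetical choices, and the MaxQuant m/z is assumed to equal the value in the table, since only one m/z column was reported.

```python
from collections import defaultdict

def ppm_diff(mz_a, mz_b):
    """Signed difference between two m/z values in parts per million."""
    return (mz_a - mz_b) / mz_b * 1e6

def match_psms(quantms_psms, mq_psms, tol_ppm=10.0):
    """Pair PSMs that share a sequence and agree in precursor m/z.

    Each PSM is a (sequence, precursor_mz, scan_number) tuple.
    Returns a list of (sequence, quantms_scan, mq_scan) tuples.
    """
    # Index the MaxQuant PSMs by sequence for quick lookup.
    mq_by_seq = defaultdict(list)
    for seq, mz, scan in mq_psms:
        mq_by_seq[seq].append((mz, scan))

    matches = []
    for seq, mz, scan in quantms_psms:
        for mq_mz, mq_scan in mq_by_seq.get(seq, []):
            if abs(ppm_diff(mz, mq_mz)) <= tol_ppm:
                matches.append((seq, scan, mq_scan))
    return matches

# Rows from the table above; the MaxQuant m/z is an assumption (see lead-in).
quantms = [
    ("AAAAAAMAEQESAR", 695.3256725319, 356222),
    ("AAAAAAMAEQESAR", 695.3256725319, 356647),
    ("AAAAAAMAEQESAR", 695.3256725319, 356088),
]
maxquant = [("AAAAAAMAEQESAR", 695.3256725319, 65719)]

matches = match_psms(quantms, maxquant)
for match in matches:
    print(match)
```

If retention time is available on both sides, it could be added as a further matching key, since several scans can share a sequence and m/z.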

ypriverol commented 2 days ago

ping @wfondrie @jspaezp

jpfeuffer commented 2 days ago

Regarding scan number mismatch: do we not have a scan ID that we could use? Did you run MQ on the tdf or on the converted mzML?

daichengxin commented 2 days ago

I ran MQ on the tdf, so there are differences. But I don't know how to map scan numbers between MQ and quantms. This is the converted mzML in quantms.

jspaezp commented 1 day ago

Well, I cannot say anything about how MQ deals with the numbers ... BUT ... I think it is very normal for a "scan" to mean very different things in different software dealing with PASEF data. The main reason is that a real scan in the .d has very little, and noisy, information. Most of the real information comes from the series of scans that encompass a single elution of the tims funnel (called a frame).

So when converting frames -> scans, the naive approach of just splitting out each scan is kind of useless (because it would lead to blocks of ~700 ms1 scans that don't share information with each other but share retention time, followed by a bunch of blocks of ms2 scans that look horrendous).

The more standard approach is to use sections of the frame and squash them into a single new scan. So when tdf2mzml says "scan=1", it could, DEPENDING ON THE OPTIONS USED FOR EXPORTING, actually mean "frame 1, scans 200-250" (similar to how some qtofs have micro-scans that get aggregated) OR actually "scan 234234" in the run.
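To make the "squash a section of the frame into one scan" idea concrete, here is a toy sketch. It is not how tdf2mzml does it: `squash_scans`, the fixed m/z bin width, and the peak values are all made up for illustration, and real converters may centroid or merge peaks differently.

```python
from collections import defaultdict

def squash_scans(frame_peaks, scan_start, scan_end, bin_width=0.01):
    """Merge peaks from mobility scans scan_start..scan_end (inclusive).

    frame_peaks: iterable of (scan, mz, intensity) for a single frame.
    Peaks are binned on m/z and the intensities in each bin are summed.
    Returns a list of (binned_mz, summed_intensity) sorted by m/z.
    """
    bins = defaultdict(float)
    for scan, mz, intensity in frame_peaks:
        if scan_start <= scan <= scan_end:
            bins[round(mz / bin_width)] += intensity
    return [(key * bin_width, total) for key, total in sorted(bins.items())]

# Hypothetical peaks from one frame: (mobility scan, m/z, intensity).
peaks = [
    (200, 500.123, 10.0),
    (210, 500.121, 5.0),
    (250, 700.500, 3.0),
    (300, 500.120, 99.0),  # outside the 200-250 window, so it is ignored
]

merged = squash_scans(peaks, 200, 250)
for mz, intensity in merged:
    print(round(mz, 2), intensity)
```

With this kind of merging, the index of the new spectrum no longer corresponds to any single raw scan, which is why the same number can mean different things in different tools.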

so .... I have no idea :P I would need to explore a bit more what the numbers are. Some things I would like to know:

Are the index numbers contiguous? (in mq/qms, do all scans from 1-N exist? or are there steps like 1,53,134,...N)
What settings were used in tdf2mzml?
What were your acquisition parameters?
How many scans do you have between every ms1 scan in the derived mzml?
How does the IMS section look in the .mzml ?
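The first and fourth questions can be checked with a few lines once the spectrum indices and MS levels have been pulled out of the mzML with whatever reader is at hand. A rough sketch, with hypothetical helper names `index_gaps` and `ms2_per_cycle` and plain lists as input:

```python
def index_gaps(indices):
    """Return (previous, next) pairs where sorted indices are not consecutive."""
    ordered = sorted(indices)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a != 1]

def ms2_per_cycle(ms_levels):
    """Count the MS2 spectra that follow each MS1 spectrum, in file order."""
    counts = []
    for level in ms_levels:
        if level == 1:
            counts.append(0)       # start a new cycle at every MS1
        elif level == 2 and counts:
            counts[-1] += 1        # MS2 before the first MS1 is skipped
    return counts

# Contiguous indices give no gaps; stepped ones like 1, 53, 134 do.
print(index_gaps([1, 2, 3, 4]))
print(index_gaps([1, 53, 134]))

# MS levels in file order (made-up example).
print(ms2_per_cycle([1, 2, 2, 2, 1, 2, 1]))
```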

bigbio / quantms

Support for timsTOF data #440
