mzML input from Bruker timsTOF data

andzajan commented 1 year ago

Hi, FragPipe and IonQuant seem to be working very well with direct input of timsTOF .d folders.

We would like to use slightly different pre-processing of the raw data in our pipeline and use Bruker files converted to mzML format with IonQuant.

While mzML input works very well for the MSFragger search and the outputs are consistent with those from .d folders, that's not the case with IonQuant.

For testing, I tried using an mzML file exported with msconvert from ProteoWizard. Processing only runs if the MS data type is set to Regular MS; it won't work with the IM-MS setting. Even with Regular MS, it fails on normalisation of intensities:

2023-04-20 13:38:56 [INFO] - Checking C:\mzML_test\PWIZ_200ngHeLaPASEF_2min_compressed.mzML...
2023-04-20 13:39:14 [INFO] - Loading C:\mzML_test\PWIZ_200ngHeLaPASEF_2min_compressed.mzML...
2023-04-20 13:39:40 [INFO] - Building index...
2023-04-20 13:39:41 [INFO] - Loading C:\mzML_test\msfragger_out_bruker_pwiz_mzML\PWIZ_200ngHeLaPASEF_2min_compressed\psm.tsv....
2023-04-20 13:39:41 [INFO] - Use each MS2 scan's calculated MZ in peak tracing.
2023-04-20 13:39:41 [INFO] - Quantifying...
2023-04-20 13:39:42 [INFO] - Updating Philosopher's tables...
2023-04-20 13:39:43 [INFO] - Combining experiments and estimating protein intensity...
java.lang.NullPointerException: Had an error in normalization ion intensities.
    at n.<init>(Unknown Source)
    at ionquant.IonQuant.main(Unknown Source)
2023-04-20 13:39:43 [ERROR] - Had an error in normalization ion intensities.
Process 'IonQuant' finished, exit code: 1
Process returned non-zero exit code, stopping

For our in-house processed file exported to mzML, the pipeline doesn't crash, but most of the proteins are not quantified, and the few that are have very different values from what MSFragger + IonQuant returns on raw .d input. Meanwhile, our in-house export of Thermo raw files gives a very good match between raw and mzML input in MSFragger and IonQuant.

I was wondering if you have some suggestions on how the mzML file for Bruker data should be formatted so that IonQuant performs correctly? Or, because of the "unconventional" structure of timsTOF data, would you recommend only using .d input directly?

fcyu commented 1 year ago

There are some tricky things when you convert .d to mzML. Taking msconvert as an example, if you enable the option to combine ion mobility scans and add the scanSumming filter, the database search gives good results (as you have also reproduced), but the MS1-based quantification does not work well because there are no ion mobility arrays in the MS1 scans. It is unclear whether the latest version of msconvert addresses this issue.

If you don't enable those two, the database search does not work well. Therefore, there is no ideal way to convert .d to mzML using msconvert. Furthermore, the converted mzML file is huge (~20 GB for a 2 h gradient) even without the MS1 ion mobility array. You also can't apply any peak filtering during conversion if you want to perform MS1-based quantification.

As for the errors, they are most likely because there are no ion mobility arrays in the MS1 scans, which results in bad/incorrect XICs. Thermo raw files do not have this issue because there is no ion mobility dimension.

In short, I don't think it is a good idea to convert .d to mzML if you want to perform MS1-based quantification. No matter how you optimize the content of the mzML file, the huge file size and long conversion time are unavoidable. That is why we suggest that users give .d input directly to our tools.

Best,

Fengchao
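A quick way to test whether a converted file has the problem described above is to count how many MS1 spectra carry an ion mobility binary array. The following is a minimal sketch using only the Python standard library, not part of IonQuant or msconvert. It assumes the ion mobility array is tagged with MS:1002815 (the accession named later in this thread); other ion mobility array terms may apply to your converter, so the accession set is a parameter. It also assumes the "ms level" cvParam sits directly under each spectrum element rather than in a referenced param group.

```python
import io
import xml.etree.ElementTree as ET

MZML_NS = "{http://psi.hupo.org/ms/mzml}"
MS_LEVEL = "MS:1000511"  # PSI-MS accession for "ms level"
# Accession taken from this thread; treat it as an assumption and
# extend the set if your converter uses a different ion mobility term.
IM_ACCESSIONS = frozenset({"MS:1002815"})


def count_ms1_ion_mobility(source, im_accessions=IM_ACCESSIONS):
    """Return (ms1_total, ms1_with_im) for an mzML path or file object."""
    total = 0
    with_im = 0
    # Stream the file: converted timsTOF mzML can be tens of GB.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != MZML_NS + "spectrum":
            continue
        level = None
        for cv in elem.findall(MZML_NS + "cvParam"):
            if cv.get("accession") == MS_LEVEL:
                level = cv.get("value")
        if level == "1":
            total += 1
            has_im = any(
                cv.get("accession") in im_accessions
                for arr in elem.iter(MZML_NS + "binaryDataArray")
                for cv in arr.findall(MZML_NS + "cvParam")
            )
            if has_im:
                with_im += 1
        elem.clear()  # drop the spectrum's children to limit memory use
    return total, with_im
```

Calling `count_ms1_ion_mobility("converted.mzML")` (a hypothetical file name) returns the number of MS1 spectra and how many of them have an ion mobility array. If the second number is zero, the MS1-based quantification issue described above is the likely cause.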

andzajan commented 1 year ago

Thank you, Fengchao, for the very fast and detailed response; that's what I was suspecting. We do indeed sum spectra based on several criteria, and I thought it would be helpful to use external tools for label-free quantification.

Thanks again, Andris

fcyu commented 1 year ago

Hi Andris,

Theoretically, summing MS2 spectra does no harm to the MS1-based quantification and is good for MS2 database searching. But somehow msconvert discards the ion mobility array, which breaks the MS1-based quantification. If you can manage to sum the MS2 spectra and leave the MS1 unchanged, the mzML should be good for everything. It will just be very large.

Best,

Fengchao

andzajan commented 1 year ago

We don't use msconvert internally; we extract data using the Bruker SDK library in C# and do all the data mangling there. I have now added an option to export spectra to mzML as well. I only use mzML for checks during development. So if I do manage to add IM arrays alongside the MS1 data, how should these be included in the mzML file? I believe IonQuant needs to interpret the mzML input as an IM file.

Thank you for your time, Andris

fcyu commented 1 year ago

You need to have a binary array with MS:1002815 in addition to the intensity and m/z arrays. You can send me your files so I can take a look if you want.

Best,

Fengchao
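To make the suggestion above concrete, here is a minimal sketch of how a third binary array could be encoded next to the m/z and intensity arrays. It uses only the Python standard library. The helper name `binary_data_array` is hypothetical, the term name given for MS:1002815 is an assumption that should be checked against the PSI-MS controlled vocabulary, and unit cvParams are omitted for brevity. Whether IonQuant accepts exactly this layout is not confirmed in this thread.

```python
import base64
import struct
import xml.etree.ElementTree as ET
import zlib


def binary_data_array(values, accession, name, compress=True):
    """Build an mzML <binaryDataArray> of 64-bit little-endian floats."""
    raw = struct.pack("<%dd" % len(values), *values)
    if compress:
        raw = zlib.compress(raw)
    encoded = base64.b64encode(raw).decode("ascii")

    arr = ET.Element("binaryDataArray", encodedLength=str(len(encoded)))
    # Data type and compression terms from the PSI-MS vocabulary.
    ET.SubElement(arr, "cvParam", cvRef="MS", accession="MS:1000523",
                  name="64-bit float", value="")
    if compress:
        comp = ("MS:1000574", "zlib compression")
    else:
        comp = ("MS:1000576", "no compression")
    ET.SubElement(arr, "cvParam", cvRef="MS", accession=comp[0],
                  name=comp[1], value="")
    # Term that says what the array contains.
    ET.SubElement(arr, "cvParam", cvRef="MS", accession=accession,
                  name=name, value="")
    ET.SubElement(arr, "binary").text = encoded
    return arr


# Illustrative values only: one mobility value per peak, same length
# as the m/z and intensity arrays.
mz = [400.2, 400.7, 512.3]
intensity = [1200.0, 800.0, 1500.0]
mobility = [0.85, 0.85, 1.02]

array_list = ET.Element("binaryDataArrayList", count="3")
array_list.append(binary_data_array(mz, "MS:1000514", "m/z array"))
array_list.append(binary_data_array(intensity, "MS:1000515", "intensity array"))
# Accession from this thread; the name string is an assumption.
array_list.append(binary_data_array(mobility, "MS:1002815",
                                    "inverse reduced ion mobility"))
```

The three arrays are parallel: entry i of the mobility array belongs to peak i of the m/z and intensity arrays, which is why the MS1 data cannot be summed across mobility scans without losing that information.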

andzajan commented 1 year ago

Hi Fengchao, thank you for the clarification. So basically you are saying that we would have to include each mobility bin slice for each "MS1" frame. That would indeed inflate the mzML file.

But what if we sum MS1 spectra using the IM bins that were scanned in the MS/MS frames? Let's say the precursor was isolated on scans 620 to 650; we could then also sum all MS1 spectra from this scan window for the MS1 data. I believe the file would still be quite large, but perhaps a bit more manageable. And of course, by doing this we would lose IM resolution, as MS:1002815 can only store a single value, so that would be the mean IM of the ion.

BW, Andris
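The windowed summing proposed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Bruker SDK or the in-house pipeline: the frame is a hypothetical in-memory dict mapping mobility scan number to a list of (m/z, intensity) peaks, `scan_to_mobility` is a hypothetical lookup from scan number to ion mobility value, and peaks are merged with simple fixed-width m/z bins instead of a ppm tolerance.

```python
from collections import defaultdict


def sum_ms1_window(frame, scan_to_mobility, first_scan, last_scan,
                   bin_width=0.01):
    """Sum MS1 peaks over mobility scans first_scan..last_scan (inclusive).

    Returns a list of (mz, summed_intensity, mean_mobility) sorted by m/z,
    where mz and mean_mobility are intensity-weighted means per m/z bin.
    """
    # Per bin: [sum of intensity, sum of intensity*mz, sum of intensity*mobility]
    acc = defaultdict(lambda: [0.0, 0.0, 0.0])
    for scan in range(first_scan, last_scan + 1):
        mobility = scan_to_mobility[scan]
        for mz, intensity in frame.get(scan, ()):
            slot = acc[round(mz / bin_width)]
            slot[0] += intensity
            slot[1] += intensity * mz
            slot[2] += intensity * mobility
    return [
        (w_mz / total, total, w_im / total)
        for _, (total, w_mz, w_im) in sorted(acc.items())
        if total > 0
    ]
```

Each output peak keeps only one mobility value (the intensity-weighted mean over the window), which is exactly the loss of IM resolution mentioned above.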

fcyu commented 1 year ago

Hi Andris,

If you sum MS1 spectra, you will lose the ion mobility resolution, as you also pointed out. I am not sure whether it will work for the MS1-based quantification. Therefore, we don't recommend converting .d to .mzML.

Best,

Fengchao

Nesvilab / IonQuant

mzML input from Bruker timsTOF data #43