Nesvilab / FragPipe

A cross-platform Graphical User Interface (GUI) for running MSFragger and Philosopher - powered pipeline for comprehensive analysis of shotgun proteomics data
http://fragpipe.nesvilab.org

Converting .d to mzML format breaks IonQuant #901

Closed MNTsnowman closed 1 year ago

MNTsnowman commented 1 year ago

I’m attempting to search a dataset for a student where we are literally searching against the entire bacterial kingdom and the human proteome (SwissProt entries only, though). However, every time, it stops somewhere in IonQuant (see attached log file). Do you have any suggestions for what to do about it?

log_2022-11-29_15-17-46.txt

fcyu commented 1 year ago

Can you share D:\Data\Nora\P001\03_ConvertedData\20221124_Nora_highBMI_mo1_10_Slot2-20_1_5317.mzML with us?

Thanks,

Fengchao

MNTsnowman commented 1 year ago

Hi Fengchao

You are fast, thanks. :)

If you can give me an e-mail address, I can share a link with you. :)

fcyu commented 1 year ago

Thanks. It's yufe AT umich.edu

Best,

Fengchao

fcyu commented 1 year ago

Hi @MNTsnowman ,

I have received your data.

It looks like it was converted from ddaPASEF. May I ask why you converted .d to mzML?

For the LFQ-MBR workflow, FragPipe can analyze .d files smoothly and give a good result. We also have a pre-release that supports PTM localization using PTMProphet (although your log shows that you did not enable PTMProphet). Furthermore, after processing the .d files the first time, MSFragger generates mzBIN files to make re-analysis much faster. On the other hand, after converting to mzML with the scan summing filter, there is no ion mobility dimension in MS1, which makes the data no longer suitable for MS1-based quantification.

Can you try the original .d files?

Best,

Fengchao

MNTsnowman commented 1 year ago

Hi Fengchao

Thanks a lot for the reply, I'll try to run the search on the .d files and get back to you.

The reason I'm not doing that to begin with is that, as far as I have understood, IonQuant does not (yet) support PTM localization on .d files. But yes, in this particular case that might not matter, as we are not looking for PTMs (except for C+57 Da?). PTMs are normally something we include in most searches, so as a habit we simply convert the data to mzML. For this particular project, however, you are correct: the goal was to search the data against this very large database to get an idea of what sticks. Later we might include some PTMs, maybe. Normally, though, IonQuant does work on this type of file.

You mentioned a pre-release for IonQuant that supports PTMs on .d files? When is that going to be released, and would it support searches for glycosylations as well? As that is one of my upcoming projects, I would be very interested in a working solution. You can mail me for details if you don't want to discuss it here.

I'll get back to you with the results from the search of the .d files.

Best Martin

fcyu commented 1 year ago

Hi Martin,

The new version will be released by the end of this week if everything goes well. It will support PTM localization for .d and .raw file formats.

For glycosylations, we have also added the O-Pair module to localize O-glycans. But we haven't tested the O-Pair module with .d files because we don't have such data. We would be happy to see how the new version performs on the .d data. Looking forward to your feedback!

Best,

Fengchao

MNTsnowman commented 1 year ago

Hi Fengchao

It worked on the 5th attempt, thanks. It does indeed work with .d files; thanks a lot.

I'll be looking forward to the new version of IonQuant then. I might give it a spin with some PTMs on some .d files, and at some point down the road probably also some glycans. If anything strange happens, I'll get back to you. Thanks. :)

There is one thing I noted this time around, though: IonQuant is suddenly very slow. Half of the entire search time (almost 6 hours) was spent on IonQuant alone. Throughout this entire period, it was only utilizing 1 thread of the 63 allowed. I remember IonQuant as being faster, but I'm not sure at this point. Would it be possible to optimize this? That could improve the overall search time a lot.

Anyway, thanks a lot for your help. I'll get back to you with the performance of the next version of IonQuant on .d files with PTMs, glycans and other oddities. :)

Best Martin

fcyu commented 1 year ago

Hi Martin,

Thank you very much for your feedback.

> There is one thing I noted this time around, though: IonQuant is suddenly very slow. Half of the entire search time (almost 6 hours) was spent on IonQuant alone. Throughout this entire period, it was only utilizing 1 thread of the 63 allowed. I remember IonQuant as being faster, but I'm not sure at this point. Would it be possible to optimize this? That could improve the overall search time a lot.

Can you send me both log files? Are you comparing the run time of .d vs .mzML? If so, then the slowdown is due to the different data types. But still, please send me the log files.

Best,

Fengchao

MNTsnowman commented 1 year ago

Hi Fengchao

You can use the first log file, which got through the steps that take a long time; that run was on mzML files. Attached you can find the newest log file, from the run that succeeded; this one was on .d files. You can then compare the time stamps for the IonQuant part. I guess by now you see my point. :) And yes, during that slow part, only 1 thread was used.

Best Martin

log_2022-11-30_20-42-50.txt
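
Comparing the time stamps in two logs, as suggested above, can be done by eye or with a small script. The following is a minimal Python sketch and not something FragPipe provides. It assumes a hypothetical log layout in which each step begins with a line of the form `<timestamp> | <step name>`; this is not the actual FragPipe log format, so the parsing would need to be adapted to the real files. The function name `step_durations`, the step labels, and the sample lines are all illustrative.

```python
from datetime import datetime


def step_durations(lines, fmt="%Y-%m-%d %H:%M:%S"):
    """Return {step: seconds} from marker lines shaped like '<timestamp> | <step>'.

    Assumption: a step runs until the next marker line appears, so the
    final marker only closes the step before it.
    """
    marks = []
    for line in lines:
        stamp, sep, step = line.partition(" | ")
        if not sep:
            continue  # not a marker line in this hypothetical layout
        try:
            marks.append((datetime.strptime(stamp.strip(), fmt), step.strip()))
        except ValueError:
            continue  # text before the separator was not a timestamp
    durations = {}
    for (start, step), (end, _) in zip(marks, marks[1:]):
        durations[step] = durations.get(step, 0.0) + (end - start).total_seconds()
    return durations


# Made-up sample lines, only to show the expected input shape.
example = [
    "2022-11-30 20:42:50 | StepA",
    "2022-11-30 20:52:50 | StepB",
    "not a marker line",
    "2022-11-30 21:02:20 | Done",
]
```

Running `step_durations` on each log and putting the two results side by side would show which step dominates the run time in each case.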

fcyu commented 1 year ago

Hi Martin,

So the difference comes from the different data formats (.d vs .mzML). Loading .d files is much slower than loading .mzML, and it is single-threaded. So, I guess it is expected. I think we will optimize it in the future by loading multiple .d files in parallel.

Best,

Fengchao
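
The idea of loading several files in parallel can be illustrated in general terms. The sketch below is in Python and is not how IonQuant (or any FragPipe component) is implemented; `load_all` and `fake_load` are hypothetical names, and `fake_load` is a stand-in for a real file reader. It only shows the pattern of handing each path to a worker pool while keeping the results in input order.

```python
from concurrent.futures import ThreadPoolExecutor


def load_all(paths, load_one, max_workers=4):
    """Apply load_one to every path concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_one, paths))


def fake_load(path):
    # Hypothetical stand-in for a slow, I/O-bound reader.
    return (path, len(path))
```

Threads help mainly when the loader spends its time waiting on disk or on native code that releases the interpreter lock; for CPU-bound parsing in Python, a process pool would be the more usual choice.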

fcyu commented 1 year ago

BTW, this was your first time analyzing the .d files using FragPipe, which is why data loading took a very long time. When you re-analyze them, it will be much faster because MSFragger will use the mzBIN files and skip the data loading part.

Best,

Fengchao

MNTsnowman commented 1 year ago

Hi Fengchao

Sorry, I might have messed up the names here. It might be "PhilosopherFilter" that takes the time and not IonQuant. Again, look at the time stamps in the two logs. There you can see that the step before IonQuant (which crashed in the first log) took between 5 and 20 minutes per file. This is what I mean. Sorry for the confusion.

I'm unsure whether that step is yours or not, but if it is possible to run multiple samples in parallel here, that would be great.

I know that MSFragger normally generates mzBIN files from the .d files the first time, but I don't think that these are involved in this process?

Also, from the two logs you can see that this "PhilosopherFilter" step takes around the same time with .mzML and .d files, roughly.

Sorry for the confusion.

Best Martin

fcyu commented 1 year ago

> I'm unsure whether that step is yours or not, but if it is possible to run multiple samples in parallel here, that would be great.

Yes, this is the same as https://github.com/Nesvilab/FragPipe/issues/651

> I know that MSFragger normally generates mzBIN files from the .d files the first time, but I don't think that these are involved in this process?

If you were talking about the Philosopher Filter command, it is not related to mzBIN files.

Best,

Fengchao

MNTsnowman commented 1 year ago

Hi Fengchao

Yes, it seems to be the same as #651, as you mentioned, for the Philosopher Filter command.

And yes, it was the Philosopher Filter command I meant, which, as you can see from the log, is very slow.

Thanks a lot. :)

Best Martin