ProteoWizard / pwiz

The ProteoWizard Library is a set of software libraries and tools for rapid development of mass spectrometry and proteomic data analysis software.
http://proteowizard.sourceforge.net/
Apache License 2.0

Issues converting Waters raw to mzML #1922

Closed jonaheaton closed 11 months ago

jonaheaton commented 2 years ago

I am trying to convert Waters raw data originating from a XEVO-G2XSQTOF machine to mzML format. When I use the msconvert Windows GUI app, it crashes without any error message at all, and from the command line I get the following message: "caught unknown exception", along with a request to please report this error.

I am using the vendor peak picking ("peakPicking true 1-") and, looking in the raw data folder, I do see that a centroid.raw file is created, with data files inside that I suspect contain the properly centroided data. However, msconvert appears to run into an error before this data is written to the mzML file (see the attached command_output.txt).

I don't think I am the first one with this issue:
Example 1: https://github.com/ProteoWizard/pwiz/issues/775
Example 2: https://sourceforge.net/p/proteowizard/mailman/message/36714564/

Both the "_HEADER.TXT" and the "_extern.inf" list three functions, although there appear to be .DAT and .IDX files for a 4th function in the raw data. Deleting the files associated with the 4th function doesn't help.

Thanks for any help you can give! Best, Jonah

chambm commented 2 years ago

Hi Jonah,

Are you using the latest pwiz version? Do you get the crash without vendor peak picking? I'll probably need to see the data to reproduce this myself and report it to Waters. Are you able to share it?

jonaheaton commented 2 years ago

Hi, thank you for the super fast response! Yes, I am using the latest version:
The version I ran with a Docker image: ProteoWizard release: 3.0.21334 (00c4b77)
The version I ran with the Windows GUI: ProteoWizard 3.0.21350 64-bit

It does NOT crash when I remove the Vendor peak picking. I need to get permission before sharing, but I suspect that won't be a problem. I'll get back to you when I get the "ok"

jonaheaton commented 2 years ago

I got permission to share one of the data files with you. What is the best way to share a ~1.3 GB file with you?

chambm commented 2 years ago

Check my profile to send me an email and I'll send you a place to upload to.

chambm commented 2 years ago

I was able to get your file converted with peakPicking by deleting/moving Func003 instead of Func004. I don't understand why. That function's data files are very small though, so I doubt much is lost. What kind of instrument method (configuration of functions) was used to collect these data?
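
The workaround of setting one function's files aside can be sketched in Python. This is only a minimal sketch under stated assumptions: it assumes the function files inside the .raw directory are named like `_FUNC003.DAT` and `_FUNC003.IDX`, and the helper names and the `excluded_functions` subdirectory are hypothetical. It is safest to run it on a copy of the data.

```python
import re
import shutil
from pathlib import Path

# Assumption: Waters function files are named like _FUNC003.DAT / _FUNC003.IDX.
FUNC_FILE = re.compile(r"^_FUNC(\d+)\.\w+$", re.IGNORECASE)


def function_sizes(raw_dir):
    """Return {function number: total size in bytes} for a .raw directory."""
    sizes = {}
    for path in Path(raw_dir).iterdir():
        match = FUNC_FILE.match(path.name)
        if path.is_file() and match:
            number = int(match.group(1))
            sizes[number] = sizes.get(number, 0) + path.stat().st_size
    return sizes


def set_aside_function(raw_dir, number, subdir="excluded_functions"):
    """Move every file of one function into a subdirectory; return moved names."""
    raw_dir = Path(raw_dir)
    target = raw_dir / subdir
    target.mkdir(exist_ok=True)
    moved = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(raw_dir.iterdir()):
        match = FUNC_FILE.match(path.name)
        if path.is_file() and match and int(match.group(1)) == number:
            shutil.move(str(path), str(target / path.name))
            moved.append(path.name)
    return moved
```

Calling `function_sizes(raw_dir)` first shows which function is unusually small; `set_aside_function(raw_dir, 3)` then moves that function's files out of the way so the conversion can be retried, and the files can be moved back afterwards.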

jonaheaton commented 2 years ago

Huh, that is weird. Thank you for looking into that! I was told that the acquisition method is MSE, but beyond that I don't have any other details about the method and instrument. Looking at the header file, it looks like the instrument is a Xevo G2-XS QTof. I'll have to ask those who performed the experiments and get back to you with more details.

SivanXW commented 2 years ago

Hello, has this issue been solved yet? I ran into exactly the same problem when I tried to convert Waters raw data to mzML format using the vendor peak picking. Thanks for your kind help!

chambm commented 2 years ago

No real fixes for this issue AFAIK. Does it crash without vendor peak picking? Are there any small func00* files you can move to a subdirectory to get it working?

SivanXW commented 1 year ago

Thanks for your response. It works well without vendor peak picking. I haven't found any small func00* files like the ones you mentioned. But it's weird that when I restarted my computer and tried to convert individual files instead of a batch, it suddenly worked.

Sanchezillana commented 1 year ago

Hi! I have a similar vendor peak picking issue when converting Waters .raw DDA files. Same LC and MS methods, same everything, and sometimes the files are converted and sometimes not. There are 6 functions (1xMS1, 3xMS2, 1xMS1 (lockmass), 1xUV). I use vendor peak picking and subscan filtering 1-4, and it works only for some files. Also, if I copy the files again from the PC connected to the LC-MS to my PC, then different files work! What is going on with Waters?? With CWT it works fine, so I am considering using it...

chambm commented 1 year ago

Waters is working on an SDK update that will provide a special DDA processing mode for Waters data. Should be merged in the next week or so. I can't say for sure it'll be more stable than the current SDK but they are engaged on it. But hearing that the same data sometimes crashes and sometimes doesn't is somewhat disturbing. It suggests some undefined behavior in the SDK where it may or may not crash depending on what's in uninitialized memory.

Sanchezillana commented 1 year ago

Thank you for your answer and the good news. Fingers crossed for that SDK; I am very interested in processing DDA data from Waters. If you need my files for testing or anything else, please let me know. Any advice for working around this undefined behavior in the meantime?

chambm commented 1 year ago

@Sanchezillana OK, MSConvertGUI is updated with a Waters DDA processing filter that will enable their newly added DDA processing mode. However it seems to do everything EXCEPT peak picking. I have to do CWT peak picking, then the results are good. Still waiting on Waters to tell me if the lack of peak picking is a bug.
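
For batch conversions, the "try vendor peak picking, fall back to CWT" approach discussed in this thread could be scripted roughly as below. This is a sketch, not an official recipe: it assumes `msconvert` is on the PATH and that a non-zero exit code signals a failed conversion, and it treats `--ddaProcessing` as the command-line counterpart of the GUI's Waters DDA processing option, which should be confirmed against `msconvert --help` for the installed build. The helper names are hypothetical.

```python
import subprocess


def build_msconvert_command(raw_path, out_dir, picker="vendor", dda_processing=False):
    """Build an msconvert argument list; picker is "vendor", "cwt" or None."""
    if picker not in ("vendor", "cwt", None):
        raise ValueError("picker must be 'vendor', 'cwt' or None")
    command = ["msconvert", str(raw_path), "--mzML", "-o", str(out_dir)]
    if picker is not None:
        command += ["--filter", f"peakPicking {picker} msLevel=1-"]
    if dda_processing:
        # Assumption: --ddaProcessing is the CLI switch behind the GUI option.
        command.append("--ddaProcessing")
    return command


def convert_with_fallback(raw_path, out_dir, run=subprocess.run):
    """Try vendor peak picking first, then fall back to CWT if msconvert fails."""
    for picker in ("vendor", "cwt"):
        result = run(build_msconvert_command(raw_path, out_dir, picker))
        # Assumption: msconvert exits with a non-zero code when conversion fails.
        if result.returncode == 0:
            return picker
    raise RuntimeError(f"msconvert failed for {raw_path} with both pickers")
```

`convert_with_fallback("sample.raw", "out")` returns the name of the picker that succeeded, which makes it easy to log which files needed the CWT fallback.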

Sanchezillana commented 1 year ago

Hi @chambm! Thank you for your answer. I have now tried the Waters DDA filter with msconvert version 3.0.23039-e4357b8 from SourceForge and I have the same issue with the precursor ion in my file (among others). The only difference is that the lockmass scans are removed. My files were acquired in fastDDA mode + PDA scan (UV scan) with lockmass correction applied during the acquisition.

I can share a file with you: https://universitatdevalencia-my.sharepoint.com/:u:/g/personal/angel_illana_uv_es/EWWIb-FBuTpPsyd8qdC_8jgB4t1PAJBC-oSre_u_kW1o7A?e=CdmPxs

I checked:

  1. Conversion of raw file with CWT peakPicking filter only = strange file with the UV scans + precursor ion issue
  2. Conversion of raw file with CWT peakPicking filter + Waters DDA filter = FAILS
  3. Manual deletion of UV function files (_FUNC006.DAT and _FUNC006.IDX) + CWT peakPicking filter = works, with the precursor ion issue, and includes the lockmass scans
  4. Manual deletion of UV function files (_FUNC006.DAT and _FUNC006.IDX) + CWT peakPicking filter + Waters DDA filter = works, with the precursor ion issue, but removes the lockmass scans

Maybe I am using the wrong version?

chambm commented 1 year ago

I can repro the DDA processing failure with the UV functions. I'll pass that along to Waters because it looks like it's in their code.

When I test your file after removing the UV function manually, I do see differences between the precursor m/z with DDA processing on vs. off. So what do you mean "precursor ion issue"? And is excluding the lockmass scans a problem? You want to keep those?

Sanchezillana commented 1 year ago

I mean that when I open the converted files (point 4 in my comment) in MZmine, I see that the precursor ion is still not the correct one. It is great that the lockmass scans were removed, no problem with that! [image]

chambm commented 1 year ago

I don't know why you call that incorrect. It is very clear that the 443 and 341 precursors are completely different spectra. Here are the two spectra you selected. You can see the spectral similarity is very low: [image]

Here are the scans without DDA processing (one of them got merged and the other didn't, although it looks like it probably should have). You can see the spectral similarity for the same precursor m/z is huge: [image]

Unless I'm missing something the precursor m/z assignment looks correct.

Sanchezillana commented 1 year ago

I mean that the mass in the "precursor ion" field of the MS2 scan is not the exact mass of the ion picked in MS1 for fragmentation. When I convert Thermo files, for example, this doesn't happen.

elnurgar commented 1 year ago

With Thermo files it doesn't happen. It happens with Waters and Bruker data. Therefore I prepared a script that can solve this problem.

elnurgar commented 1 year ago

The MS1 and MS2 scans are well calibrated; only the precursor ion value is not calibrated: https://github.com/elnurgar/mzxml-precursor-corrector

chambm commented 1 year ago

OK yes I see the issue now. The DDA processing does seem to change the precursor m/z, but it doesn't seem to be fully lockmass correcting it. Is it supposed to do so @pete-reay-waters ?

chambm commented 1 year ago

> With Thermo files it doesn't happen. It happens with Waters and Bruker data. Therefore I prepared a script that can solve this problem.

In the 2000s it was a pretty common problem with Thermo FT/Orbi such that there are two different filters in msconvert to deal with it: precursorRecalculator and precursorRefiner! But I don't think those were ever extended to work with other vendors and TOFs. Waters really should have this fixed in their SDK.

pete-reay-waters commented 1 year ago

Thanks for the info @chambm I've added it to our backlog. I'm out of office next week, but we'll investigate.

Sanchezillana commented 1 year ago

> OK yes I see the issue now. The DDA processing does seem to change the precursor m/z, but it doesn't seem to be fully lockmass correcting it. Is it supposed to do so @pete-reay-waters ?

Thank you guys @elnurgar @chambm @pete-reay-waters. I thought that this wrong precursor ion m/z was the center of the quadrupole isolation window instead of the correct MS1 exact mass, but... is it something related to the lockmass? What a mess. I will try your script @elnurgar with my data and give you feedback.

elnurgar commented 1 year ago

@Sanchezillana, I don't think that it is related to the lock mass. I don't know what the issue with Waters is, as it has a lock mass and calibrates spectra during the analysis. On the Bruker Impact II, the calibration solution is injected at the beginning of each run, and Bruker calibrates the spectra after acquisition of each chromatogram. Therefore, the precursor values recorded for fragmentation during the analysis can drift. After acquisition, when we visualize spectra in the vendor software we see the correct precursor m/z value, but when we export to mzML or mzXML this value is not calibrated. The error is not as high as with Waters data, though: with Bruker data we have 50-80 ppm drift, whereas with Waters data it is almost 1000 ppm.
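
To put those ppm figures into absolute terms, the standard relation is ppm = (observed - reference) / reference × 10^6. The short sketch below applies it; the m/z values in the usage note are illustrative only and are not taken from any file in this thread.

```python
def ppm_error(observed_mz, reference_mz):
    """Relative mass error of observed_mz against reference_mz, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def mz_shift(reference_mz, ppm):
    """Absolute m/z shift that corresponds to a given ppm error."""
    return reference_mz * ppm / 1e6
```

At m/z 500, for instance, a drift of 80 ppm corresponds to a shift of about 0.04, while 1000 ppm corresponds to about 0.5, which is large enough to break precursor matching in most downstream tools.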

Sanchezillana commented 1 year ago

So, does anyone know the real reason? I found this answer from the Waters support team months ago: https://support.waters.com/KB_Inf/MassLynx/WKB28313_Why_doesnt_the_MSMS_set_mass_in_the_header_of_the_spectrum_window_match_the_peak_mass?mt-learningpath=dda_qanda

I understood that it was a problem related to the mass calibration and the way MassLynx saves the precursor ion: it saves the quadrupole isolation window instead of the exact precursor mass measured in the survey MS1 scan.
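
The idea of replacing the stored quadrupole value with the mass actually measured in the survey scan can be sketched as follows. This is a generic illustration, not the algorithm used by mzxml-precursor-corrector or by msconvert's precursor filters; the function name and the default window of 0.5 m/z are hypothetical choices.

```python
def refine_precursor_mz(set_mz, ms1_peaks, window=0.5):
    """Pick the most intense MS1 centroid within +/- window of the set mass.

    ms1_peaks is an iterable of (mz, intensity) pairs from the survey scan.
    Returns set_mz unchanged when no peak falls inside the window.
    """
    candidates = [(mz, inten) for mz, inten in ms1_peaks if abs(mz - set_mz) <= window]
    if not candidates:
        return set_mz
    # The strongest peak near the set mass is taken as the true precursor.
    return max(candidates, key=lambda peak: peak[1])[0]
```

For example, with survey peaks `[(443.10, 50.0), (443.2331, 900.0), (444.2360, 200.0)]` and a stored value of 443.0, the function returns 443.2331. Choosing the most intense peak is a simplification; a real tool would also have to handle isotope peaks and co-isolated ions.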

Sanchezillana commented 1 year ago

@elnurgar I ran your program after removing the UV scans (del /s command in cmd) and converting to mzML with msconvert (CWT peak picking and Waters DDA filter). Apparently I have nice .mzML files when I open them in MZmine 3. I'll give you more feedback if I find something during data processing.

elnurgar commented 1 year ago

@Sanchezillana thank you for your first feedback. My colleague who works on a Bruker Impact II told me recently that the error on his machine was also significant, at about 120 ppm. This problem can also occur sometimes on Orbitrap, but rarely. No information about Shimadzu or SCIEX.

pete-reay-waters commented 1 year ago

Hi @chambm regarding the two issues above, we have logged them and plan to fix them in the next version, which should be available by the summer.

We believe the cause of the second one is that the spectra have been lockmass corrected at source (before being processed by msconvert). The logic thus disables lockmass correction for the dataset. However, that also impacts the precursor m/z. We have plans to change the logic in the MassLynx SDK to allow correction of the precursor, even when the spectra are already corrected.

Sanchezillana commented 1 year ago

> @elnurgar I ran your program after removing the UV scans (del /s command in cmd) and converting to mzML with msconvert (CWT peak picking and Waters DDA filter). Apparently I have nice .mzML files when I open them in MZmine 3. I'll give you more feedback if I find something during data processing.

I also tried to do the same with files centroided in MassLynx (since the vendor peak picking doesn't work well in msconvert for Waters data with DDA correction). The masses that I obtained are not exactly the same as when the peaks are centroided with msconvert CWT. They are more accurate if the files are centroided with MassLynx beforehand.

tsufz commented 1 year ago

Hi @pete-reay-waters, thank you so much for the information. This action would be very appreciated by your user community. I talked to many people working in nontargeted analysis. They will be delighted, if Waters' mass spectra will be available for processing in their workflows.

Yours Tobias

GSH-09 commented 1 year ago

> @Sanchezillana OK, MSConvertGUI is updated with a Waters DDA processing filter that will enable their newly added DDA processing mode. However it seems to do everything EXCEPT peak picking. I have to do CWT peak picking, then the results are good. Still waiting on Waters to tell me if the lack of peak picking is a bug.

Are there any special settings to correctly convert profile full scan data + DDA data with lockmass correction?

I exported .RAW from Unifi, used MSConvertGUI to create the mzML (downloaded yesterday; used "generic" and I didn't add any filters), and reviewed data using MzMine3. The full scan MS1 data was there, but the DDA MS2 data was not there. It was also unclear if MSConvert used lockmass data to correct for mass accuracy.

edit: looks like MS2 data is present, but are all being attributed to one incorrect precursor m/z (537.5 neg esi), at least when using MzMine3.

edit2: I tried Waters DDA Processing, but got an error:
Failed - System.Exception: [pwiz::CLI::msdata::ReaderList::read] Unhandled exception: Incorrect acquisition type
at pwiz.CLI.msdata.ReaderList.read(String filename, MSDataList results, ReaderConfig config)
at MSConvertGUI.MainLogic.processFile(String filename, Config config, ReaderList readers, Map2 usedOutputFilenames)
at MSConvertGUI.MainLogic.Go(Config config, Map2 usedOutputFilenames)

greencodefairy commented 1 year ago

Hi @pete-reay-waters, has there been some update regarding the Waters' issue(s) described in this thread? Yours, Laura

pete-reay-waters commented 1 year ago

Hi Laura @greencodefairy

We're currently investigating the issue with vendor peak picking crashing on DDA data; this is on our Kanban board now. Re the other two issues, I'll provide an update early next week. We haven't addressed them yet but they definitely haven't been forgotten; I just need to check with the relevant person on Tuesday whether we have any more information.

Kind regards Pete

pete-reay-waters commented 1 year ago

Hi, another brief update on this. We are now actively working on the crashes in centroiding of DDA data.

Re the other two mentioned above,

We'd welcome feedback on the second issue - is the workaround acceptable, or is this an essential fix for anyone?

septermus commented 1 year ago

Hi. I have just spent several months collecting a bunch of Synapt data for metabolomics in what would seem to be the worst of all combinations (MS1, MS2, centroid with on-the-fly lockmass correction, plus UV), and then found all the above issues! Do we have a fix from Waters for the problem?

pete-reay-waters commented 1 year ago

Hi @septermus we have fixes for several issues including those two above - the fixes will be coming in November

septermus commented 1 year ago

That will be great! My birthday is 3rd of November so I'll take it as a much welcome present to be able to get all my data analyzed properly! Will you be announcing it on this thread?

pete-reay-waters commented 1 year ago

Yes, once we've pushed the changes to pwiz's main branch we'll mention it in this thread.

Our expectation is the changes might be later in November than the 3rd, but hopefully you can still celebrate soon.

septermus commented 1 year ago

Great. In the meantime, I don't suppose you would know why MassLynx 4.2 (which is for Win 10) won't plot TICs or chromatograms when files are loaded? I can see the spectra fine and do all manipulations, but no line-based plots are visible; the plot area is completely blank. It occurs for both centroid and profile data. The data is from Synapt GS and GSi, and it occurs for any file type (MS, MS2, MSe). This happens on every Windows 10 PC I have tried (4 now!). I tried different screen resolutions but it is always the same. The data looks fine in MassLynx 4.1 on Win 7 PCs. Any advice greatly appreciated. Regards, Simon

pete-reay-waters commented 1 year ago

@septermus sorry I have not worked on the MassLynx application so I'll have to direct you to the normal support channels for MassLynx - hopefully they'll be able to get you sorted. Let me know if you need help finding that.

Sanchezillana commented 1 year ago

Do you know this program https://microapps.on-demand.waters.com/home/showmarkdown/data-as-a-product ? They claim that it manages to convert Waters .raw to mzML and fixes the lockmass precursor issue.

I haven't checked it, but maybe you can use their approach for correcting the issues.

Ángel SI

Sanchezillana commented 1 year ago

> Hi @septermus we have fixes for several issues including those two above - the fixes will be coming in November

This sounds great! Thank you for your hard work on this!

pete-reay-waters commented 1 year ago

> Do you know this program https://microapps.on-demand.waters.com/home/showmarkdown/data-as-a-product ? They claim that it manages to convert Waters .raw to mzML and fixes the lockmass precursor issue.
>
> I haven't checked it, but maybe you can use their approach for correcting the issues.
>
> Ángel SI

Thanks for sharing this link. The data-as-a-product app is made by the team I work on.

The data-as-a-product app actually uses msconvert internally to produce the mzML.

As a further layer, internally msconvert uses the Waters MassLynx SDK to read the Raw data.

In fact the changes we are talking about bringing into msconvert in November (in this thread) are already included in the data-as-a-product application, which has a private build of msconvert and the MassLynx SDK included. We are currently working on creating a new release of the MassLynx SDK, and will be submitting a PR to msconvert so that all users of msconvert get the benefits of these fixes.

I'd also mention some other value that the data-as-a-product app adds to the core functionality msconvert provides:

* You can specify a quad isolation window to include in the mzML for DDA data, which some 3rd party software requires (it is down to the user to enter the lower and upper offset they wish to include in the file).

* Additional UI

  * Wizard for queueing existing injections
  * Show status of export queue
  * Supports sample list settings for acquire and process workflows in MassLynx

greencodefairy commented 11 months ago

> > Do you know this program https://microapps.on-demand.waters.com/home/showmarkdown/data-as-a-product ? They claim that manage to convert waters .raw to MZml and fix the lockmass precursor issue. I don't check it but maybe you can use their approach for correcting the issues. Ángel SI
>
> Thanks for sharing this link. The data-as-a-product app is made by the team I work on.
>
> The data-as-a-product app actually uses msconvert internally to produce the mzML.
>
> As a further layer, internally msconvert uses the Waters MassLynx SDK to read the Raw data.
>
> In fact the changes we are talking about bringing into msconvert in November (in this thread) are already included in the data-as-a-product application, which has a private build of msconvert and the MassLynx SDK included. We are currently working on creating a new release of the MassLynx SDK, and will be submitting a PR to msconvert so that all users of msconvert get the benefits of these fixes.
>
> I'd also mention some other value that the data-as-a-product app adds to the core functionality msconvert provides:
>
> * You can specify a quad isolation window to include in the mzML for DDA data, which some 3rd party software requires (it is down to the user to enter the lower and upper offset they wish to include in the file).
>
> * Additional UI
>
>   * Wizard for queueing existing injections
>   * Show status of export queue
>   * Supports sample list settings for acquire and process workflows in MassLynx

Hey @pete-reay-waters, thanks for sharing. I signed up and tried out Waters DataConnect by importing existing files, and successfully converted .raw to .mzML. Comparing the file in MZmine 3 with the same file converted with another script, Waters2mzML, it still does not assign MS2 precursor ions correctly (all of them get the same m/z 625.00). Is this something that you are currently addressing?

Regards, Laura

greencodefairy commented 11 months ago

Hi @pete-reay-waters, converting existing .raw files works with the Waters DataConnect app mentioned above. However, when I compare the output with that of Waters2mzML V1.2.0, the result is the same. I was hoping that the precursor ion issue would be solved with this, but with both tools every MS2 scan gets the same precursor ion assigned (m/z 625). Is this something you are currently working on?

Regards, Laura

chambm commented 11 months ago

It sounds like you might be looking at MSe data or possibly targeted MS2 data (i.e. inclusion list). Doesn't sound like DDA if all the precursors are round numbers like 625.00. Can you share an example file?
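
One quick way to check for this symptom in a converted file is to count how many MS2 spectra share each selected-ion m/z. The sketch below uses only the Python standard library and relies on the mzML namespace and the PSI-MS accessions `MS:1000511` (ms level) and `MS:1000744` (selected ion m/z); the function name is hypothetical, and it assumes an uncompressed mzML file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = "{http://psi.hupo.org/ms/mzml}"
MS_LEVEL = "MS:1000511"         # PSI-MS term "ms level"
SELECTED_ION_MZ = "MS:1000744"  # PSI-MS term "selected ion m/z"


def ms2_precursor_counts(mzml_source):
    """Count MS2 spectra per selected-ion m/z (rounded to 4 decimals)."""
    counts = Counter()
    for _, elem in ET.iterparse(mzml_source, events=("end",)):
        if elem.tag != NS + "spectrum":
            continue
        level = None
        selected = []
        for param in elem.iter(NS + "cvParam"):
            accession = param.get("accession")
            if accession == MS_LEVEL:
                level = int(param.get("value"))
            elif accession == SELECTED_ION_MZ:
                selected.append(round(float(param.get("value")), 4))
        if level == 2:
            counts.update(selected)
        elem.clear()  # keep memory use low on large files
    return counts
```

If `ms2_precursor_counts("sample.mzML")` returns a single key such as 625.0 covering every MS2 spectrum, the data most likely come from MSe or an inclusion list rather than from DDA.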

greencodefairy commented 11 months ago

That is true, I am using MSe data. DDA is not working for my experimental approach.

chambm commented 11 months ago

Then you can basically ignore the precursor m/z. It's meaningless because the entire scan range was fragmented (for a high energy scan).

greencodefairy commented 11 months ago

But will it work now with MS2 data? I also have one such data set that I could test.