OpenMS / pyopenms-docs

pyOpenMS readthedocs documentation, additional utilities, addons, scripts, and examples.
https://pyopenms.readthedocs.io

Memory errors when exporting GNPS files #377

Closed wkindschuh closed 1 year ago

wkindschuh commented 1 year ago

Describe the problem you encountered

I am experiencing segmentation faults and memory errors when I attempt to generate GNPS-ready input files from a dataset containing technical replicates, where some replicates contain only MS1 scans while others contain MS1 and data-dependent MS2 scans. I believe the issue is with the export of the MGF file, as I am able to generate a filtered feature table containing only features with supporting MS2 data. Also, when I process only the replicates containing MS1 and data-dependent MS2 data, I am able to generate GNPS-ready input files.
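To see which replicates carry MS2 scans before attempting the export, a per-file count of MS levels can help. The following is a minimal sketch, not part of the original report: the helper names are hypothetical, and it assumes each replicate is available as an mzML file readable with `MzMLFile().load()`.

```python
from collections import Counter
from typing import Dict, Iterable, List


def summarize_ms_levels(levels: Iterable[int]) -> Dict[int, int]:
    """Count how many spectra were recorded at each MS level."""
    return dict(Counter(levels))


def has_ms2(levels: Iterable[int]) -> bool:
    """Return True if at least one spectrum has MS level 2."""
    return any(level == 2 for level in levels)


def ms_levels_in_file(mzml_path: str) -> List[int]:
    """Read the MS level of every spectrum in one mzML file (needs pyopenms)."""
    # Imported lazily so the two helpers above also work without pyopenms.
    import pyopenms as oms

    exp = oms.MSExperiment()
    oms.MzMLFile().load(mzml_path, exp)
    return [spec.getMSLevel() for spec in exp.getSpectra()]
```

Calling `has_ms2(ms_levels_in_file(path))` for each replicate then shows which files are MS1-only.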

To Reproduce

Steps to reproduce the behavior:

  1. Run centroiding on profile mode input files
  2. Run mass trace, elution peak, and feature detection
  3. Align feature retention times
  4. Align mzML files based on FeatureMap alignment
  5. Map MS2 spectra to features
  6. Link features in a ConsensusMap
  7. Export files for GNPS
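For reference, step 7 can be sketched roughly as below. This is an illustration under assumptions, not the reporter's code: the function names and file paths are hypothetical, and it assumes `GNPSMGFFile().store()` takes the consensusXML path, a list of byte-encoded mzML paths, and the output path, as in the pyOpenMS GNPS export example. The file check runs first so that a missing input raises a Python error before any pyOpenMS call.

```python
import os
from typing import List


def missing_files(paths: List[str]) -> List[str]:
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]


def export_gnps_inputs(consensus_file: str, mzml_files: List[str],
                       mgf_out: str, quant_out: str) -> None:
    """Write the MGF and the quantification table for GNPS (needs pyopenms)."""
    missing = missing_files([consensus_file] + list(mzml_files))
    if missing:
        raise FileNotFoundError("Missing input files: " + ", ".join(missing))

    # Imported lazily; the check above does not depend on pyopenms.
    import pyopenms as oms

    # Assumed call signatures, following the pyOpenMS GNPS export example.
    oms.GNPSMGFFile().store(
        oms.String(consensus_file),
        [f.encode() for f in mzml_files],
        oms.String(mgf_out),
    )
    consensus_map = oms.ConsensusMap()
    oms.ConsensusXMLFile().load(consensus_file, consensus_map)
    oms.GNPSQuantificationFile().store(consensus_map, quant_out)
```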

What should be happening

I believe I should be able to generate a GNPS-ready feature table and MGF file.

System information:

Any advice/support would be much appreciated. If I am posting this in the wrong place or if there is additional documentation/information I can provide please let me know.

axelwalter commented 1 year ago

Thanks for the nice description, I was able to reproduce the error.

axelwalter commented 1 year ago

@wkindschuh can you please check if the issue persists with the latest pyopenms?

pip install --index-url https://pypi.cs.uni-tuebingen.de/simple/ pyopenms-nightly

In that version it works on my side.
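To confirm which build is actually active after the install, the installed version can be printed. This is a small sketch; it assumes the package is registered under the distribution name `pyopenms-nightly` or `pyopenms`, and the helper name is hypothetical.

```python
from importlib import metadata
from typing import Iterable, Optional


def installed_version(dist_names: Iterable[str]) -> Optional[str]:
    """Return the version of the first installed distribution, else None."""
    for name in dist_names:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


# Distribution names are assumptions; prints None if neither is installed.
print(installed_version(["pyopenms-nightly", "pyopenms"]))
```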

wkindschuh commented 1 year ago

Unfortunately, I am still getting memory errors and segmentation faults when I use the latest pyopenms. I get this error both on my own Mac and on a Linux-based cluster. If I shared a few of my files, would you be able to confirm whether the error persists when you attempt to generate GNPS input files from my data?

axelwalter commented 1 year ago

That would be helpful, I will check with your files.

wkindschuh commented 1 year ago

Thank you for offering to check with my files! I just shared a Dropbox link to some of my data with axel.walter@uni-tuebingen.de

Please just let me know if there is an alternative email you would prefer I share the data with or if there is another way you would like for me to share the data with you.

Thanks!

axelwalter commented 1 year ago

Thanks for the files. Since they are mzXML files, I had to convert them to mzML before running GNPSMGFFile, but apart from that there were no issues. Can you try that as well? Just store them as mzML files with MzMLFile().store(). Otherwise, can you also share your pyopenms code via Dropbox?
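The conversion mentioned above could look roughly like this. It is a minimal sketch with hypothetical helper names and directory layout, assuming the files load cleanly with `MzXMLFile().load()`.

```python
from pathlib import Path


def mzml_path_for(mzxml_path: str, out_dir: str) -> Path:
    """Build the output path: same file stem, .mzML extension, in out_dir."""
    return Path(out_dir) / (Path(mzxml_path).stem + ".mzML")


def convert_mzxml_to_mzml(mzxml_path: str, out_dir: str) -> Path:
    """Load one mzXML file and store it as mzML (needs pyopenms)."""
    # Imported lazily so the path helper works without pyopenms.
    import pyopenms as oms

    exp = oms.MSExperiment()
    oms.MzXMLFile().load(mzxml_path, exp)

    out_path = mzml_path_for(mzxml_path, out_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    oms.MzMLFile().store(str(out_path), exp)
    return out_path
```

For example, `convert_mzxml_to_mzml("data/raw/sample1.mzXML", "data/mzML")` would write `data/mzML/sample1.mzML` (both paths are placeholders).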

axelwalter commented 1 year ago

For data processing I used the UmetaFlow pyOpenMS notebooks, with adaptations for the mzXML file format in the pre-processing notebook.

wkindschuh commented 1 year ago

Thanks for pointing me to the UmetaFlow workflow / notebooks! I have tried to use these notebooks (with minor adaptations) to generate GNPS-ready input files, but unfortunately I am still getting errors when trying to generate the MGF file. I just uploaded a notebook ("wkindschuh_preprocessing.ipynb") to the Dropbox folder that I previously shared with you. I generated this notebook using code from the UmetaFlow notebooks 1_FileConversion, 2_Preprocessing, and 4_GNPSExport, and have been running it inside the UmetaFlow repo after moving my mzXML files to data/raw. If you have any issues running the notebook, please just let me know. Any additional feedback or advice on how I am preprocessing my data would also be much appreciated! Thank you for helping me try to get to the bottom of my issue with generating GNPS input files.

axelwalter commented 1 year ago

Thanks for the notebook. I just ran it with your data without any issues. If you want, we can take a look at it together; I just sent an email to the address you used for Dropbox.

wkindschuh commented 1 year ago

Hi Axel,

Thank you so much for offering to look at the code together. What is your availability like for Thursday (4/13)? I am flexible, so whenever you are able to meet over Zoom, just let me know and I will accommodate your schedule.

Best, William Kindschuh


andrewjkwok commented 1 year ago

Hi - I am experiencing essentially the same issue, with the latest pyopenms nightly version installed (via: pip install --index-url https://pypi.cs.uni-tuebingen.de/simple/ pyopenms-nightly).

Was there any update on whether this had been fixed? Many thanks in advance.

axelwalter commented 1 year ago

Hi @andrewjkwok
This issue seems to be a bit elusive; I have not experienced it since. Could you send me an email and provide some example files with which you experience the issue?

jpfeuffer commented 1 year ago

@axelwalter can you link the PR that was supposed to fix this? Just for reference.

andrewjkwok commented 1 year ago

Thanks @axelwalter, I have just sent the email with a shared Google Drive folder containing data files that will hopefully allow the error to be reproduced. Please let me know if I can provide any other info to help.

axelwalter commented 1 year ago

The example data files mix positive and negative mode files. This seems to be causing issues with GNPSExport. Handling them separately works fine on my side.
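One way to separate the files before export is to group them by the polarity recorded in their spectra. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and it assumes the polarity is stored in each spectrum's instrument settings and exposed as `IonSource.Polarity.POSITIVE` / `NEGATIVE` in the installed pyopenms version.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_polarity(polarity_by_file: Dict[str, str]) -> Dict[str, List[str]]:
    """Group file paths by their polarity label, paths sorted within each group."""
    groups = defaultdict(list)
    for path, polarity in sorted(polarity_by_file.items()):
        groups[polarity].append(path)
    return dict(groups)


def file_polarity(mzml_path: str) -> str:
    """Label a file 'positive', 'negative', 'mixed' or 'unknown' (needs pyopenms)."""
    # Imported lazily so the grouping helper works without pyopenms.
    import pyopenms as oms

    exp = oms.MSExperiment()
    oms.MzMLFile().load(mzml_path, exp)

    seen = set()
    for spec in exp.getSpectra():
        polarity = spec.getInstrumentSettings().getPolarity()
        # Enum access path is an assumption about the installed pyopenms version.
        if polarity == oms.IonSource.Polarity.POSITIVE:
            seen.add("positive")
        elif polarity == oms.IonSource.Polarity.NEGATIVE:
            seen.add("negative")

    if len(seen) == 1:
        return seen.pop()
    return "mixed" if seen else "unknown"
```

Each group returned by `group_by_polarity({path: file_polarity(path) for path in files})` can then be processed and exported on its own.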

axelwalter commented 1 year ago

@jpfeuffer There was no PR to fix it. As far as I know, it worked fine except in rare cases that are hard to catch.

jpfeuffer commented 1 year ago

I see! Makes sense. Let's hope you can find something with the new data!

andrewjkwok commented 1 year ago

@axelwalter Thanks for testing this - I was successful handling the positive files alone but got the same error with the negative mode files. Could there be something wrong with the files, or is the issue with how the software handles negative mode files?

axelwalter commented 1 year ago

The issue was solved by setting the correct parameters for adduct detection in negative mode.
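The thread does not record which parameters were changed, so the following is only a sketch of what a negative-mode setup for adduct detection might look like. It assumes `MetaboliteFeatureDeconvolution` exposes `negative_mode`, `charge_min`, `charge_max`, and `potential_adducts`, with adducts written as `Elements:Charge:Probability`. The helper names, the charge range, and any adduct values passed in are illustrative assumptions, not the settings used here.

```python
from typing import List


def adduct_charge(adduct: str) -> int:
    """Net charge of an adduct string such as 'H-1:-:1' or 'Na:+:0.2'."""
    fields = adduct.split(":")
    if len(fields) < 3:
        raise ValueError("Expected 'Elements:Charge:Probability', got: " + adduct)
    sign = fields[1]
    if sign == "0":
        return 0
    if sign and set(sign) == {"+"}:
        return len(sign)
    if sign and set(sign) == {"-"}:
        return -len(sign)
    raise ValueError("Unrecognized charge field in adduct: " + adduct)


def wrong_polarity_adducts(adducts: List[str], negative_mode: bool) -> List[str]:
    """Return charged adducts whose sign contradicts the ionization mode."""
    bad = []
    for adduct in adducts:
        charge = adduct_charge(adduct)
        if charge != 0 and (charge < 0) != negative_mode:
            bad.append(adduct)
    return bad


def negative_mode_decharger(adducts: List[str], charge_min: int = -1,
                            charge_max: int = -1):
    """Configure adduct detection for negative mode (needs pyopenms)."""
    bad = wrong_polarity_adducts(adducts, negative_mode=True)
    if bad:
        raise ValueError("Positively charged adducts in negative mode: " + ", ".join(bad))

    import pyopenms as oms  # imported lazily; the checks above do not need it

    mfd = oms.MetaboliteFeatureDeconvolution()
    params = mfd.getDefaults()
    # Parameter names and values below are assumptions, not taken from this thread.
    params.setValue("negative_mode", "true")
    params.setValue("charge_min", charge_min)
    params.setValue("charge_max", charge_max)
    params.setValue("potential_adducts", [a.encode() for a in adducts])
    mfd.setParameters(params)
    return mfd
```

The sign check is the part most likely to matter: a positive-mode adduct list left in place for negative-mode data is flagged before any pyOpenMS call is made.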