QuantSysBio / inSPIRE

in silico Spectral Predictor Informed REscoring
GNU General Public License v2.0

Divide by zero error #19

Closed ericmalekos closed 1 year ago

ericmalekos commented 1 year ago

I'm running inSPIRE on MaxQuant output (FDR 1, default settings) but am running into a divide by zero error. The first file seems to be processed without error, and the error occurs on the second mzML file.

I generate the mzML files from RAW files with ProteoWizard (peak calling option set). Any ideas? My config.yml is pasted at the end.

---> Running inSPIRE version 1.4 <---

Checking for required inSPIRE models...
    Models already downloaded.
Creating Formatted Spectral Prediction Input...
    Formatted Prosit input written.
Predicting Spectra...
1777/1777 [==============================] - 782s 440ms/step
1777/1777 [==============================] - 1361s 766ms/step
Generating Features for Percolator Input...
    Filtered 33009 PSMs due to modifications unknown to Prosit.
    Filtered 874593 PSMs due to unmodified cysteines.
    MS Search Results ready.
    Basic Features added, adding Spectral Features.
        Processing scan file 0.
            124164 in original search results.
            124164 after combination with predicted spectra.
            124164 after combination with experimental spectra.
            Combined DB Search, Spectral, and Prosit Data.
            Created Spectral and Delta RT Features.
        Processing scan file 1.
            99258 in original search results.
            99258 after combination with predicted spectra.
            99258 after combination with experimental spectra.
            Combined DB Search, Spectral, and Prosit Data.
/home/eric/miniconda3/envs/inspire/lib/python3.9/site-packages/inspire/spectral_features.py:395: RuntimeWarning: divide by zero encountered in double_scalars
  df_row['spectrumDensity'] = len(df_row[MZS_KEY])/(df_row[MZS_KEY].max() - df_row[MZS_KEY].min())
/home/eric/miniconda3/envs/inspire/lib/python3.9/site-packages/inspire/spectral_features.py:395: RuntimeWarning: divide by zero encountered in double_scalars
  df_row['spectrumDensity'] = len(df_row[MZS_KEY])/(df_row[MZS_KEY].max() - df_row[MZS_KEY].min())
/home/eric/miniconda3/envs/inspire/lib/python3.9/site-packages/inspire/spectral_features.py:395: RuntimeWarning: divide by zero encountered in double_scalars
  df_row['spectrumDensity'] = len(df_row[MZS_KEY])/(df_row[MZS_KEY].max() - df_row[MZS_KEY].min())

config.yml:

experimentTitle: Rescore
searchResults: ./msms.txt
searchEngine: maxquant
outputFolder: ./output
scansFolder: ./
scansFormat: mzML
collisionEnergy: 33
rescoreMethod: mokapot
mzAccuracy: 0.02
deltaMethod: ignore
nCores: 1

jamc1996 commented 1 year ago

Hi Eric,

Thanks for using inSPIRE! It seems the divide by zero error occurs when a scan contains only a single m/z value, so the maximum and minimum m/z are equal. I've updated inSPIRE so it avoids this case now.
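
For illustration, a minimal sketch of how the spectrumDensity calculation from the warning could be guarded. This is not the actual inSPIRE fix: the function name `spectrum_density` and the fallback value of 0.0 are assumptions made for this example.

```python
def spectrum_density(mzs):
    """Return the number of peaks per unit of m/z range.

    Hypothetical helper, not part of inSPIRE. The 0.0 fallback for
    spectra without a usable m/z range is an assumption.
    """
    # With fewer than two peaks there is no m/z range to divide by.
    if len(mzs) < 2:
        return 0.0

    mz_range = max(mzs) - min(mzs)

    # Identical m/z values would also give a zero range.
    if mz_range <= 0:
        return 0.0

    return len(mzs) / mz_range


# A scan with a single peak no longer triggers a division by zero.
print(spectrum_density([445.12]))
print(spectrum_density([100.0, 200.0, 300.0]))
```

Returning a fixed value keeps the feature column finite for downstream rescoring. Dropping such scans instead would be another reasonable choice.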

The fix is in the source on GitHub but is not yet available on pip (the release of inSPIRE 1.5 is still a few days away, so I will come back here when it's released). In the meantime, you can install the update by following the steps below:

Uninstall your existing installation:

pip uninstall inspirems

Clone the GitHub repo:

git clone https://github.com/QuantSysBio/inSPIRE.git

Change directory into inSPIRE:

cd inSPIRE

Install the cloned repo:

python setup.py install

Thanks again and let me know if there are still any issues.

Best, John.

ericmalekos commented 1 year ago

Thanks John, this ran without error now. Do you know what might cause a single m/z in a scan, and how does this relate to the program reporting 99258 in original search results?

I'm new to working with proteomics data and wonder if this could be from the conversion from RAW to mzML? Or is it representative of experimental failure? Or ... ?

jamc1996 commented 1 year ago

Hi Eric,

Sorry I missed the notification of your response. Great that it works now, thanks for helping to improve our software!

When I say a single m/z in a scan, I mean that in one of those 99,258 MS2 spectra only a single peak was detected, which caused inSPIRE to break. This does seem very unusual to me, but maybe it's just an edge case we hadn't come across before.
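
One way to check for such spectra is to count the peaks in each MS2 scan. The sketch below is only an illustration: the function name `find_sparse_spectra` is hypothetical, and it assumes each spectrum is a dict with the keys "id", "ms level" and "m/z array", which is the layout the pyteomics mzML reader produces.

```python
def find_sparse_spectra(spectra, min_peaks=2):
    """Return (scan id, peak count) for MS2 spectra with too few peaks.

    Hypothetical helper. `spectra` is any iterable of dicts with the
    keys "id", "ms level" and "m/z array" (an assumption about the input).
    """
    sparse = []
    for spectrum in spectra:
        # Only MS2 spectra are matched against search results.
        if spectrum.get("ms level") != 2:
            continue
        n_peaks = len(spectrum["m/z array"])
        if n_peaks < min_peaks:
            sparse.append((spectrum.get("id"), n_peaks))
    return sparse


# Synthetic example data; the scan ids are made up.
example = [
    {"id": "scan=1", "ms level": 2, "m/z array": [445.12]},
    {"id": "scan=2", "ms level": 2, "m/z array": [100.0, 200.0]},
    {"id": "scan=3", "ms level": 1, "m/z array": []},
]
print(find_sparse_spectra(example))

# With pyteomics installed, a real file could be checked like this
# (the file name is a placeholder):
#   from pyteomics import mzml
#   with mzml.read("second_file.mzML") as reader:
#       print(find_sparse_spectra(reader))
```

If the same scan has many peaks when the RAW file is converted with a different tool, the conversion settings would be the more likely cause.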

Conversion from RAW can certainly also be an issue. We've seen it with different software in the past, but I think ProteoWizard is very reliable unless there was some mistake in the peak picking option.

If you're worried about it and are working with data from a Thermo Fisher machine, I mostly use this tool now, which defaults to peak picking with the native Thermo library:

https://github.com/compomics/ThermoRawFileParser

It's executed via terminal like inSPIRE and I find it very useful.

All the best, John.

jamc1996 commented 1 year ago

Hi Eric,

I'm closing this issue as the version containing the fix (inSPIRE 1.5) is now published on pip, meaning it can be installed via the standard:

pip install inspirems==1.5

This does not impact you if you successfully installed inSPIRE manually via the instructions above.

Thank you for your help improving inSPIRE. We hope to hear from you again!

Best wishes, John.