compomics / ms2pip

MS²PIP: Fast and accurate peptide spectrum prediction for multiple fragmentation methods, instruments, and labeling techniques.
https://ms2pip.readthedocs.io
Apache License 2.0

fasta2speclib: ValueError: cannot convert float NaN to integer #215

Closed: kgousset closed this issue 5 months ago

kgousset commented 5 months ago

Hi,

Thanks for the great software. I am trying to convert a Mus musculus UniProt/Swiss-Prot FASTA file into a spectral library with fasta2speclib. I am running on a Threadripper 5995WX with 512 GB of RAM, under WSL2 with Ubuntu 22.04.3 on Windows 11. I use pyenv-virtualenv and have tried every Python version from 3.7 to 3.12.2.

I tried a standard pip installation of ms2pip and deeplc, and also the minimum requirements listed in your pyproject.toml. I also tried every DeepLC version from 0.1.14 to the latest, with many different combinations of numpy and pandas, alongside the latest ms2pip.

Unfortunately, nothing has worked. I always get the same error, but it never fails on the same batch. Sometimes it fails early, usually at the 9th or 10th of 320 batches, but it often gets past the 100th batch. The furthest it has gone is the 143rd batch.

The FASTA is a standard UniProt FASTA appended with a cRAP contaminants FASTA (attached below as a .txt, since I can't attach a .fasta file). I validated it with py-fasta-validator and no problems were detected.

The configuration I used is below. Do you know why I might be having this issue? Any help would be greatly appreciated.

Thanks in advance for any help you might offer.

Best, Karine

UP000000589_uniprotkb_swissprot_crap_mus_musculus_10090_2024_02_29.txt

```json
{
  "output_filetype": ["msp"],
  "charges": [1, 2, 3, 4],
  "min_peplen": 7,
  "max_peplen": 50,
  "cleavage_rule": "trypsin",
  "missed_cleavages": 2,
  "semi_specific": false,
  "modifications": [
    {
      "name": "Carbamidomethyl",
      "unimod_accession": 4,
      "mass_shift": 57.021464,
      "amino_acid": "C",
      "fixed": false
    }
  ],
  "max_variable_modifications": 5,
  "min_precursor_mz": 99.0,
  "max_precursor_mz": 1700.0,
  "ms2pip_model": "timsTOF",
  "add_decoys": true,
  "add_retention_time": true,
  "deeplc": {},
  "batch_size": 10000,
  "num_cpu": 120
}
```
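Because the configuration is plain JSON, it can be parsed with Python's standard `json` module and checked for internal consistency before launching a long run. This is a minimal sketch: the checks in the hypothetical `sanity_check` helper are illustrative assumptions, not fasta2speclib's own validation.

```python
import json
from typing import List

# The configuration from this issue, as a JSON string.
CONFIG_JSON = """
{"output_filetype": ["msp"], "charges": [1, 2, 3, 4], "min_peplen": 7,
 "max_peplen": 50, "cleavage_rule": "trypsin", "missed_cleavages": 2,
 "semi_specific": false,
 "modifications": [{"name": "Carbamidomethyl", "unimod_accession": 4,
                    "mass_shift": 57.021464, "amino_acid": "C", "fixed": false}],
 "max_variable_modifications": 5, "min_precursor_mz": 99.0,
 "max_precursor_mz": 1700.0, "ms2pip_model": "timsTOF", "add_decoys": true,
 "add_retention_time": true, "deeplc": {}, "batch_size": 10000, "num_cpu": 120}
"""


def sanity_check(config: dict) -> List[str]:
    """Return problems found in a fasta2speclib-style config (hypothetical checks)."""
    problems = []
    if config["min_peplen"] > config["max_peplen"]:
        problems.append("min_peplen is larger than max_peplen")
    if config["min_precursor_mz"] >= config["max_precursor_mz"]:
        problems.append("min_precursor_mz is not below max_precursor_mz")
    if not all(isinstance(c, int) and c > 0 for c in config["charges"]):
        problems.append("charges must be positive integers")
    for mod in config["modifications"]:
        if not isinstance(mod["mass_shift"], (int, float)):
            problems.append(f"{mod['name']}: mass_shift is not numeric")
    return problems


config = json.loads(CONFIG_JSON)  # JSON true/false become Python True/False
print(sanity_check(config))  # -> []
```

For this configuration the checks pass, which suggests the error is unlikely to come from an inconsistent range setting.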

```
Traceback (most recent call last):
  File "/home/kgousset/.pyenv/versions/MS2PIP2/bin/fasta2speclib", line 8, in
    sys.exit(main())
             ^^^^^^
  File "/home/kgousset/.pyenv/versions/3.11.8/envs/MS2PIP2/lib/python3.11/site-packages/fasta2speclib/fasta2speclib.py", line 697, in main
    f2sl.run()
  File "/home/kgousset/.pyenv/versions/3.11.8/envs/MS2PIP2/lib/python3.11/site-packages/fasta2speclib/fasta2speclib.py", line 243, in run
    self.process_batch(batch_id, batch_peptides)
  File "/home/kgousset/.pyenv/versions/3.11.8/envs/MS2PIP2/lib/python3.11/site-packages/fasta2speclib/fasta2speclib.py", line 366, in process_batch
    self._write_predictions(
  File "/home/kgousset/.pyenv/versions/3.11.8/envs/MS2PIP2/lib/python3.11/site-packages/fasta2speclib/fasta2speclib.py", line 627, in _write_predictions
    spec_out.write_msp()
  File "/home/kgousset/.pyenv/versions/3.11.8/envs/MS2PIP2/lib/python3.11/site-packages/ms2pip/ms2pip_tools/spectrum_output.py", line 29, in wrapper
    return self._write_general(write_function, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kgousset/.pyenv/versions/3.11.8/envs/MS2PIP2/lib/python3.11/site-packages/ms2pip/ms2pip_tools/spectrum_output.py", line 612, in _write_general
    write_function(self, file_object)
  File "/home/kgousset/.pyenv/versions/3.11.8/envs/MS2PIP2/lib/python3.11/site-packages/ms2pip/ms2pip_tools/spectrum_output.py", line 369, in write_msp
    self._get_peak_string(
  File "/home/kgousset/.pyenv/versions/3.11.8/envs/MS2PIP2/lib/python3.11/site-packages/ms2pip/ms2pip_tools/spectrum_output.py", line 238, in _get_peak_string
    f'{peak[1]:.6f}{sep}{intensity_type(peak[2])}{sep}"{ion_type.lower()}{peak[0]}/0.0"',
                         ^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot convert float NaN to integer
```
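The last frame points at the immediate cause: `intensity_type(peak[2])` is called on a predicted intensity that is NaN. Assuming `intensity_type` is `int` at that point (as the error message suggests), the failure reproduces in plain Python. The `format_peaks` helper below is hypothetical: it mirrors the f-string in the traceback and shows one defensive option (skipping NaN peaks). It is not how ms2pip or fasta2speclib handle this, and it does not explain why a NaN intensity was predicted in the first place.

```python
import math


def intensity_to_int(value: float) -> int:
    # int() raises ValueError for NaN, matching the traceback above.
    return int(value)


try:
    intensity_to_int(float("nan"))
except ValueError as err:
    print(err)  # cannot convert float NaN to integer


def format_peaks(peaks, ion_type="B", intensity_type=int, sep="\t"):
    """Format (ion_number, mz, intensity) tuples as MSP-style peak lines.

    Hypothetical helper: peaks with a NaN intensity are skipped and counted
    instead of crashing the whole batch.
    """
    lines = []
    skipped = 0
    for ion_number, mz, intensity in peaks:
        if math.isnan(intensity):
            skipped += 1
            continue
        lines.append(
            f'{mz:.6f}{sep}{intensity_type(intensity)}{sep}"{ion_type.lower()}{ion_number}/0.0"'
        )
    return lines, skipped


lines, skipped = format_peaks([(1, 147.112804, 1200.0), (2, 276.155397, float("nan"))])
print(lines, skipped)
```

Silently dropping peaks hides the underlying problem, so a real fix would more likely log which peptide produced the NaN so it can be investigated.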


```
pyenv install 3.11.8
```

```
Package                      Version
---------------------------- -----------
absl-py                      2.1.0
astunparse                   1.6.3
biopython                    1.83
blosc2                       2.5.1
cachetools                   5.3.3
certifi                      2024.2.2
charset-normalizer           3.3.2
click                        8.1.7
contourpy                    1.2.0
cycler                       0.12.1
deeplc                       2.2.32
deeplcretrainer              0.2.11
flatbuffers                  23.5.26
fonttools                    4.49.0
gast                         0.4.0
google-auth                  2.28.1
google-auth-oauthlib         1.0.0
google-pasta                 0.2.0
greenlet                     3.0.3
grpcio                       1.62.0
h5py                         3.10.0
hdf5plugin                   4.4.0
idna                         3.6
jax                          0.4.25
keras                        2.12.0
kiwisolver                   1.4.5
libclang                     16.0.6
llvmlite                     0.42.0
lxml                         5.1.0
Markdown                     3.5.2
markdown-it-py               3.0.0
MarkupSafe                   2.1.5
matplotlib                   3.8.3
mdurl                        0.1.2
ml-dtypes                    0.3.2
ms2pip                       3.12.0
msgpack                      1.0.8
ndindex                      1.8
numba                        0.59.0
numexpr                      2.9.0
numpy                        1.24.3
oauthlib                     3.2.2
opt-einsum                   3.3.0
packaging                    23.2
pandas                       1.5.3
pillow                       10.2.0
pip                          24.0
progressbar2                 4.4.1
protobuf                     4.25.3
psims                        1.3.3
psm-utils                    0.7.2
py-cpuinfo                   9.0.0
pyasn1                       0.5.1
pyasn1-modules               0.3.0
pydantic                     1.10.14
pygam                        0.9.0
Pygments                     2.17.2
pyopenms                     3.1.0
pyparsing                    3.1.1
pyteomics                    4.7.1
python-dateutil              2.9.0.post0
python-utils                 3.8.2
pytz                         2024.1
requests                     2.31.0
requests-oauthlib            1.3.1
rich                         13.7.1
rsa                          4.9
scipy                        1.12.0
setuptools                   65.5.0
six                          1.16.0
spectrum-utils               0.3.5
SQLAlchemy                   1.4.51
tables                       3.9.2
tensorboard                  2.12.3
tensorboard-data-server      0.7.2
tensorflow                   2.12.1
tensorflow-estimator         2.12.0
tensorflow-io-gcs-filesystem 0.36.0
termcolor                    2.4.0
tomlkit                      0.12.4
tqdm                         4.66.2
typing_extensions            4.5.0
urllib3                      2.2.1
Werkzeug                     3.0.1
wheel                        0.42.0
wrapt                        1.14.1
xgboost                      1.7.6
```

kgousset commented 5 months ago

I finally got it to work with Python 3.11.8 by checking out v3.13.0 of MS²PIP from git and installing it, together with DeepLC v2.2.32. Thanks.
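To confirm that an environment matches the combination reported to work here (Python 3.11.8, ms2pip 3.13.0, DeepLC 2.2.32), installed versions can be read with the standard-library `importlib.metadata` module (Python 3.8+). This is a minimal sketch; `installed_versions` and `mismatches` are hypothetical helpers, and the exact string comparison assumes the git install reports its version as plain `3.13.0`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, List

# Versions reported to work in this thread.
WORKING = {"ms2pip": "3.13.0", "deeplc": "2.2.32"}
WORKING_PYTHON = (3, 11, 8)


def installed_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


def mismatches(installed: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """List packages whose installed version differs from the expected one."""
    return [
        f"{name}: found {installed.get(name, 'not installed')}, expected {want}"
        for name, want in expected.items()
        if installed.get(name) != want
    ]


if __name__ == "__main__":
    python = tuple(sys.version_info[:3])
    if python != WORKING_PYTHON:
        print(f"Python {'.'.join(map(str, python))} (working setup used 3.11.8)")
    for line in mismatches(installed_versions(WORKING), WORKING):
        print(line)
```

An empty report only means the versions match; it does not guarantee the NaN error cannot recur.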

RalfG commented 5 months ago

Hi @kgousset,

Glad the issue is resolved. I'm not sure how it originated in the first place, or how the update to v3.13.0 fixed it. Do let us know if you run into any (other) issues.

Best, Ralf