compomics / ms2rescore

Modular and user-friendly platform for AI-assisted rescoring of peptide identifications
https://ms2rescore.readthedocs.io
Apache License 2.0

PIN pipeline title matching: `ValueError: cannot convert float NaN to integer` #54

Closed: thaumP closed this issue 11 months ago

thaumP commented 2 years ago

ms2rescore -m /mnt/d/protome/P20210803783/P20210803783-P1_Slot1-73_1_4257_uncalibrated.mgf -t ./ms2r -o ms2r_P1_MH-40G-300-50 -n 12 P20210803783-P1_Slot1-73_1_4257.pin

/home/llt/.local/lib/python3.8/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
2022-03-18 16:29:40 // INFO // ms2rescore // Using PinPipeline.
/home/llt/miniconda3/envs/ms2r/lib/python3.8/site-packages/ms2rescore/percolator.py:154: UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
  self.df["Peptide"].str.contains(r"[([^[^]]*)]", regex=True)
2022-03-18 16:29:43 // ERROR // ms2rescore.main // Critical error occured in MS2ReScore
Traceback (most recent call last):
  File "/home/llt/miniconda3/envs/ms2r/lib/python3.8/site-packages/ms2rescore/main.py", line 15, in main
    rescore.run()
  File "/home/llt/miniconda3/envs/ms2r/lib/python3.8/site-packages/ms2rescore/init.py", line 233, in run
    peprec = self.pipeline.get_peprec()
  File "/home/llt/miniconda3/envs/ms2r/lib/python3.8/site-packages/ms2rescore/id_file_parser.py", line 224, in get_peprec
    return self.peprec_from_pin()
  File "/home/llt/miniconda3/envs/ms2r/lib/python3.8/site-packages/ms2rescore/id_file_parser.py", line 179, in peprec_from_pin
    peprec = self.original_pin.to_peptide_record(
  File "/home/llt/miniconda3/envs/ms2r/lib/python3.8/site-packages/ms2rescore/percolator.py", line 470, in to_peptide_record
    peprec_df["spec_id"] = self._get_spectrum_index_column(
  File "/home/llt/miniconda3/envs/ms2r/lib/python3.8/site-packages/ms2rescore/percolator.py", line 268, in _get_spectrum_index_column
    id_col = self.df["SpecId"].str.extract(pattern, expand=False).astype(int)
  File "/home/llt/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 5920, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/home/llt/.local/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 419, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/home/llt/.local/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 304, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/llt/.local/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 580, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/home/llt/.local/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1292, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/home/llt/.local/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1237, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "/home/llt/.local/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1154, in astype_nansafe
    return lib.astype_intsafe(arr, dtype)
  File "pandas/_libs/lib.pyx", line 668, in pandas._libs.lib.astype_intsafe
ValueError: cannot convert float NaN to integer

I created a Python 3.8 conda environment, installed wxPython with conda, and installed MS²Rescore with pip. The “pandas.Int64Index is deprecated” warning appears as soon as the software is run.

RalfG commented 2 years ago

Hi @thaumP,

The Pandas deprecation warning is an upstream issue in the XGBoost package (https://github.com/dmlc/xgboost/issues/7593), and out of our control. It should be fixed in the next XGBoost release. Until then, it is just a warning, no issue.

With regard to the ValueError: cannot convert float NaN to integer issue: It seems that MS²Rescore cannot properly extract MGF spectrum titles from the PIN file. The current method to match the PIN SpecId to the MGF title is hardcoded, so we might need to allow for some more customization on our end. Could you share an example of the SpecId column in your PIN file?
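For illustration, here is a minimal sketch of how this kind of failure can arise and how it could be reported more clearly. The SpecId values, the regular expression, and the helper name `to_spectrum_index` are all hypothetical; they are not taken from the MS²Rescore source or from the PIN file in this issue. The only part grounded in the traceback is the `str.extract(pattern, expand=False).astype(int)` call.

```python
import pandas as pd

# Hypothetical SpecId values: the second one does not follow the expected layout.
spec_ids = pd.Series(["run1.1234.1234.2_1", "unexpected_format"])

# Hypothetical pattern (an assumption, not the one hardcoded in MS2Rescore):
# capture the first scan number in "<run>.<scan>.<scan>.<charge>_<rank>".
pattern = r"^[^.]+\.(\d+)\.\d+\.\d+_\d+$"

# str.extract returns a missing value for every row the pattern does not match.
extracted = spec_ids.str.extract(pattern, expand=False)
print(extracted.isna().tolist())  # [False, True]


def to_spectrum_index(series: pd.Series, pattern: str) -> pd.Series:
    """Extract an integer index from each SpecId, failing with a readable message.

    Hypothetical helper: it checks for unmatched rows before casting, so the
    user sees which SpecId failed instead of a bare NaN conversion error.
    """
    extracted = series.str.extract(pattern, expand=False)
    unmatched = series[extracted.isna()]
    if not unmatched.empty:
        raise ValueError(
            f"Pattern {pattern!r} did not match {len(unmatched)} SpecId value(s), "
            f"e.g. {unmatched.iloc[0]!r}"
        )
    return extracted.astype(int)
```

Calling `to_spectrum_index(spec_ids, pattern)` on the sample above raises a `ValueError` that names `'unexpected_format'`, which is the kind of information needed to adapt the pattern to a different search engine's SpecId layout.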

Thanks, Ralf

thaumP commented 2 years ago

Hi Ralf, thank you for your reply.

I used MSFragger to search the timsTOF data and generate the .pin files. Attached is an example of the SpecId column.

SPECID.txt