IEDB / PEPMatch


Pandas data type exception: trying to get an integer from '' #14

Open patrick-willems opened 5 months ago

patrick-willems commented 5 months ago

Hello,

Thank you for creating this tool. I am very keen to use pepmatch to speed up some data analysis; however, on two different occasions I have run into this issue:

prc@ProtLinux:~/Documents/casanovo$ pepmatch-match -q test.fasta -p human_Listeria_cont_tda.fasta -m 0 -k 5
Traceback (most recent call last):
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/arrays/integer.py", line 51, in _safe_cast
    return values.astype(dtype, casting="safe", copy=copy)
TypeError: Cannot cast array data from dtype('O') to dtype('int64') according to the rule 'safe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/prc/.local/bin/pepmatch-match", line 8, in <module>
    sys.exit(run_matcher())
  File "/home/prc/.local/lib/python3.8/site-packages/pepmatch/shell.py", line 58, in run_matcher
    Matcher(**matcher_args).match()
  File "/home/prc/.local/lib/python3.8/site-packages/pepmatch/matcher.py", line 217, in match
    query_df = self._dataframe_matches(self.exact_match_search())
  File "/home/prc/.local/lib/python3.8/site-packages/pepmatch/matcher.py", line 842, in _dataframe_matches
    df[int_cols] = df[int_cols].astype('Int64')
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 6233, in astype
    results = [
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 6234, in <listcomp>
    self.iloc[:, i].astype(dtype, copy=copy)
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/generic.py", line 6240, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 448, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 352, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 526, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 230, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/dtypes/astype.py", line 95, in astype_nansafe
    return dtype.construct_array_type()._from_sequence(arr, dtype=dtype, copy=copy)
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/arrays/masked.py", line 132, in _from_sequence
    values, mask = cls._coerce_to_array(scalars, dtype=dtype, copy=copy)
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/arrays/numeric.py", line 258, in _coerce_to_array
    values, mask, _, _ = _coerce_to_data_and_mask(
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/arrays/numeric.py", line 214, in _coerce_to_data_and_mask
    values = dtype_cls._safe_cast(values, dtype, copy=False)
  File "/home/prc/.local/lib/python3.8/site-packages/pandas/core/arrays/integer.py", line 53, in _safe_cast
    casted = values.astype(dtype, copy=copy)
ValueError: invalid literal for int() with base 10: ''

My test.fasta is a simple peptide FASTA file of the form:

>peptide1
peptide1
>peptide2
peptide2

It's probably a silly mistake in my FASTA input?

Thanks in advance, Patrick

danielmarrama commented 4 months ago

@patrick-willems Thanks for trying out the tool! Sorry for the late reply; unfortunately I didn't receive an alert about this. Which version of PEPMatch are you using? I remember this being a bug a while ago, but I believe it was patched in v1.0.2. If you update with pip and still get the same error, please let me know as soon as you can so I can look into it.
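
For anyone reading the traceback above: its last frames show pandas failing to cast a column to 'Int64' because one of the values is an empty string. The snippet below is only a minimal sketch of that failure mode and of one possible way to avoid it. It is not PEPMatch's actual code or the fix that went into v1.0.2, and the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical columns and values; an empty string stands in for a missing entry.
df = pd.DataFrame({
    "Index start": ["3", "", "17"],
    "Mismatches": ["0", "", "1"],
})
int_cols = ["Index start", "Mismatches"]

# Casting directly fails, because '' cannot be parsed as an integer.
error = None
try:
    df[int_cols].astype("Int64")
except (ValueError, TypeError) as exc:
    error = exc

# One possible workaround: coerce unparseable values to NaN first,
# then cast to the nullable integer dtype, which stores them as <NA>.
cleaned = df[int_cols].apply(pd.to_numeric, errors="coerce").astype("Int64")
print(cleaned)
```

Coercing before the cast keeps rows that lack a value instead of aborting the whole run, at the cost of silently turning any malformed entry into a missing value.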