Sendrowski / fastDFE

Fast and flexible inference of the distribution of fitness effects (DFE), VCF-SFS parsing with ancestral allele and site-degeneracy annotation.
https://fastdfe.readthedocs.io
GNU General Public License v3.0

Error while parsing a VCF #6

Open AudeCaizergues opened 3 hours ago

AudeCaizergues commented 3 hours ago

Hello,

I'm trying to parse a VCF to build SFSs, but I get an error during the parsing step. My code is the following:

>>> p = fd.Parser(
...     n=8,
...     vcf="ALL137_filterbiSNPs_nonZ_renamed.vcf.gz",
...     fasta="GCF_003259725.1_athCun1_genomic.fna.gz",
...     gff="GCF_003259725.1_athCun1_genomic.fna.gz",
...     skip_non_polarized = False,
...     annotations=[
...         fd.DegeneracyAnnotation()
...     ],
...     stratifications=[fd.DegeneracyStratification()],
... )
>>> spectra: fd.Spectra = p.parse()

And the error:


INFO:Parser: Using stratification: [neutral, selected].
INFO:Parser: Loading VCF file
INFO:Parser: Loading GFF file
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/aude/opt/miniconda3/envs/fastdfe/lib/python3.12/site-packages/fastdfe/parser.py", line 1195, in parse
    self._setup()
  File "/Users/aude/opt/miniconda3/envs/fastdfe/lib/python3.12/site-packages/fastdfe/parser.py", line 1152, in _setup
    annotation._setup(self)
  File "/Users/aude/opt/miniconda3/envs/fastdfe/lib/python3.12/site-packages/fastdfe/annotation.py", line 214, in _setup
    self._handler._cds
  File "/Users/aude/opt/miniconda3/envs/fastdfe/lib/python3.12/functools.py", line 993, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/aude/opt/miniconda3/envs/fastdfe/lib/python3.12/site-packages/fastdfe/io_handlers.py", line 451, in _cds
    return self._load_cds()
           ^^^^^^^^^^^^^^^^
  File "/Users/aude/opt/miniconda3/envs/fastdfe/lib/python3.12/site-packages/fastdfe/io_handlers.py", line 477, in _load_cds
    df = pd.read_csv(
         ^^^^^^^^^^^^
  File "/Users/aude/opt/miniconda3/envs/fastdfe/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aude/opt/miniconda3/envs/fastdfe/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 626, in _read
    return parser.read(nrows)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/aude/opt/miniconda3/envs/fastdfe/lib/python3.12/site-packages/pandas/io/parsers/readers.py", line 1923, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/aude/opt/miniconda3/envs/fastdfe/lib/python3.12/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 921, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 983, in pandas._libs.parsers.TextReader._convert_column_data
pandas.errors.ParserError: Too many columns specified: expected 9 and found 1

Do you know where it could come from? Thank you,

Aude

Sendrowski commented 2 hours ago

Hey Aude, the gff argument only accepts a GFF file. I suppose you accidentally specified the reference FASTA file instead.

Janek
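
For reference, the pandas message "expected 9 and found 1" fits this diagnosis: a GFF record has nine tab-separated columns, whereas a FASTA header line contains no tabs and is therefore read as a single column. Below is a minimal sketch of a pre-flight check that could be run on the path before passing it as `gff`. The helper `looks_like_gff` is hypothetical and not part of fastDFE; it only inspects the first data line and assumes a plain or gzip-compressed text file.

```python
import gzip


def looks_like_gff(path: str, n_columns: int = 9) -> bool:
    """Return True if the first data line of `path` has `n_columns` tab-separated fields.

    Hypothetical helper for illustration only; it is not part of fastDFE.
    """
    # Pick the opener from the file extension (assumption: '.gz' means gzip).
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and GFF comment/directive lines ('#', '##').
            if not line.strip() or line.startswith("#"):
                continue
            # Decide on the first data line only.
            return len(line.rstrip("\n").split("\t")) == n_columns
    # No data lines found at all.
    return False
```

Called on the FASTA file from the snippet above, this check would be expected to return `False`, since its first line is a `>` header rather than a nine-column record. The actual fix is then to pass the genome annotation (GFF) file that matches the reference as `gff`.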