algaebrown / circSTAMP_pipe

Error in rule edit_aggregate_pseudoreference: #4

Open byee4 opened 4 weeks ago

byee4 commented 4 weeks ago

Could this be a version issue with pandas? I'm using pandas 2.2.2 and NumPy 2.0.0.

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 2
Rules claiming more threads will be scaled down.
Select jobs to execute...

[Sun Aug 18 17:44:14 2024]
rule edit_aggregate_pseudoreference:
    input: output/edits/siRBM15_1.dp4.pos.vcf.tsv
    output: output/edits/siRBM15_1.dp4.pos.vcf.aggregated.tsv, output/edits/siRBM15_1.dp4.pos.vcf.aggregated.nonzero.tsv
    jobid: 0
    benchmark: benchmarks/aggregate_pseudoreference.siRBM15_1.pos.txt
    wildcards: sample_label=siRBM15_1, strand=pos
    resources: mem_mb=16000, disk_mb=51797, tmpdir=/tmp, cpus=1, runtime=60, nodes=1, partition=condo

        python /tscc/projects/ps-yeolab4/software/circstamp/a7c4316/bin/circSTAMP_pipe/scripts//aggregate_pseudoreference_edit.py             output/edits/siRBM15_1.dp4.pos.vcf.tsv T output/edits/siRBM15_1.dp4.pos.vcf.aggregated.tsv

Activating conda environment: ../../../../../../../projects/ps-yeolab4/software/circstamp/c89ea98/bin/snakeconda/25058cd165ecd006101405d3048e5d16
Traceback (most recent call last):
  File "/tscc/projects/ps-yeolab4/software/circstamp/a7c4316/bin/circSTAMP_pipe/scripts//aggregate_pseudoreference_edit.py", line 38, in <module>
    for df in pd.read_csv(fname,
  File "/tscc/projects/ps-yeolab4/software/miniconda_tscc2/envs/circstamp-a7c4316/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1843, in __next__
    return self.get_chunk()
  File "/tscc/projects/ps-yeolab4/software/miniconda_tscc2/envs/circstamp-a7c4316/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1985, in get_chunk
    return self.read(nrows=size)
  File "/tscc/projects/ps-yeolab4/software/miniconda_tscc2/envs/circstamp-a7c4316/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1923, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/tscc/projects/ps-yeolab4/software/miniconda_tscc2/envs/circstamp-a7c4316/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "parsers.pyx", line 850, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 905, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 874, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2061, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 10 fields in line 42834609, saw 19

[Sun Aug 18 17:46:35 2024]
Error in rule edit_aggregate_pseudoreference:
    jobid: 0
    output: output/edits/siRBM15_1.dp4.pos.vcf.aggregated.tsv, output/edits/siRBM15_1.dp4.pos.vcf.aggregated.nonzero.tsv
    conda-env: /tscc/projects/ps-yeolab4/software/circstamp/c89ea98/bin/snakeconda/25058cd165ecd006101405d3048e5d16
    shell:

        python /tscc/projects/ps-yeolab4/software/circstamp/a7c4316/bin/circSTAMP_pipe/scripts//aggregate_pseudoreference_edit.py             output/edits/siRBM15_1.dp4.pos.vcf.tsv T output/edits/siRBM15_1.dp4.pos.vcf.aggregated.tsv

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job edit_aggregate_pseudoreference since they might be corrupted:
output/edits/siRBM15_1.dp4.pos.vcf.aggregated.tsv, output/edits/siRBM15_1.dp4.pos.vcf.aggregated.nonzero.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

byee4 commented 3 weeks ago

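It turns out this was not a pandas version issue. output/edits/siRBM15_1.dp4.pos.vcf.tsv contained a line that should have been two lines: the record at position 48636 is cut off partway through its last field (`0,`), and the next record follows on the same line with no newline in between, so the parser sees 19 fields instead of the expected 10: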

chr1:174252480|174278779        48611   C       <*>     PASS    SYMBOLIC        0       0,0     0,0     0,0
chr1:174252480|174278779        48624   C       <*>     PASS    SYMBOLIC        0       0,0     0,0     0,0
chr1:174252480|174278779        48635   C       <*>     PASS    SYMBOLIC        0       0,0     0,0     0,0
chr1:174252480|174278779        48636   C       <*>     PASS    SYMBOLIC        0       0,0     0,0     0,chr1:54036610|54048682        1625    C       <*>     PASS    SYMBOLIC        1       1,0     1,0     0,0
chr1:54036610|54048682  1632    C       <*>     PASS    SYMBOLIC        1       1,0     1,0     0,0
chr1:54036610|54048682  1633    C       <*>     PASS    SYMBOLIC        1       1,0     1,0     0,0
chr1:54036610|54048682  1641    C       <*>     PASS    SYMBOLIC        1       1,0     1,0     0,0
chr1:54036610|54048682  1665    C       <*>     PASS    SYMBOLIC        1       1,0     1,0     0,0
chr1:54036610|54048682  1668    C       <*>     PASS    SYMBOLIC        1       1,0     1,0     0,0
chr1:54036610|54048682  1671    C       <*>     PASS    SYMBOLIC        1       1,0     1,0     0,0
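
For anyone hitting the same ParserError, a quick way to locate fused or truncated records before running the aggregation step is to count fields per line. The following is only a minimal sketch, not part of circSTAMP_pipe: the function name `find_malformed_lines` is hypothetical, and it assumes the file is tab-separated with 10 fields per record, as the error message suggests.

```python
from typing import Iterable, Iterator, Tuple


def find_malformed_lines(
    lines: Iterable[str],
    expected_fields: int = 10,  # assumption: taken from "Expected 10 fields"
    sep: str = "\t",            # assumption: the .tsv is tab-separated
) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, field_count) for lines with an unexpected field count.

    Line numbers are 1-based positions in the input and count every line,
    including blank ones, so they may not match the number pandas reports.
    """
    for lineno, line in enumerate(lines, start=1):
        record = line.rstrip("\r\n")
        if not record:
            # Skip completely empty lines rather than flagging them.
            continue
        n_fields = len(record.split(sep))
        if n_fields != expected_fields:
            yield lineno, n_fields


# Hypothetical usage on the file from this issue:
#
# with open("output/edits/siRBM15_1.dp4.pos.vcf.tsv") as fh:
#     for lineno, n_fields in find_malformed_lines(fh):
#         print(f"line {lineno}: {n_fields} fields")
```

Because it streams the file line by line, this should work on inputs with tens of millions of lines without loading them into memory. It only reports the problem; deciding whether to regenerate the input or repair the broken line is left to the user.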