andersen-lab / Freyja

Depth-weighted De-Mixing
BSD 2-Clause "Simplified" License

Freyja covariants runtime error #226

Closed RainerWaldmann closed 1 month ago

RainerWaldmann commented 3 months ago

freyja covariants -> IndexError: string index out of range error

Freyja v1.5, installed with conda into a clean new environment. Python 3.10.14.

Input BAM file: Nanopore reads mapped with minimap2. Primers were trimmed with custom software (soft clipping of SAM records), because iVar did not work with the multiple overlapping primer panels we used. The BAM file works fine with other applications and displays correctly in IGV.

freyja covariants ./2021/barcode13TrimmedSorted.bam 21563 25384 --ref-genome /data/analysis/data_rainer/rainer/REFERENCES/SARS-CoV-2.reference.fasta --gff-file /data/analysis/data_rainer/rainer/REFERENCES/GCF_009858895.2_ASM985889v3_genomic.gff  --output BC13_covariants.tsv
Traceback (most recent call last):
  File "/data/analysis/data_rainer/conda/wastewater/bin/freyja", line 10, in <module>
    sys.exit(cli())
  File "/data/analysis/data_rainer/conda/wastewater/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/data/analysis/data_rainer/conda/wastewater/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/data/analysis/data_rainer/conda/wastewater/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/analysis/data_rainer/conda/wastewater/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/analysis/data_rainer/conda/wastewater/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/data/analysis/data_rainer/conda/wastewater/lib/python3.10/site-packages/freyja/_cli.py", line 872, in covariants
    _covariants(input_bam, min_site, max_site, output,
  File "/data/analysis/data_rainer/conda/wastewater/lib/python3.10/site-packages/freyja/read_analysis_tools.py", line 493, in covariants
    if seq[read_site] != 'N' and ref_base != seq[read_site]:
IndexError: string index out of range

I get the same error if I don't supply the ref and gff: freyja covariants ./2021/barcode13TrimmedSorted.bam 21563 25384 --output BC13_covariants.tsv

dylanpilz commented 3 months ago

Hey @RainerWaldmann,

This seems to be an issue related to pysam reading your input data, but without the actual files there's not much more I can say. If possible, could you send the BAM files to me at dpilz@scripps.edu so I can try to reproduce the error?

dylanpilz commented 3 months ago

After taking a closer look, it seems the issue arises from hard- and soft-clipped bases in the CIGAR string for some reads in your data.

I'll make some changes to make covariants more flexible for this kind of input data.
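
For anyone hitting the same traceback, here is a rough sketch of why clipped reads can push a read index past the end of the sequence. It is not Freyja's code and not the eventual fix; the function name `ref_to_read_index` and the toy read are made up for illustration. It only assumes the SAM conventions: soft-clipped bases (S) are stored in the read sequence but do not consume the reference, while hard-clipped bases (H) are not stored in the sequence at all. Counting H as part of the read, or skipping S, gives an offset that no longer matches the stored sequence.

```python
# CIGAR operation codes as defined in the SAM specification
# (pysam uses the same integers in AlignedSegment.cigartuples).
M, I, D, N, S, H, P, EQ, X = range(9)

CONSUMES_QUERY = {M, I, S, EQ, X}  # operations that advance along the stored read sequence
CONSUMES_REF = {M, D, N, EQ, X}    # operations that advance along the reference


def ref_to_read_index(cigartuples, ref_start, ref_pos):
    """Hypothetical helper: map a 0-based reference position to an index
    into the stored read sequence, or None if the read has no base there
    (deletion, skipped region, or position outside the alignment)."""
    read_i = 0          # index into the stored sequence (includes soft clips)
    ref_i = ref_start   # reference position of the first aligned base
    for op, length in cigartuples:
        in_query = op in CONSUMES_QUERY
        in_ref = op in CONSUMES_REF
        if in_ref and ref_i <= ref_pos < ref_i + length:
            # The target falls inside this operation.
            return read_i + (ref_pos - ref_i) if in_query else None
        if in_query:
            read_i += length
        if in_ref:
            ref_i += length
    return None


# Toy read: 5 hard-clipped bases (not stored), 3 soft-clipped bases (stored),
# 4 matches, a 2-base deletion, then 3 matches.
cigar = [(H, 5), (S, 3), (M, 4), (D, 2), (M, 3)]
seq = "NNN" + "ACGT" + "TGA"  # 10 stored bases; the hard clip is absent

idx = ref_to_read_index(cigar, ref_start=100, ref_pos=106)
base = seq[idx] if idx is not None else None
```

Returning None for deleted or uncovered positions lets the caller skip the site instead of indexing into the string, which is one way to avoid the IndexError; the actual change in Freyja may handle it differently.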