CSB5 / lofreq

LoFreq Star: Sensitive variant calling from sequencing data
http://csb5.github.io/lofreq/
Other
100 stars 31 forks source link

Problem of forget to indel alignment-quality to your bam-file #133

Open Xiang-Leo opened 1 year ago

Xiang-Leo commented 1 year ago

Hi, I met the problem of WARNING(lofreq_call.c|main_call): 5 indel calls (before filtering) were made without indel alignment-quality! Did you forget to indel alignment-quality to your bam-file? when I ran lofreq within Snakemake. Here are the rules:

rule indelqual:
    input:
        bam = "03_map2covid.sort.bam"
    output:
        "05_map2covid.lofreq.indelqual.bam"
    params:
        ref = "NC_045512.2.fasta"
    shell:
        "lofreq indelqual --dindel -f {params.ref} -o {output} {input.bam}"

rule alnqual:
    input:
        bam = "05_map2covid.lofreq.indelqual.bam"
    output:
        "06_map2covid.lofreq.alqual.bam"
    params:
        ref = "NC_045512.2.fasta"
    shell:
        "lofreq alnqual -b {input.bam} {params.ref} > {output} "

rule lofreq_call:
    input:
        indelqual = "06_map2covid.lofreq.alqual.bam"
    output:
        "07_map2covid.lofreq.vcf"
    params:
        ref = "NC_045512.2.fasta"
    shell:
        "lofreq call --call-indels -f {params.ref} -o {output} 

It's quite strange since there are no problems when run one file with snakemake, but if there are several files to run, it report error of indel alignment-quality.

andreas-wilm commented 1 year ago

Hi @Xiang-Leo,

This look more like a problem with the workflow itself. The commands in those rules look fine. I can't see what is actually being used in rule lofreq_call though, because the code is clipped at the end (and the input naming looks a bit weird: in the rules above you used bam as variable).

Best, Andreas

Xiang-Leo commented 1 year ago

I'm sorry for the clipped command. Actually I run three commands:

lofreq indelqual --dindel -f NC_045512.2.fasta -o 05_map2covid.lofreq.indelqual.bam 03_map2covid.sort.bam    # 05_map2covid.lofreq.indelqual.bam means the generated quality file
lofreq alnqual -b 05_map2covid.lofreq.indelqual.bam NC_045512.2.fasta > 06_map2covid.lofreq.alqual.bam
lofreq call --call-indels -f NC_045512.2.fasta -o 07_map2covid.lofreq.vcf 06_map2covid.lofreq.alqual.bam

Usually, if there are 3 or less files, it will report no problems. While I run snakemake with more then 10 files, it will report without indel alignment-quality

Xiang-Leo commented 1 year ago

Also, when I run these commands step by step, it also report the warning WARNING(lofreq_call.c|main_call): 1 indel calls (before filtering) were made without indel alignment-quality! Did you forget to indel alignment-quality to your bam-file?. I'm sure that I ran lofreq indelqual and lofreq alnqual. Here is an example: ERR10695916

andreas-wilm commented 1 year ago

Hi @Xiang-Leo,

The series of commands you gave looks correct, i.e.:

lofreq indelqual --dindel -f NC_045512.2.fasta -o 05_map2covid.lofreq.indelqual.bam 03_map2covid.sort.bam
lofreq alnqual -b 05_map2covid.lofreq.indelqual.bam NC_045512.2.fasta > 06_map2covid.lofreq.alqual.bam
lofreq call --call-indels -f NC_045512.2.fasta -o 07_map2covid.lofreq.vcf 06_map2covid.lofreq.alqual.bam

I assume you removed the indexing commands for the sake of brevity.

The fact that the behaviour changes with more files, hints at a workflow problem. Can you see how the Snakemake commands change between one working input and one input that doesn't work?

Thanks, Andreas