alekseyzimin / masurca

GNU General Public License v3.0

POLCA fails to generate .vcf file #317

Open lerminin opened 1 year ago

lerminin commented 1 year ago

Hi there,

I'm running MaSuRCA v4.0.9 and am having an issue using POLCA to polish a Flye assembly with Illumina reads. It prints the following error:

polca.sh -t 2 -a assembly_polished.fasta -r "R1.fq R2.fq"

conda_envs/polca/bin/bwa
conda_envs/polca/bin/freebayes
conda_envs/polca/bin/samtools
[Mon Mar 27 13:22:29 CDT 2023] Creating BWA index for assembly_polished.fasta
[Mon Mar 27 13:22:32 CDT 2023] Aligning reads to assembly_polished.fasta
[Mon Mar 27 13:23:10 CDT 2023] Sorting and indexing alignment file
[samopen] SAM header is present: 6 sequences.
[Mon Mar 27 13:23:28 CDT 2023] Calling variants in assembly_polished.fasta
Processing 4 scaffold(s) per batch
./commands.sh: line 3: ../assembly_polished.fasta.vcf: No such file or directory
./commands.sh: line 3: ../assembly_polished.fasta.vcf: No such file or directory
[Mon Mar 27 13:23:59 CDT 2023] Fixing errors failed on batch 1 in assembly_polished.fasta.fix
[Mon Mar 27 13:23:59 CDT 2023] Fixing consensus failed in ./assembly_polished.fasta.fix

From what I can tell, the .vcf file is not being generated, and this causes the fix_consensus_from_vcf.pl script to crash. I've run this command in this workflow many times and have not had any issues except with this one assembly file. There are 6 contigs in this assembly file, and I can run POLCA on each contig individually without issue. I'm not sure how to diagnose the problem from this error message. It's not clear to me whether there is a problem with my assembly file or a bug in the program, and I would appreciate any help.

May be related to #308

tfwulff commented 1 year ago

Hi,

I encountered a similar problem when running POLCA (MaSuRCA v4.1.0) on a Flye assembly in a second round of polishing. In my case, the .vcf file was not generated because FreeBayes did not call any variants and output empty .vcf files for each batch. When the final .vcf file was being created, the grep command on line 178 of polca.sh selected no lines and exited with status 1, so the mv after && never ran and the file was not created:

seq 1 $BATCHES | xargs -I % ls $BASM.vc/%.vcf |xargs cat | grep -v '^#' > $BASM.vcf.body.tmp && mv $BASM.vcf.body.tmp $BASM.vcf.body

There is probably a better way to fix this, but as a workaround I changed that line to:

seq 1 $BATCHES | xargs -I % ls $BASM.vc/%.vcf |xargs cat | { grep -v '^#' || true; } > $BASM.vcf.body.tmp && mv $BASM.vcf.body.tmp $BASM.vcf.body
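To see why the workaround matters, here is a minimal sketch that reproduces the exit-status behaviour outside of polca.sh. The temporary directory, the file names, and the header-only VCF content are made up for illustration. The third, stricter variant is not from polca.sh or this thread; it is a common shell idiom that treats only grep's "no lines selected" status (1) as success, so a real grep error (status 2) would still stop the chain.

```shell
# Build a stand-in batch VCF that has header lines but no variant records.
tmpdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n' > "$tmpdir/1.vcf"

# Plain form: grep -v selects nothing, so it exits 1 and an && chain would stop here.
plain_status=0
cat "$tmpdir/1.vcf" | grep -v '^#' > "$tmpdir/body.plain" || plain_status=$?

# Workaround form: the { ...; } group exits 0 even when grep selects nothing.
guarded_status=0
cat "$tmpdir/1.vcf" | { grep -v '^#' || true; } > "$tmpdir/body.guarded" || guarded_status=$?

# Stricter variant (assumption, not from the thread): accept only exit status 1.
strict_status=0
cat "$tmpdir/1.vcf" | { grep -v '^#' || [ $? -eq 1 ]; } > "$tmpdir/body.strict" || strict_status=$?

echo "plain=$plain_status guarded=$guarded_status strict=$strict_status"
# prints: plain=1 guarded=0 strict=0

rm -rf "$tmpdir"
```

With the guarded forms, the body file ends up empty rather than missing, which is what lets the rest of the chained command continue.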

I guess there might be a similar problem in your case in that chained command for generating the final .vcf file. Did you check the presence of the temporary files, like assembly_polished.fasta.vcf.header.tmp? Might help to understand at which step it fails.
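One way to check whether FreeBayes produced any calls is to count the non-header lines in each per-batch .vcf file. The helper below is a rough sketch: the function name count_vcf_records is hypothetical, and the directory you pass to it is an assumption based on the $BASM.vc/%.vcf pattern in the line quoted above.

```shell
# Hypothetical helper: print each batch VCF in a directory with its count of
# non-header lines (lines that do not start with '#').
count_vcf_records() {
    for f in "$1"/*.vcf; do
        # Skip the unexpanded glob pattern when the directory has no .vcf files.
        [ -e "$f" ] || continue
        # grep -c prints 0 and exits 1 when no lines are selected; ignore that status.
        n=$(grep -vc '^#' "$f" || true)
        printf '%s\t%s\n' "$f" "$n"
    done
}
```

For example, if the batch files sit in a directory named assembly_polished.fasta.vc (an assumed name), running count_vcf_records assembly_polished.fasta.vc would list one line per batch. A count of 0 for every batch would point to the empty-output case described above.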

lerminin commented 1 year ago

Thanks for the reply, modifying that line as you suggested worked for me in v4.0.9

Li-Alvarez commented 8 months ago

I had the same issue, corrected line 178, and it works perfectly in v4.1.0.