WGLab / NanoCaller

Variant calling tool for long-read sequencing data
MIT License

ls: cannot access '/output/intermediate_files/*/*snps.vcf.gz': No such file or directory #12

Closed: KewinOgink closed this issue 3 years ago.

KewinOgink commented 3 years ago

Hi

I am trying out NanoCaller via Docker for WGS.

I tried running it with the command below, but it finishes within seconds without an obvious error message, other than the following:

2021-03-04 16:50:02.678299: Starting NanoCaller.

Running arguments are saved in the following file: /output/args

2021-03-04 16:50:02.679210: Commands for running NanoCaller on contigs in whole genome are saved in the file /output/wg_commands
Running 17 jobs using 10 workers in parallel.

ls: cannot access '/output/intermediate_files/*/*snps.vcf.gz': No such file or directory
Could not read the file: -
Writing to /tmp/bcftools-sort.ZfQoeq
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Could not read VCF/BCF headers from -
Cleaning

ls: cannot access '/output/intermediate_files/*/*snps.phased.vcf.gz': No such file or directory
Writing to /tmp/bcftools-sort.o8MIcM
Could not read the file: -
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Could not read VCF/BCF headers from -
Cleaning

The only output I get is an empty VCF and an intermediate_files directory containing an empty directory for each chromosome.
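The `ls` errors suggest that the step which merges and sorts the per-contig results found no per-contig SNP VCFs at all. A quick way to confirm which contigs produced nothing is sketched below; it assumes only the layout visible in the log (`<output>/intermediate_files/<contig>/*snps.vcf.gz`), and `find_empty_contig_dirs` is a hypothetical helper, not part of NanoCaller.

```python
from pathlib import Path


def find_empty_contig_dirs(output_dir, pattern="*snps.vcf.gz"):
    """List per-contig folders under <output_dir>/intermediate_files that
    contain no file matching `pattern` (layout taken from the ls errors)."""
    inter = Path(output_dir) / "intermediate_files"
    empty = []
    # Only look at subdirectories; each one is assumed to be a contig.
    for contig_dir in sorted(p for p in inter.iterdir() if p.is_dir()):
        if not any(contig_dir.glob(pattern)):
            empty.append(contig_dir.name)
    return empty
```

For example, `find_empty_contig_dirs("output")` run from the host directory where `docker run` was launched. If every contig is listed, the per-contig jobs most likely failed before writing any output, which matches the empty directories described here.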

Command:

docker run -it -v `pwd`/input:/input/ -v `pwd`/output:/output/ genomicslab/nanocaller:0.3.2 python NanoCaller_WGS.py -bam /input/nanopore.sorted.bam -ref /input/ref.fna -o /output/ -cpu 12 -seq ont -mode both -model NanoCaller1 -wgs_contigs_type all -mincov 10 -min_allele_freq 0.7  -cpu 10 

Any thoughts? My input files are in the input directory.

umahsn commented 3 years ago

Hi,

Thank you for bringing this to our attention.

Can you check the file /output/wg_commands, where we print the commands that are run in parallel, then select one command and run it directly with Docker? It would look like this:

docker run -it -v `pwd`/input:/input/ -v `pwd`/output:/output/ genomicslab/nanocaller:0.3.2 {python command from /output/wg_commands}

This will print the error log of that individual command, which normally runs as one of the parallel jobs in whole-genome variant calling.
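Picking a line out of wg_commands and wrapping it in the same `docker run` invocation can be scripted. The sketch below assumes (unverified) that wg_commands holds one shell command per line with no pipes or redirections; if a line does contain them, paste it into the Docker command by hand instead. `docker_command_for` is a hypothetical name, and the image tag and mount points are copied from the command in this thread.

```python
import shlex
from pathlib import Path

IMAGE = "genomicslab/nanocaller:0.3.2"


def docker_command_for(commands_file, index=0, cwd="."):
    """Build a `docker run` argument list that re-runs one line of wg_commands.
    Assumes one plain command per non-empty line (not verified)."""
    lines = [l.strip() for l in Path(commands_file).read_text().splitlines() if l.strip()]
    inner = shlex.split(lines[index])  # tokenize the chosen command
    cwd = Path(cwd).resolve()          # equivalent of `pwd` on the host
    return [
        "docker", "run", "-it",
        "-v", f"{cwd}/input:/input/",
        "-v", f"{cwd}/output:/output/",
        IMAGE, *inner,
    ]
```

On the host, `/output/wg_commands` is `output/wg_commands`, so `print(shlex.join(docker_command_for("output/wg_commands")))` prints a command ready to paste into a terminal.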

In the meantime, I am changing the script to automatically log errors from subprocesses so that we get better error information in the future.
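Logging errors from subprocesses can look roughly like the generic sketch below, which runs jobs in parallel and keeps each job's stderr. It is not NanoCaller's actual implementation; `run_one` and `run_all` are hypothetical names, and the default of 10 workers just mirrors the "10 workers" line in the log above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd):
    """Run one shell command; return (cmd, exit code, captured stderr)."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return cmd, proc.returncode, proc.stderr


def run_all(commands, workers=10):
    """Run commands in parallel and report every failure with its stderr,
    instead of letting the pipeline continue silently."""
    failures = []
    # Threads are enough here: the real work happens in child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for cmd, code, err in pool.map(run_one, commands):
            if code != 0:
                failures.append((cmd, code, err))
                print(f"FAILED (exit {code}): {cmd}\n{err}")
    return failures
```

With this pattern, a failing per-contig job such as the ones in this issue would surface its own error message rather than only the downstream `ls` and `bcftools` errors.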

KewinOgink commented 3 years ago

Hi umahsn,

Thank you for your quick reply. I can see the error now; it was caused by a faulty read group header. I fixed it and it works now, thank you! On another note: is it possible to train models with yeast and bacteria instead of human data?
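The thread does not say exactly what was wrong with the read group header. As a hedged starting point, the sketch below checks the `@RG` lines of a header dumped with `samtools view -H`: `ID` is mandatory per the SAM specification, while requiring `SM` is an assumption (many callers expect a sample name), not a documented NanoCaller requirement. `check_read_groups` is a hypothetical helper.

```python
def check_read_groups(header_text, required=("ID", "SM")):
    """Return a list of problems found in the @RG lines of a SAM header."""
    problems = []
    rg_lines = [l for l in header_text.splitlines() if l.startswith("@RG")]
    if not rg_lines:
        problems.append("no @RG line in header")
    seen_ids = set()
    for line in rg_lines:
        tags = {}
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        for field in line.split("\t")[1:]:
            if len(field) < 3 or field[2] != ":":
                problems.append(f"malformed field {field!r} in: {line}")
                continue
            tags[field[:2]] = field[3:]
        for tag in required:
            if tag not in tags:
                problems.append(f"missing {tag} in: {line}")
        rg_id = tags.get("ID")
        if rg_id is not None:
            if rg_id in seen_ids:
                problems.append(f"duplicate ID {rg_id}")
            seen_ids.add(rg_id)
    return problems
```

For example, save the header with `samtools view -H nanopore.sorted.bam > header.sam` and then call `check_read_groups(open("header.sam").read())`; an empty list means none of these particular checks failed.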

umahsn commented 3 years ago

Hi,

We haven't trained a model on yeast or bacteria because the SNP calling model is designed to take advantage of haplotype information, but this is something we will consider in the future. However, you can still run NanoCaller on haploid genomes and perhaps filter out heterozygous calls.
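Filtering heterozygous calls out of a haploid run can be sketched as follows. This assumes a single-sample VCF (plain or gzipped) with genotypes in the standard `GT` FORMAT field; the function names are hypothetical. `bcftools view` also has a `-g/--genotype` option that can exclude heterozygous sites (`-g ^het`), which may be simpler if bcftools is already installed.

```python
import gzip
import re


def open_vcf(path):
    """Open a VCF as text; bgzip output is gzip-compatible, so .gz works too."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path)


def is_heterozygous(gt):
    """True if a GT value such as '0/1' or '1|2' calls two different alleles."""
    alleles = [a for a in re.split(r"[/|]", gt) if a != "."]
    return len(set(alleles)) > 1


def filter_het(lines):
    """Yield VCF lines, dropping records whose (single) sample is heterozygous."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all header lines unchanged
            continue
        cols = line.rstrip("\n").split("\t")
        fmt = cols[8].split(":")
        sample = cols[9].split(":")
        gt = sample[fmt.index("GT")] if "GT" in fmt else "."
        if not is_heterozygous(gt):
            yield line


def filter_het_file(in_path, out_path):
    """Write an uncompressed VCF with heterozygous records removed."""
    with open_vcf(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_het(src))
```

Missing genotypes (`./.`) are kept rather than dropped, since they are not heterozygous calls; change `is_heterozygous` if those should be removed as well.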

umahsn commented 3 years ago

I also wanted to let you know that the newer Docker image for NanoCaller 0.3.3 adds support for more comprehensive error logging.

KewinOgink commented 3 years ago

Thanks! Nice to see the improvements.