CRG-CNAG / CalliNGS-NF

GATK RNA-Seq Variant Calling in Nextflow
Mozilla Public License 2.0
130 stars 53 forks source link

Process POST_PROCESS_VCF failes when order of chromosomes in result.DP8.vcf differs from that in the GRCm38/Annotation/Variation/Mus_musculus.vcf #21

Open dylanht opened 3 years ago

dylanht commented 3 years ago

To fix, I added a call to vcf-sort in the middle of the POST_PROCESS_VCF script - I tried installing and using bcftools, but it requires a header with the "contig" section which is not present in these intermediate files, and vcftools is already included in the container. Will submit PR for review.

Error from vcftools on process failure is:

Comparing sites in VCF files...
  Error: Cannot determine chromosomal ordering of files, both files must contain the same chromosomes to use the diff functions.
  Found 10 in file 1 and 1 in file 2.

Looking in the working directory associated with the failing task, POST_PROCESS_VCF produces the file result.DP8.vcf with chromosomes ordered as grep -v "#" result.DP8.vcf | cut -f 1 | uniq | tr "\n" " ":

grep -v "#" result.DP8.vcf | cut -f 1 | uniq | tr "\n" " "
# 10 11 12 13 14 15 16 17 18 19 1 2 3 4 5 6 7 8 9 MT X Y
singularity exec callings-nf_gatk4.sif vcf-sort result.DP8.vcf > result.DP8.vcf.sorted
# unix command printed on execution is "sort -k1,1d -k2,2n"
grep -v "#" result.DP8.vcf.sorted | cut -f 1 | uniq | tr "\n" " "
# 1 10 11 12 13 14 15 16 17 18 19 2 3 4 5 6 7 8 9 MT X Y
grep -v "#" filtered.recode.vcf | cut -f 1 | uniq | tr "\n" " "
# 1 10 11 12 13 14 15 16 17 18 19 2 3 4 5 6 7 8 9 MT X Y

Possible confounder here is that we are trying to use the Boyle lab's https://github.com/Boyle-Lab/Blacklist for mm10, but using Ensembl build of GRCm38 available from iGenomes - I wrote in a profile into the config as such:

singularity {
    singularity.enabled = true
    singularity.cacheDir = './singularity_cache'
    process {
        container = 'quay.io/nextflow/callings-nf:gatk4'
        executor = 'slurm'
        queue = 'our_queue'
        memory = 16.GB
        errorStrategy = 'finish'
        withLabel: mem_large { memory = 48.GB }
        withLabel: mem_xlarge { memory = 64.GB }
            params {
                genome  = "iGenomes/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/genome.fa"
                reads = "reads/*_{1,2}.fastq.gz"
                variants  = "iGenomes/Mus_musculus/Ensembl/GRCm38/Annotation/Variation/Mus_musculus.vcf"
                denylist  = "iGenomes/Blacklist/lists/mm10-blacklist.v2.bed"
                results    = "./results"
            }
    }