bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

mosdepth index files missing? #3124

Closed: gabeng closed this issue 2 years ago

gabeng commented 4 years ago

It appears that mosdepth creates *.csi index files for all bgzipped output files, such as the per-base coverage file. However, I cannot find those index files in the bcbio output directory, and I see no log message indicating that they are copied into the final directory. I am currently using bcbio 1.1.7, but I did not notice any code changes with respect to those QC files. How can I get the index files transferred to the final directory?

naumenko-sa commented 4 years ago

Hi Ben @gabeng!

Could you please clarify your setup? Are you running:

tools_on:
    - coverage_perbase

Sergey

gabeng commented 4 years ago

Hi Sergey, yes, I am using coverage_perbase.

naumenko-sa commented 4 years ago

Hi Ben @gabeng!

I'm running:

details:
- algorithm:
    aligner: bwa
    effects: vep
    effects_transcripts: all
    ensemble:
      numpass: 2
      use_filtered: false
    mark_duplicates: true
    realign: false
    recalibrate: false
    save_diskspace: true
    tools_on:
    - gemini
    - svplots
    - qualimap
    - vep_splicesite_annotations
    - noalt_calling
    - coverage_perbase
    variantcaller:
    - gatk-haplotype
    - samtools
    - platypus
    - freebayes
    vcfanno:
    - /n/data1/cores/bcbio/naumenko/ashkenazim_trio/config/cre.vcfanno.conf
  analysis: variant2
  description: HG002_NA24385_son
  files:
  - /n/data1/cores/bcbio/naumenko/ashkenazim_trio/input/HG002_NA24385_son.chr22.bam
  genome_build: hg38
  metadata:
    batch: ashkenazi_fam
- algorithm:
    aligner: bwa
    effects: vep
    effects_transcripts: all
    ensemble:
      numpass: 2
      use_filtered: false
    mark_duplicates: true
    realign: false
    recalibrate: false
    save_diskspace: true
    tools_on:
    - gemini
    - svplots
    - qualimap
    - vep_splicesite_annotations
    - noalt_calling
    - coverage_perbase
    variantcaller:
    - gatk-haplotype
    - samtools
    - platypus
    - freebayes
    vcfanno:
    - /n/data1/cores/bcbio/naumenko/ashkenazim_trio/config/cre.vcfanno.conf
  analysis: variant2
  description: HG003_NA24149_father
  files:
  - /n/data1/cores/bcbio/naumenko/ashkenazim_trio/input/HG003_NA24149_father.chr22.bam
  genome_build: hg38
  metadata:
    batch: ashkenazi_fam
- algorithm:
    aligner: bwa
    effects: vep
    effects_transcripts: all
    ensemble:
      numpass: 2
      use_filtered: false
    mark_duplicates: true
    realign: false
    recalibrate: false
    save_diskspace: true
    tools_on:
    - gemini
    - svplots
    - qualimap
    - vep_splicesite_annotations
    - noalt_calling
    - coverage_perbase
    variantcaller:
    - gatk-haplotype
    - samtools
    - platypus
    - freebayes
    vcfanno:
    - /n/data1/cores/bcbio/naumenko/ashkenazim_trio/config/cre.vcfanno.conf
  analysis: variant2
  description: HG004_NA24143_mother
  files:
  - /n/data1/cores/bcbio/naumenko/ashkenazim_trio/input/HG004_NA24143_mother.chr22.bam
  genome_build: hg38
  metadata:
    batch: ashkenazi_fam
resources:
  default:
    cores: 7
    jvm_opts:
    - -Xms750m
    - -Xmx7000m
    memory: 7G
upload:
  dir: ../final

and work/coverage/HG002_NA24385_son has:

HG002_NA24385_son-variant_regions.mosdepth.region.dist.txt
HG002_NA24385_son-variant_regions.per-base.bed.gz
HG002_NA24385_son-variant_regions.quantized.bed.gz
HG002_NA24385_son-variant_regions.quantized-vrsubset.bed
HG002_NA24385_son-variant_regions.quantized-vrsubset-callableblocks.bed
HG002_NA24385_son-variant_regions.quantized-vrsubset-nblocks.bed
HG002_NA24385_son-variant_regions.regions.bed.gz
target-genome.bed

There are no csi indices. Could you please elaborate a bit more? I did not understand what the issue is.

Sergey

gabeng commented 4 years ago

Hi Sergey,

Thanks for looking into this. The mosdepth manual states that a *.csi index file is always created for every *.gz file (see https://github.com/brentp/mosdepth#exome-example). In particular, the .per-base.bed.gz files can be pretty large, so indexing is mandatory for downstream processing. For some reason, it seems, the *.csi files are lost in bcbio. It would be very helpful if they were retained and stored with the *.bed.gz files.

Ben
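
As a possible stopgap until the indices are kept by the pipeline, the missing *.csi files could be regenerated after the run. The sketch below is not part of bcbio or mosdepth. It assumes that tabix (from htslib) is on the PATH and that the *.bed.gz files are bgzipped and coordinate-sorted, as mosdepth output normally is. The function names find_unindexed, index_command and index_all are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def find_unindexed(coverage_dir):
    """Return sorted *.bed.gz files under coverage_dir that have no sibling .csi file."""
    root = Path(coverage_dir)
    return sorted(
        gz for gz in root.rglob("*.bed.gz")
        if not Path(str(gz) + ".csi").exists()
    )


def index_command(bed_gz):
    """Build the tabix call that writes <bed_gz>.csi next to the input file."""
    # --csi requests a CSI index instead of the default TBI; -p bed selects the BED preset.
    return ["tabix", "--csi", "-p", "bed", str(bed_gz)]


def index_all(coverage_dir):
    """Index every unindexed *.bed.gz file; assumes tabix is installed."""
    if shutil.which("tabix") is None:
        raise RuntimeError("tabix not found on PATH")
    for bed_gz in find_unindexed(coverage_dir):
        subprocess.run(index_command(bed_gz), check=True)
```

Calling index_all("work/coverage") or pointing it at the final directory would then place a .csi file next to each .bed.gz file that lacks one. The paths here are only examples taken from the listing above.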