cancerit / PCAP-core

NGS reference implementations and helper code for mapping (originally part of ICGC-TCGA-PanCancer)
GNU General Public License v2.0

bwa_mem.pl -p stats not doing anything on new versions #45

Closed jsmedmar closed 5 years ago

jsmedmar commented 5 years ago

This command fails to produce the .bas file in the latest releases (I have tested 4.2.3 and 4.3.5):

bwa_mem.pl -p stats -outdir . -reference ./reference.fasta -sample ind ./ind.bam

The same command worked on 1.14.0. Maybe I'm missing something? The problem is that no errors are raised and nothing is printed. I've tested with this container: https://hub.docker.com/r/leukgen/docker-pcapcore/tags. I've also tested on real data and get the same behavior.

Test data attached: data.zip

keiranmraine commented 5 years ago

What does including -i 1 do?

bwa_mem.pl -p stats -i 1 -outdir . -reference ./reference.fasta -sample ind ./ind.bam

keiranmraine commented 5 years ago

I've remembered why this happens: we did some optimisation for BAM file outputs. If you are generating a BAM file, the .bas file is generated in a streaming fashion during the mark step, and the stats step only serves to clean up. If you are generating a CRAM file, the stats step actually creates the .bas file.

jsmedmar commented 5 years ago

So there is no way to create the .bas on a previously generated BAM? Basically, I want to merge bwa_mem.pl-generated BAMs without rerunning alignment, yet still have the .met (which I know how to generate) and the .bas.

keiranmraine commented 5 years ago

Instead of running bwa_mem.pl, run bam_stats. It's in the same installation path and is part of the docker/singularity image:

$ bam_stats -h
Usage: bam_stats -i file -o file [-p plots] [-r reference.fa.fai] [-h] [-v]

-i --input          File path to read in.
-o --output         File path to output.

Optional:
-r --ref-file       File path to reference index (.fai) file.
                    NB. If cram format is supplied via -b and the reference listed in the cram header can't be found bam_stats may fail to work correctly.
-a --rna            Uses the RNA method of calculating insert size (ignores anything outside ± ('sd'*standard_dev) of the mean in calculating a new mean)
-@ --num_threads    Use thread pool with specified number of threads.

Other:
-h --help           Display this usage information.
-v --version        Prints the version number.
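
For the original case (an existing BAM with no .bas), a minimal sketch using only the -i and -o options from the usage above might look like this. The function name make_bas, the BAM_STATS override variable, and the default output name <bam>.bas are assumptions for illustration, not part of PCAP-core:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a .bas file for an existing BAM.
# BAM_STATS may be set to an alternative executable (e.g. a full path);
# by default the bam_stats found on PATH is used.
make_bas() {
  local bam="$1"
  # Assumed naming convention: <bam>.bas next to the input unless given.
  local out="${2:-${bam}.bas}"
  "${BAM_STATS:-bam_stats}" -i "$bam" -o "$out"
}
```

Calling make_bas ./ind.bam would then run bam_stats -i ./ind.bam -o ./ind.bam.bas.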

jsmedmar commented 5 years ago

Perfect. Thanks Keiran!

keiranmraine commented 5 years ago

At some point we need to add a helper script for merge[+dup]+bas from multiple lanes mapped in isolation, so that it behaves like mapping multiple inputs in bwa_mem.pl.
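
A rough sketch of what such a helper could look like, untested against real data. It assumes samtools and biobambam2's bammarkduplicates2 are on PATH alongside bam_stats, and that the per-lane BAMs are coordinate sorted; the function names, the DRY_RUN switch, and the output file names are all hypothetical and this is not the PCAP-core implementation:

```shell
#!/usr/bin/env bash
set -euo pipefail

# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: merge_lanes <out_prefix> <lane1.bam> <lane2.bam> [...]
merge_lanes() {
  local prefix="$1"
  shift
  if [ "$#" -lt 2 ]; then
    echo "usage: merge_lanes <out_prefix> <lane1.bam> <lane2.bam> [...]" >&2
    return 1
  fi
  # 1. Merge the per-lane BAMs into one file.
  run samtools merge "${prefix}.merged.bam" "$@"
  # 2. Mark duplicates across lanes and write the metrics (.met) file.
  run bammarkduplicates2 I="${prefix}.merged.bam" O="${prefix}.bam" M="${prefix}.bam.met"
  # 3. Index the final BAM.
  run samtools index "${prefix}.bam"
  # 4. Generate the .bas from the final BAM.
  run bam_stats -i "${prefix}.bam" -o "${prefix}.bam.bas"
}
```

Running DRY_RUN=1 merge_lanes ind lane1.bam lane2.bam prints the four commands without executing them, which is a cheap way to check the file names before committing to a long run.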

jsmedmar commented 5 years ago

This is actually what I'm writing right now. I think it's a relevant use case.

jsmedmar commented 5 years ago

Sorry, reopened by accident.