Closed matthdsm closed 4 years ago
For reference, here are the scramble options:
-bash-4.2$ scramble -h
-=- sCRAMble -=- version 1.14.11
Author: James Bonfield, Wellcome Trust Sanger Institute. 2013-2018
Usage: scramble [options] [input_file [output_file]]
Options:
-I format Set input format: "bam", "sam" or "cram".
-O format Set output format: "bam", "sam" or "cram".
-1 to -9 Set compression level.
-0 or -u No compression.
-H [SAM] Do not print header
-R range [Cram] Specifies the refseq:start-end range
-r ref.fa [Cram] Specifies the reference file.
-b integer [Cram] Max. bases per slice, default 5000000.
-s integer [Cram] Sequences per slice, default 10000.
-S integer [Cram] Slices per container, default 1.
-V version [Cram] Specify the file format version to write (eg 1.1, 2.0)
-e [Cram] Embed reference sequence.
-x [Cram] Non-reference based encoding.
-M [Cram] Use multiple references per slice.
-m [Cram] Generate MD and NM tags.
-Z [Cram] Also compress using lzma.
-f [Cram] Also compression using fqzcomp (V3.1+)
-n [Cram] Discard read names where possible.
-P Preserve all aux tags (incl RG,NM,MD)
-p Preserve aux tag sizes ('i', 's', 'c')
-q Don't add scramble @PG header line
-N integer Stop decoding after 'integer' sequences
-t N Use N threads (availability varies by format)
-B Enable Illumina 8 quality-binning system (lossy)
-! Disable all checking of checksums
-g FILE Convert to Bam using index (file.gzi)
-G FILE Output Bam index when bam input(file.gzi)
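Going by the usage line and options above, a BAM to CRAM conversion could be wired up roughly as follows. This is only a sketch: the function name `scramble_bam_to_cram_cmd`, its parameters, and the file names are hypothetical and not taken from bcbio. It uses only the `-I`, `-O`, `-r`, `-t` and `-1` to `-9` flags listed in the help text.

```python
import shlex


def scramble_bam_to_cram_cmd(in_bam, out_cram, ref_fa, threads=1, level=None):
    """Build an argv list for a scramble BAM -> CRAM conversion.

    Hypothetical helper (not part of bcbio); it only assembles flags that
    appear in the `scramble -h` output above.
    """
    # -I/-O set the input/output formats, -r points at the reference FASTA.
    cmd = ["scramble", "-I", "bam", "-O", "cram", "-r", ref_fa]
    if threads > 1:
        # -t N: use N threads (availability varies by format).
        cmd += ["-t", str(threads)]
    if level is not None:
        # -1 to -9: compression level.
        if not 1 <= level <= 9:
            raise ValueError("compression level must be between 1 and 9")
        cmd.append("-{}".format(level))
    # Positional arguments: input_file then output_file.
    cmd += [in_bam, out_cram]
    return cmd


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    cmd = scramble_bam_to_cram_cmd("sample.bam", "sample.cram", "ref.fa", threads=4)
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list could then be handed to `subprocess.run(cmd, check=True)` on a machine where scramble is installed. Building an argv list rather than a shell string avoids quoting problems with unusual file paths.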
+1 on scramble.
Thanks so much, and sorry for not responding sooner. Yes, we're definitely up for it. I'll test your PR locally.
No problem, let me know if you need anything else. Local tests check out nicely.
Cheers M
Thanks @roryk and @matthdsm! I see it has been released in bcbio 1.2.2! https://github.com/bcbio/bcbio-nextgen/commit/037fa09812556698233e477fe9e96cecfee21d37 Closing to celebrate this achievement!
Hi,
Would you guys be open to reviewing the BAM to CRAM conversion in bcbio? Currently, this is done using either samtools or bam-squeeze. I propose replacing both with scramble, which should be faster and uses state-of-the-art compression. The tool can be installed with conda, so it adds only a minimal dependency.
Let me know what you think and if I should put some work towards this.
Cheers M