JetBrains-Research / washu

Reproducible and scalable technical pipelines for ChIP-Seq and RNA-Seq processing
https://artyomovlab.wustl.edu/aging
MIT License

BED.GZ -> BAM not sorted #48

Closed olegs closed 6 years ago

olegs commented 6 years ago

Original problem: https://github.com/JetBrains-Research/epigenome/issues/1118

After some experiments with existing BED.GZ -> BAM conversion:

user@rosalind:/mnt/stripe/bio/experiments/test$ ls
GSM646314_GM12878_CTCF_rep1.hg19.bam  GSM646314_GM12878_CTCF_rep1.hg19.bed.gz
user@rosalind:/mnt/stripe/bio/experiments/test$ bash -c 'export WASHU_ROOT=/mnt/stripe/washu; bash /mnt/stripe/washu/scripts/reads2bw.sh GSM646314_GM12878_CTCF_rep1.hg19.bam /mnt/stripe/bio/genomes/hg19/hg19.chrom.sizes foo.bw'
Local tasks WASHU_PARALLELISM=8
bam2bw GSM646314_GM12878_CTCF_rep1.hg19.bam /mnt/stripe/bio/genomes/hg19/hg19.chrom.sizes foo.bw
reads2bam GSM646314_GM12878_CTCF_rep1.hg19.bam /mnt/stripe/bio/genomes/hg19/hg19.chrom.sizes
bam: GSM646314_GM12878_CTCF_rep1.hg19.bam
Input error: Chromosome chr11_gl000202_random found in non-sequential lines. This suggests that the input file is not sorted correctly.
...
Input error: Chromosome chr1 found in non-sequential lines. This suggests that the input file is not sorted correctly.
Local tasks WASHU_PARALLELISM=8
Overlapping regions in bedGraph line 122 of foo.bdg.sort.clip

The BAM produced by the BED.GZ -> BAM conversion is not sorted, which leads to an incorrect BW file.
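The "non-sequential lines" error means that records for one chromosome appear in more than one separate block of the input. As a rough illustration of that condition (not the washu implementation), the sketch below checks a BED stream for such chromosomes and shows one way to sort it by chromosome and start. The function names `non_sequential_chroms` and `sort_bed` are hypothetical, and the sketch assumes tab-separated BED records on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; they are not part of washu.

# Print every chromosome whose records appear in more than one
# non-adjacent block of the BED stream read from stdin.
non_sequential_chroms() {
    awk -F'\t' '
        $1 != prev {
            # A chromosome seen earlier has come back after another one.
            if ($1 in seen) bad[$1] = 1
            seen[$1] = 1
            prev = $1
        }
        END { for (c in bad) print c }
    ' | sort
}

# Sort a BED stream by chromosome name, then numerically by start position.
sort_bed() {
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}
```

For example, `zcat GSM646314_GM12878_CTCF_rep1.hg19.bed.gz | non_sequential_chroms` would list the offending chromosomes in the input BED, and piping through `sort_bed` first should yield an empty list. Whether the sorting belongs before or after the conversion to BAM is not stated in this issue, so treat this as a diagnostic aid rather than the fix.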