BD2KGenomics / toil-scripts

Toil workflows for common genomic pipelines
Apache License 2.0

Write new alignment pipeline #45

Closed jvivian closed 8 years ago

jvivian commented 8 years ago

Will be borrowing the methodology from Arjun's pipeline: https://github.com/arkal/docker_scripts/blob/master/precision_immuno.py

jvivian commented 8 years ago

It's been written, I just need to do a test run.

jvivian commented 8 years ago

I've been trying to figure this issue out for over a day now. In the final reheader step of the program I'm getting a memory leak in Docker: memory usage increases monotonically until the daemon crashes. I've upgraded samtools to version 1.2 and tried three different versions of Docker, all to no avail. The issue is reproducible outside of Toil by just calling the container. Arjun hasn't run into this problem despite using the same technique (I borrowed his methodology), so I'm not sure what the issue is, but it's really annoying.

Ultimately, a (slower) workaround would be a BAM->SAM->BAM conversion to adjust the header, which I'd rather not do.
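For illustration, the BAM->SAM->BAM workaround could be sketched roughly as below. This is only a sketch under assumptions, not what the pipeline actually does: the helper names (`docker_samtools`, `reheader_via_sam`) are hypothetical, the image is assumed to use `samtools` as its entrypoint with `/data` as the working directory (as the reheader command in this comment implies), and `-i` is added to the second `docker run` so the container can read the SAM stream from stdin. The function only builds the shell pipeline string; it does not run anything.

```python
import os
import shlex

# Image name taken from the reheader command in this thread.
SAMTOOLS_IMAGE = "quay.io/ucsc_cgl/samtools"


def docker_samtools(args, work_dir, stdin=False):
    """Build the argv for one samtools call in the container (hypothetical helper)."""
    argv = ["docker", "run", "--rm", "--log-driver=none"]
    if stdin:
        # Keep stdin open so samtools can read the piped SAM stream.
        argv.append("-i")
    # Assumption: the image treats /data as its working directory.
    argv += ["-v", "{}:/data".format(work_dir), SAMTOOLS_IMAGE]
    return argv + list(args)


def quote(argv):
    """Join an argv list into a shell-safe string."""
    return " ".join(shlex.quote(a) for a in argv)


def reheader_via_sam(header, in_bam, out_bam, work_dir):
    """Return a shell pipeline that swaps the header by round-tripping through SAM."""
    header_path = os.path.join(work_dir, header)
    # `samtools view` without -h emits alignment records only (old header dropped).
    records = docker_samtools(["view", in_bam], work_dir)
    # `samtools view -b -` reads SAM from stdin and writes BAM to stdout.
    to_bam = docker_samtools(["view", "-b", "-"], work_dir, stdin=True)
    return "(cat {}; {}) | {} > {}".format(
        shlex.quote(header_path), quote(records), quote(to_bam), shlex.quote(out_bam)
    )
```

Calling `reheader_via_sam("output_bam.header", "aligned.bam", "test.bam", os.getcwd())` would give a single pipeline string to inspect or hand to a shell. The cost is the one already noted: every record gets decompressed to text and recompressed, which is why this is slower than `samtools reheader`.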


sudo docker run --rm --log-driver=none -v $(pwd):/data quay.io/ucsc_cgl/samtools reheader output_bam.header aligned.bam > test.bam

Jeltje commented 8 years ago

How frustrating. I think reheadering has been a bit of an issue with samtools, but I don't know why. Do you not see the problem when you run Arjun's tool on the same inputs? Do you see it on all inputs? I remember having issues running VarScan on samtools output (earlier versions) with some BAM files in a set but not others. Highly annoying (and in my case solved by using an earlier version of samtools).