blachlylab / fade

Fragmentase Artifact Detection and Elimination
MIT License

How to process mutect2 caller after fade out #16

Closed (deb0612 closed 3 years ago)

deb0612 commented 3 years ago

Dear sir, I tried to run FADE with Docker on my BAM file, but the filtered BAM cannot be indexed by samtools, so I can't call somatic variants with Mutect2. Commands I used:

sudo docker run -v pwd:/data blachlylab/fade out -b -c /data/input.hg38.fade.bam >out.hg38.fade.filtered.bam
samtools index out.hg38.fade.filtered.bam

Output:

[E::hts_idx_push] Unsorted positions on sequence #1: 203008683 followed by 203008623
[E::sam_index] Read 'A00355:191:HJ2TWDSXY:2:1101:1009:14982' with ref_name='chr1', ref_length=248956422, flags=163, pos=2030d

Even when I use samtools sort, the same error keeps coming.

charlesgregory commented 3 years ago

You may need to sort your BAM again after running fade out. However, if samtools sort doesn't help, something else may be wrong with your BAM file. I need to update our documentation. If using fade out with the -c flag, you should run samtools sort both before and after fade out.

jblachly commented 3 years ago

@deb0612 Please let us know if this has fixed your problem -- currently, the annotate command and the out command with the -c flag may "un-sort" your sorted input for reasons of parallelism/speed. We will add warnings in the future.

It is not clear from your comment whether sorting AFTER the FADE commands also leads to the same indexing error.

deb0612 commented 3 years ago

Dear jblachly, Unfortunately, the problem is still unsolved, whether I sort my BAM file before or after fade out.

charlesgregory commented 3 years ago

@deb0612 We would certainly like to help solve this issue. Could you provide a minimal reproducible example? For example, could you provide the SAM line that causes the error when using samtools index?
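One way to find such a line is to scan the SAM text for the first record whose position is lower than the previous record's position on the same reference, which is the condition samtools index reports as "Unsorted positions". The sketch below is only an illustration: `first_unsorted` is a hypothetical helper name, not part of FADE or samtools, and it assumes plain SAM text on stdin.

```shell
#!/bin/sh
# Hypothetical helper, not part of FADE or samtools.
# Reads SAM text on stdin and prints the first record whose POS (column 4)
# is lower than the previous record's POS on the same reference (column 3).
# Exit status is 0 if such a record was found, 1 otherwise.
first_unsorted() {
    awk -F '\t' '
        /^@/ { next }                          # skip header lines
        $3 == prev_ref && $4 + 0 < prev_pos {
            print                              # the out-of-order record
            found = 1
            exit
        }
        { prev_ref = $3; prev_pos = $4 + 0 }   # remember the last record seen
        END { exit (found ? 0 : 1) }
    '
}
```

It could be used as `samtools view test.anno.sort.filtered.sort.bam | first_unsorted`. On a queryname-sorted file it will usually report a record almost immediately, since positions are not expected to be in order there.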

deb0612 commented 3 years ago

Dear @charlesgregory, Thanks for your reply. Here is the BAM file for testing. Commands I used:

$ sudo docker run -v pwd:/data blachlylab/fade annotate -b /data/test.bam /data/Homo_sapiens_assembly38.fasta > test.anno.bam
$ samtools sort -n test.anno.bam >test.anno.sort.bam
$ sudo docker run -v pwd:/data blachlylab/fade out -b -c /data/test.anno.sort.bam > test.anno.sort.filtered.bam
$ samtools sort -n test.anno.sort.filtered.bam >test.anno.sort.filtered.sort.bam
$ samtools index test.anno.sort.filtered.sort.bam
[E::hts_idx_push] Unsorted positions on sequence #17: 63607429 followed by 63607391
samtools index: failed to create index for "test.anno.sort.filtered.sort.bam"

charlesgregory commented 3 years ago

Try this:

sudo docker run -v pwd:/data blachlylab/fade annotate -b /data/test.bam /data/Homo_sapiens_assembly38.fasta > test.anno.bam
samtools sort -n test.anno.bam >test.anno.sort.bam
sudo docker run -v pwd:/data blachlylab/fade out -b -c /data/test.anno.sort.bam > test.anno.sort.filtered.bam
samtools sort test.anno.sort.filtered.bam >test.anno.sort.filtered.sort.bam
samtools index test.anno.sort.filtered.sort.bam

In order to index the bam you need to do a coordinate sort. samtools sort -n is a queryname sort that is useful for fade out. In order to index the output of fade out you need to re-sort with samtools sort without the -n flag.
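A quick way to see which of the two orders a BAM declares is to read the SO tag on the @HD line of its header. The sketch below is an illustration only: `header_sort_order` is a hypothetical helper name, not part of FADE or samtools, and it assumes the header text arrives on stdin (for example from `samtools view -H`).

```shell
#!/bin/sh
# Hypothetical helper, not part of FADE or samtools.
# Reads a SAM header on stdin and prints the value of the SO tag from the
# @HD line (for example "coordinate" or "queryname"). Prints "unknown"
# when there is no @HD line or no SO tag.
header_sort_order() {
    awk -F '\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { so = substr($i, 4) }   # drop the "SO:" prefix
        }
        END { print (so == "" ? "unknown" : so) }
    '
}
```

For example, `samtools view -H test.anno.sort.filtered.sort.bam | header_sort_order` should print `coordinate` before you run samtools index. The header only records what the last tool declared, so it is a sanity check rather than proof that the records really are in that order.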

deb0612 commented 3 years ago

Thanks! It works.