bjpop / rover

Read overlap variant caller.
Other
7 stars 1 forks source link

Problems with bam file #8

Open marcelcaraciolo opened 9 years ago

marcelcaraciolo commented 9 years ago

Hello guys,

First of all, good work! I am trying to use the tool for one of our bam files and it is giving us an IndexError during the cigar reading step at your code. The following traceback:

./rover/rover/rover.py --primers KRASNRAS.bed  --out variants_rover.vcf --proportionthresh 0.05 --overlap 0.9 --absthresh 2 --id_info dbsnp_138.hg19.vcf ./Miseq-03-04-15/KRASNRAS/164180_4/alignments/164180_4_001_L001_1.realigned.recal.bam
WARNING:root:Read M00538:83:000000000-D0CJD:1:1101:8832:16146 with no pair
WARNING:root:Read M00538:83:000000000-D0CJD:1:1101:15624:2194 with no pair
WARNING:root:Read M00538:83:000000000-D0CJD:1:1101:14841:2252 with no pair
WARNING:root:Read M00538:83:000000000-D0CJD:1:1102:27778:18791 with no pair
Traceback (most recent call last):
  File "./rover/rover/rover.py", line 913, in <module>
    main()
  File "./rover/rover/rover.py", line 910, in main
    process_bams(args)
  File "./rover/rover/rover.py", line 892, in process_bams
    total_data, vcf_reader)
  File "./rover/rover/rover.py", line 630, in process_blocks
    parse_md(get_md(read2), []))
  File "./rover/rover/rover.py", line 163, in read_variants
    context = aligned_bases[seq_index + next_md.size - 1].base
IndexError: list index out of range
marcelcaraciolo commented 9 years ago

If you need the bam file and the bed file let me know! I can put here the links to download.

bjpop commented 9 years ago

Hi @marcelcaraciolo. Thanks for the bug report. I would like to see the BAM file if possible. So it would be great if you can post a link.

marcelcaraciolo commented 9 years ago

Hello @bjpop here is the link for the BAM file and BAI as attached:

https://dl.dropboxusercontent.com/u/1977573/BAMData.zip

The primers file:

https://dl.dropboxusercontent.com/u/1977573/KRASNRASPRIMERS.bed

bjpop commented 9 years ago

Thanks for posting the files.

I think I have addressed the problem. The fix is in the branch called soft_clip_bugfix.

Can you please try pulling that branch and testing it?

Let me know if you have any difficulties.

marcelcaraciolo commented 9 years ago

Thanks @bjpop I will pull and test! I will give you the feedback! :)

marcelcaraciolo commented 9 years ago

@bjpop It works ! But I still don't get any variants, even when I soft some parameters, for instance the overlap (0.9, 0.8, or 0.7). It gives me some warnings related to the missing pair reads, but I believe it is only a proportion.

WARNING:root:Read M00538:83:000000000-D0CJD:1:1101:11144:14641 with no pair WARNING:root:Read M00538:83:000000000-D0CJD:1:1101:12217:3051 with no pair WARNING:root:Read M00538:83:000000000-D0CJD:1:1101:28905:17909 with no pair WARNING:root:Read M00538:83:000000000-D0CJD:1:1102:20864:14548 with no pair

Could you give me some lights that could be issued on ?

bjpop commented 9 years ago

Hi @marcelcaraciolo. I'm not sure why you get no variants (unless the sample doesn't contain any).

Was the bam file you linked previously the whole data set? If so, can you tell me coordinates where you expect to see variants (chrom start-end)? Obviously don't post those details here if they are secret or sensitive to your work.

marcelcaraciolo commented 9 years ago

Hi @bjpop

Using another software for calling the variants we have found some variants, for instance one variant that could be potential to be identified at your pipeline would be a variant at the region chr12:25398282 C>T (KRAS).

The region would be: chr12:25398252-25398315

You can use the same BAM FILE previously posted here.

marcelcaraciolo commented 9 years ago

Another potential issue that we found is that when we change the order of the regions in the BED file. For example , if we change the order of the regions in the bed file, the output comes with nothing.

To simulate what I explained:

CASE with no outputs https://dl.dropboxusercontent.com/u/1977573/RoverAnalysis.zip

CASE with outputs (since it is the same bam file I only placed here the bed file) https://dl.dropboxusercontent.com/u/1977573/PRIMERS.bed

The difference as you can see it's only the order of the regions at the bed file.

marcelcaraciolo commented 9 years ago

Hi @bjpop , any news about those issues related ? I will also try to inspect the code to see if I can try fix it, also.

Thanks!

bjpop commented 9 years ago

Sorry @marcelcaraciolo I have been a bit busy. Thank you for the test cases. I will try to find time to look into them this week.

apastore commented 9 years ago

I have the same problem the fix just suppress the error but rover does not find any mutations.

could be that the issues is GATK related?

after installing the fix I get this message in my log:

06/25/2015 16:51:38 MD del in cigar match [21, ^T, 1, A, 8, T, 2, G, 1, A, 61] [(4, 8), (0, 15), (1, 1), (0, 6), (2, 1), (0, 77)]

thanks!

bjpop commented 9 years ago

Hi @apastore and @marcelcaraciolo,

Sorry for the delay in my response. I have been very busy.

Can you please explain to me exactly how you are processing your files up to and including the use of rover?

Also, I want to make sure you are following the Hi-Plex protocol. Rover assumes that you have completely overlapping reads, and makes some assumptions about how adapter/primer sequences are trimmed from the reads.