genome / bam-readcount

Count bases in BAM/CRAM files
MIT License

how to get the site list for indels? #92

Open breezetown opened 2 years ago

breezetown commented 2 years ago

Hi, we want to run bam-readcount to obtain a file of readcounts for our indels, but we don't know how to define the site list for indels, or how to derive the site list from the VCF file for the same sample.

chrisamiller commented 2 years ago

Indels are reported at the appropriate base as an additional column (for example, A:xxxx C:xxxx G:xxxx T:xxxx +A:xxxx). It's straightforward to add these. If you want to add readcounts to your VCF directly, consider the scripts in VAtools, as outlined here: https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/readcounts.html

Hope this helps!
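As a rough illustration of what "an additional column" means in practice, here is a minimal Python sketch that pulls the indel columns out of one line of bam-readcount output. It assumes the output is tab-separated, that the first four fields are chromosome, position, reference base, and depth, and that every later field starts with `allele:count:`. The function name `indel_counts` is hypothetical and not part of bam-readcount or VAtools.

```python
def indel_counts(line):
    """Return (chrom, pos, ref, depth, {indel_allele: count}) for one output line.

    Assumption: columns after the fourth look like 'allele:count:...' and
    indel alleles are prefixed with '+' (insertion) or '-' (deletion).
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, depth = fields[0], int(fields[1]), fields[2], int(fields[3])

    counts = {}
    for entry in fields[4:]:
        parts = entry.split(":")
        if len(parts) < 2:
            continue  # not an allele column; ignore it
        allele, count = parts[0], int(parts[1])
        if allele.startswith(("+", "-")):
            counts[allele] = count
    return chrom, pos, ref, depth, counts
```

Calling this on each output line and keeping only the lines where the returned dictionary is non-empty gives a quick list of positions where bam-readcount saw indel-supporting reads.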

breezetown commented 2 years ago

Thanks, but we still cannot solve this. Suppose we have a VCF file like this:

CHROM POS REF ALT
chr1 1168012 CCTG C
chr1 1356341 TTCC T
chr1 1534913 CGCG C
chr1 1684347 CCCT C
chr1 1684347 CCCT CCCTCCT

How do we get the site list from this VCF file? Some of the indels are complicated.

chrisamiller commented 2 years ago

Complex indels (like the last one) are not currently well supported by bam-readcount. The others should be fine, though. If in doubt, pick one, run bam-readcount on a small interval around the event, and check that the reported indel matches up, as a sanity check.
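One way to follow this suggestion is sketched below in Python: it turns CHROM/POS/REF/ALT records into a padded interval around each simple indel, plus the indel label to look for in the output. This is only a sketch under several assumptions: the site list is taken to be tab-separated chromosome, start, end with 1-based coordinates; a "simple" indel is taken to mean one where the shorter allele is a single anchor base; deletions are assumed to be labelled `-SEQ` by analogy with the `+A` example above; and the helper names `classify` and `site_list` and the `pad` parameter are hypothetical. The exact position at which bam-readcount reports a given indel should be confirmed from the output rather than taken from this code.

```python
def classify(ref, alt):
    """Classify a REF/ALT pair and return (kind, expected_label).

    Only single-anchor-base indels are treated as simple; everything else
    (SNVs, MNPs, complex indels) gets the label None.
    """
    if len(alt) == 1 and len(ref) > 1 and ref.startswith(alt):
        return "deletion", "-" + ref[1:]
    if len(ref) == 1 and len(alt) > 1 and alt.startswith(ref):
        return "insertion", "+" + alt[1:]
    return "other", None


def site_list(records, pad=5):
    """Build padded site-list lines for simple indels.

    records: iterable of (chrom, pos, ref, alt) with 1-based pos.
    Returns (lines, skipped), where each line is 'chrom<TAB>start<TAB>end'
    and skipped holds the records that were not simple indels.
    """
    lines, skipped = [], []
    for chrom, pos, ref, alt in records:
        kind, label = classify(ref, alt)
        if label is None:
            skipped.append((chrom, pos, ref, alt))
            continue
        start = max(1, pos - pad)          # never go below coordinate 1
        end = pos + len(ref) - 1 + pad     # cover the whole REF span plus padding
        lines.append(f"{chrom}\t{start}\t{end}")
    return lines, skipped
```

Applied to the five records quoted earlier in the thread, this keeps the four deletions and sets aside the `CCCT` to `CCCTCCT` record, which matches the advice that the last one is not well supported. Writing the returned lines to a file gives something that can be passed to bam-readcount as a site list, after which the output around each event can be inspected for the expected label.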