WGLab / NanoCaller

Variant calling tool for long-read sequencing data
MIT License

Supporting reads #40

Closed: jamesdalg closed 8 months ago

jamesdalg commented 8 months ago

Is there a way to find which supporting reads contain a given SNP using NanoCaller? Perhaps some temporary files created during SNP calling could help determine this.

umahsn commented 8 months ago

Hi, currently we discard the read names after feature generation, but it is certainly possible to capture this information at runtime. Another option is to write a small utility function that runs afterwards and retrieves the read names for the SNPs you are interested in.
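For readers wondering what such a post-hoc utility might look like, here is a minimal sketch. It is not NanoCaller code: the function names `reads_at_position` and `group_reads_by_base` are hypothetical, and it assumes the third-party pysam library plus a coordinate-sorted, indexed BAM file. It only handles single-base substitutions and skips reads that have a deletion or reference skip at the site.

```python
from collections import defaultdict


def group_reads_by_base(observations):
    """Group (read_name, base) pairs into {base: [read_name, ...]}."""
    groups = defaultdict(list)
    for read_name, base in observations:
        groups[base.upper()].append(read_name)
    return dict(groups)


def reads_at_position(bam_path, chrom, pos):
    """Yield (read_name, base) for reads covering the 1-based position `pos`.

    Hypothetical helper: assumes pysam is installed and the BAM is indexed.
    """
    import pysam  # third-party; imported here so the grouping helper works without it

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # pysam uses 0-based, half-open coordinates; truncate=True restricts
        # the pileup to the requested column. min_base_quality=0 keeps
        # low-quality bases, which pysam would otherwise drop by default.
        for column in bam.pileup(chrom, pos - 1, pos,
                                 truncate=True, min_base_quality=0):
            for pileup_read in column.pileups:
                if pileup_read.is_del or pileup_read.is_refskip:
                    continue  # no base from this read at this site
                seq = pileup_read.alignment.query_sequence
                if seq is None:
                    continue  # the record stores no sequence
                yield (pileup_read.alignment.query_name,
                       seq[pileup_read.query_position])
```

Calling `group_reads_by_base(reads_at_position("alignments.bam", "chr1", 12345))` would then give the read names keyed by the base each read carries at that position. The file name and coordinates here are placeholders.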

umahsn commented 8 months ago

Hi, I have added a script: https://github.com/WGLab/NanoCaller/blob/master/misc/get_SNP_readnames.py

You can run it as: python get_SNP_readnames.py --vcf variants.vcf.gz --bam alignments.bam --output read_names

The format of the output is: chromosome position allele1:read_name1,read_name2 allele2:read_name3,read_name4 allele3:read_name5

Here allele1 is the reference allele, followed by all the alternative alleles.
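If you want to post-process that output, a line in the format described above could be parsed roughly as follows. This is a sketch under assumptions: the function name `parse_readnames_line` is hypothetical, fields are assumed to be whitespace-separated, and an allele with no supporting reads is assumed to appear with an empty list after the colon. The actual script's output may differ in those details.

```python
def parse_readnames_line(line):
    """Parse one 'chromosome position allele:read,read ...' line.

    Returns (chromosome, position, [(allele, [read_name, ...]), ...]),
    with the reference allele first, as in the described format.
    """
    fields = line.split()
    chrom, pos = fields[0], int(fields[1])
    alleles = []
    for field in fields[2:]:
        # Split on the first colon only, so read names that themselves
        # contain colons are preserved.
        allele, _, names = field.partition(":")
        alleles.append((allele, names.split(",") if names else []))
    return chrom, pos, alleles


chrom, pos, alleles = parse_readnames_line("chr1 12345 A:read1,read2 G:read3")
ref_allele, ref_reads = alleles[0]
```

The example line is made up for illustration; it is not real NanoCaller output.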

jamesdalg commented 8 months ago

wow! Thanks! I'll give it a try.