Closed rmccoy7541 closed 5 years ago
The sequence is on the EBI FTP: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/hgsv_sv_discovery/working/20181025_EEE_SV-Pop_1/VariantCalls_EEE_SV-Pop_1/EEE_SV-Pop_1.ALL.sites.20181204.bed.gz
That BED file contains one record per SV, and the SEQ column is the SV sequence (inserted, deleted, or inverted bases).
This SV is an AluY, so the inserted sequence will align to any number of other regions within these contigs and the human reference.
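For looking up the SEQ value of a given SV, a minimal sketch follows. It assumes the BED file is tab-separated with a header row (possibly prefixed with `#`) and that the identifier and sequence columns are named `ID` and `SEQ`; the `ID` column name and the helper name `read_sv_sequences` are assumptions, so check the actual header before relying on it.

```python
import gzip


def read_sv_sequences(lines, id_col="ID", seq_col="SEQ"):
    """Map each SV identifier to its SEQ string.

    `lines` is any iterable of text lines whose first line is a
    tab-separated header. The column names are assumptions.
    """
    lines = iter(lines)
    # Tolerate a leading '#' on the header line, as in many BED-like tables.
    header = next(lines).rstrip("\n").lstrip("#").split("\t")
    id_idx = header.index(id_col)
    seq_idx = header.index(seq_col)

    sequences = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        sequences[fields[id_idx]] = fields[seq_idx]
    return sequences


# Usage against the downloaded file (path is whatever you saved it as):
# with gzip.open("EEE_SV-Pop_1.ALL.sites.20181204.bed.gz", "rt") as handle:
#     sequences = read_sv_sequences(handle)
#     inserted = sequences["NA19434_chr1-59599-INS-308"]
```

Plain string splitting is used here rather than the `csv` module so that very long SEQ fields do not hit the default CSV field size limit.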
Hello, I am wondering whether you have any advice about how to extract the sequence of the alternative allele (rather than the symbolic <INS>) using the information from VCFs output by smrtsv2. For example, taking the first VCF entry of your recent long-read sequencing study,
chr1 59599 NA19434_chr1-59599-INS-308 A <INS> 5 . SVTYPE=INS;SVLEN=308;END=59599;MERGE_SOURCE=NA19434;MERGE_SAMPLES=NA19434;MERGE_AC=1;MERGE_AF=0.07;MERGE_VARIANTS=NA19434_chr1-59599-INS-308;MERGE_VARIANTS_RO=1.00;CONTIG_SUPPORT=3;CONTIG_DEPTH=7;CONTIG=NA19434_chr1-20000-80000-ctg7180000000004;CONTIG_START=3817;CONTIG_END=4125;REPEAT_TYPE=AluY_simple;BKPTID=NA19434_chr1-59599-INS-308;PUBLISHED_ID=NA19434_chr1-59599-INS-308
I tried extracting the inserted sequence using
samtools faidx GCA_003709735.1_NA19434_EEE_SV-Pop.1_genomic.fna QVRF01000001.1:3817-4125
However, aligning this with sequence flanking the insertion seems to suggest that the insertion point doesn't line up perfectly.
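One thing that may be worth checking, as an assumption rather than anything confirmed in this thread: `samtools faidx` regions are 1-based and inclusive, so `3817-4125` spans 309 bases, one more than SVLEN=308. If CONTIG_START and CONTIG_END were 0-based and half-open (BED style), the matching region would start one base later. The sketch below only does that arithmetic; the helper names are hypothetical.

```python
def faidx_region(name, start0, end0):
    """Convert a 0-based, half-open interval to a 1-based, inclusive
    region string of the form accepted by `samtools faidx`."""
    return f"{name}:{start0 + 1}-{end0}"


def faidx_region_length(region):
    """Number of bases covered by a 1-based, inclusive 'name:start-end' region."""
    coords = region.rsplit(":", 1)[1]
    start, end = (int(value) for value in coords.split("-"))
    return end - start + 1


# The region used in the command above covers 309 bases.
as_queried = "QVRF01000001.1:3817-4125"
print(faidx_region_length(as_queried))

# Under the 0-based, half-open ASSUMPTION the region covers 308 bases,
# which matches SVLEN=308.
adjusted = faidx_region("QVRF01000001.1", 3817, 4125)
print(adjusted, faidx_region_length(adjusted))
```

If the lengths agree only after the adjustment, that points to a coordinate-convention mismatch rather than a problem with the call itself.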
Ultimately, I am hoping to be able to format the inserted sequence (for all insertions) as an alternative allele to be used as input to programs like BayesTyper. Thanks for your help!
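A rough sketch of the reformatting step: in VCF, an insertion with explicit alleles is conventionally written with REF as the anchor base and ALT as that base followed by the inserted sequence. The code below assumes that POS in the smrtsv2 record is the anchor base immediately preceding the insertion and that the inserted sequence is given on the reference forward strand; neither is confirmed here, and the function names are hypothetical.

```python
def explicit_insertion_alleles(ref_base, inserted_seq):
    """Return (REF, ALT) for an insertion placed right after the anchor base."""
    ref = ref_base.upper()
    alt = ref + inserted_seq.upper()
    return ref, alt


def rewrite_vcf_line(line, inserted_seq):
    """Replace a symbolic <INS> ALT with an explicit sequence.

    Lines whose ALT is not <INS> are returned unchanged.
    """
    fields = line.rstrip("\n").split("\t")
    if fields[4] != "<INS>":
        return "\t".join(fields)
    # VCF columns: CHROM, POS, ID, REF, ALT, ...
    fields[3], fields[4] = explicit_insertion_alleles(fields[3], inserted_seq)
    return "\t".join(fields)


# Toy example with a made-up 3-base insertion (not real data):
toy = "chr1\t59599\ttoy_id\tA\t<INS>\t5\t.\tSVTYPE=INS"
print(rewrite_vcf_line(toy, "ggc"))
```

The INFO field is left untouched; whether downstream tools such as BayesTyper need SVLEN or END adjusted is something to verify against their documentation.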