Snitkin-Lab-Umich / scripts

Frequently used Scripts

Create a new function to split lines with duplicate alleles #10

Open sthiede opened 5 years ago

sthiede commented 5 years ago

Currently this function splits lines where the same variant has multiple annotations (e.g. overlapping genes). The new function should instead deal with lines that list more than one allele, such as Coding SNP ... > A,G

Create a new function to:
1) Split lines (A gets one line, G gets another)
2) Modify the rowname so that it is >A and >G, not >A,G
3) Change the content of the rows in the CODE matrix to correspond to the presence/absence of the allele for that line

https://github.com/Snitkin-Lab-Umich/scripts/blob/0e50918cda987e34ea962951885154c6947b42b5/variant_parser_functions.R#L19
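
A rough sketch of what the three steps could look like in R. This is not the existing code in variant_parser_functions.R. The function name `split_multiallelic_rows`, the input `allele_mat` (a character matrix holding the base called in each sample) and the example row names are all hypothetical. It also assumes every row name contains a ">" followed by a comma-separated allele list.

```r
# Hypothetical helper (not part of variant_parser_functions.R).
# ASSUMPTIONS:
#   - allele_mat is a character matrix (variants x samples) whose cells hold
#     the base called in each sample;
#   - every row name contains ">" and the alleles are the comma-separated
#     text after the last ">".
split_multiallelic_rows <- function(allele_mat) {
  new_rows <- list()
  new_names <- character(0)
  for (i in seq_len(nrow(allele_mat))) {
    row_name <- rownames(allele_mat)[i]
    # Keep everything up to and including the last ">"
    prefix <- sub(">[^>]*$", ">", row_name)
    # Take the text after the last ">" and split it on commas
    allele_field <- sub("^.*>", "", row_name)
    alleles <- trimws(strsplit(allele_field, ",", fixed = TRUE)[[1]])
    for (allele in alleles) {
      # Step 1 and 3: one row per allele, 1 = sample carries it, 0 = it does not
      new_rows[[length(new_rows) + 1]] <- as.integer(allele_mat[i, ] == allele)
      # Step 2: row name ends in ">A" or ">G" instead of "> A,G"
      new_names <- c(new_names, paste0(prefix, allele))
    }
  }
  out <- do.call(rbind, new_rows)
  dimnames(out) <- list(new_names, colnames(allele_mat))
  out
}

# Toy input with made-up row names and samples
allele_mat <- matrix(
  c("A", "G", "C",
    "T", "T", "C"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Coding SNP at 100 > A,G", "Coding SNP at 200 > T"),
                  c("s1", "s2", "s3"))
)
split_mat <- split_multiallelic_rows(allele_mat)
```

In this toy case the first row becomes two rows (one for >A, one for >G) and the single-allele row passes through as one row. How the real CODE matrix encodes alleles may differ, so the presence/absence step would need to be adapted to that encoding.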

katiesaund commented 5 years ago

@sthiede Let's prioritize this bug when you're back in town! With the inclusion of the outgroups when creating the snpmat, there are even more multi-allelic positions than before (~6% in my Hanna dataset now).