Carol-Symbiomics closed 5 years ago
Hi there, I think it has to do with the particular states in those columns. If a column contains only Rs and As, for example, it is considered invariable because R means A or G, so every sample in that column could be an A. See a better explanation in the IQ-TREE FAQ (the very last question): http://www.iqtree.org/doc/Frequently-Asked-Questions
IQ-TREE will filter out those invariable columns if you try to run an analysis with the +ASC model and your matrix still has this kind of invariable site. It will automatically create a new matrix with the extension .varsites.phy, which can be analyzed with +ASC either in IQ-TREE itself or in RAxML.
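To make the rule above concrete, here is a minimal Python sketch of how a column can count as invariable once ambiguity codes are taken into account. It is only an illustration, not the actual IQ-TREE or RAxML code: the function names `is_potentially_invariant` and `variable_columns` are made up for this example, and treating N, gaps, and any unknown symbol as "any base" is an assumption.

```python
# Illustrative sketch only; NOT the actual IQ-TREE or RAxML implementation.
# Each IUPAC symbol maps to the set of bases it can represent.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
}
ALL_BASES = {"A", "C", "G", "T"}


def is_potentially_invariant(column):
    """Return True if one base is compatible with every symbol in the column."""
    shared = set(ALL_BASES)
    for symbol in column:
        # Assumption: N, '-', '?' and unknown symbols are compatible with any base.
        shared &= IUPAC.get(symbol.upper(), ALL_BASES)
    return bool(shared)


def variable_columns(sequences):
    """Keep only the columns that cannot be explained by a single base.

    `sequences` is a dict of sample name -> aligned sequence (equal lengths).
    """
    columns = list(zip(*sequences.values()))
    keep = [i for i, col in enumerate(columns) if not is_potentially_invariant(col)]
    return {name: "".join(seq[i] for i in keep) for name, seq in sequences.items()}
```

For example, `is_potentially_invariant("RRAA")` is true because every sample could be an A, while a column with A and G homozygotes is not.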
Edgardo
Hi Edgardo,
Thanks for your quick reply. I'm a new user of RAxML, but I was examining the converted VCF file and noticed some "weird" characters (e.g. K, W, R, N, S, Y, M). Is that normal? Shouldn't I have all my SNPs concatenated? My SNPs are biallelic, so will they be replaced by an R if they are either A or G?
With your Python code, is there a way to only concatenate the SNPs in the VCF, avoiding this IQ-TREE nomenclature (http://www.iqtree.org/doc/)?
Thanks in advance for your help
All the genotypes in your VCF are transformed to their IUPAC ambiguity codes because the output matrices have a single sequence per sample. It is normal to have ambiguity codes in the matrices. They are routinely analyzed in phylogenetics, but they can violate the conditions of the ascertainment model.
Or perhaps I don't fully understand your question, "Shouldn't I have all my SNPs concatenated?" Would you mind clarifying? What kind of output were you expecting?
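As a rough illustration of the genotype-to-IUPAC conversion described above, here is a short Python sketch. It is not the converter's actual code: `genotype_to_iupac` is a hypothetical helper, it assumes single-base SNP alleles, and coding missing genotypes as N is an assumption for this example.

```python
# Hypothetical helper for illustration; NOT the converter's actual code.
# Map each set of observed bases to its IUPAC symbol (one or two bases only).
IUPAC_CODE = {
    frozenset(bases): code
    for code, bases in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    }.items()
}


def genotype_to_iupac(ref, alts, gt):
    """Collapse one VCF genotype (e.g. '0/1', '1|1', './.') into one symbol.

    `ref` is the REF base, `alts` the list of ALT bases, `gt` the GT string.
    """
    alleles = [ref] + list(alts)
    indices = gt.replace("|", "/").split("/")
    # Assumption: any missing allele makes the whole genotype an N.
    if any(i == "." for i in indices):
        return "N"
    bases = frozenset(alleles[int(i)] for i in indices)
    return IUPAC_CODE.get(bases, "N")
```

With REF A and ALT G, a homozygous sample stays A or G, and only a heterozygous sample (0/1) becomes R.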
Edgardo
You understood correctly. I thought there was a way to concatenate the SNPs into a FASTA file without using the IUPAC ambiguity codes. It looks like I will have to use IQ-TREE to filter out the "non-variant" sites, because RAxML doesn't have that option. Thanks for your time!
No problem,
Also, you have some alternatives for your analysis, since you are interested in the clustering and not so much in the branch lengths, where the ascertainment correction becomes more relevant. You could use your SNP matrix (in NEXUS format) with SVDquartets, which is now part of PAUP, or simply analyze it in IQ-TREE (or RAxML) without the ascertainment correction.
Edgardo
Hi! I'm interested in using RAxML to assess how my samples cluster based on RADseq population SNPs. I converted my VCF file to PHYLIP format using your Python code. I'm completely sure I don't have monomorphic SNPs in my data set. However, when I run RAxML with the ASC correction option (recommended when using only variable sites, i.e. SNPs), the program displays an error: "For partition No Name Provided you specified that the likelihood score shall be corrected for invariant sites via an ascertainment bias correction. However, some sites in this partition are already invariant. This is not allowed, please remove all invariant sites and try again, exiting". How is this possible? Can someone help me?