Closed alexvasilikop closed 2 years ago
Hi Alexandros,
Thanks for your interest in using pSONIC! Given the phylogenetic sampling you described, pSONIC probably wouldn't identify more single-copy genes than other methods, because the WGD within your samples creates the null expectation that lineages that have experienced a WGD should have 2 copies of every gene (notwithstanding gene loss patterns post-WGD). However, if you want to reduce the number of paralogs (i.e. optimize the number of single-copy orthologs) in only the samples that have experienced the tetraploidy, then I would suggest running pSONIC without any of the earliest diverging pre-WGD lineages and using the default settings of the --ploidy flag so that all of your samples have the same ancestral ploidy level.
As for your question about genome completeness, pSONIC has a minimum requirement of 5 syntenic genes on each scaffold/chromosome, so if the large majority of your genome sits on scaffolds with at least 5 genes, I would suggest running pSONIC and seeing how well it does. If you have a really non-contiguous genome, however, then I would recommend sticking to sequence-similarity-based approaches like OrthoFinder (and you can still try the reduced sampling scheme above to see if that reduces the number of paralogs in your dataset).
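A quick way to gauge whether an assembly clears the 5-gene threshold is to count genes per scaffold and see what fraction of genes sits on scaffolds that meet it. The sketch below is only an illustration: the function name `usable_gene_fraction` and the gene-to-scaffold dictionary are hypothetical and not part of pSONIC, and you would have to build the dictionary yourself from your annotation. Since it counts all genes rather than syntenic genes only, the result is an optimistic upper bound.

```python
from collections import Counter

MIN_GENES = 5  # minimum number of genes per scaffold mentioned above


def usable_gene_fraction(gene_to_scaffold, min_genes=MIN_GENES):
    """Fraction of genes located on scaffolds carrying at least `min_genes` genes.

    `gene_to_scaffold` is a hypothetical mapping {gene_id: scaffold_id}
    built from your own annotation; it is not a pSONIC input format.
    """
    if not gene_to_scaffold:
        return 0.0
    # Number of annotated genes on each scaffold.
    genes_per_scaffold = Counter(gene_to_scaffold.values())
    # Genes that sit on scaffolds meeting the threshold.
    usable = sum(n for n in genes_per_scaffold.values() if n >= min_genes)
    return usable / len(gene_to_scaffold)


# Toy example: six genes on scaffold s1, two genes on scaffold s2.
toy = {f"g{i}": "s1" for i in range(1, 7)}
toy.update({"g7": "s2", "g8": "s2"})
print(usable_gene_fraction(toy))
```

A value close to 1 would suggest that most genes are on scaffolds long enough to be considered; a low value would point toward the sequence-similarity-based route instead.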
I'm happy to discuss this more, so please let me know if you have any other questions or if I need to clarify anything further!
Justin
Dear Justin,
Thanks a lot for your response. My aim is to infer the phylogeny of the entire group (including the paleotetraploid species and diploid species), but I noticed OrthoFinder returns only a few single-copy orthologs when all species are included in the analysis, because many orthogroups contain homoeolog duplications. Because the genomes are not assembled to the chromosome level, it is not possible to split them into subgenomes and exclude one of the two subgenomes to increase the number of single-copy orthologs for phylogenetic inference. This is why I was wondering if pSONIC would help, but if you say it cannot I will have to find another solution.
Many thanks
Hi Alexandros,
Yes, if you don't have high contiguity then unfortunately pSONIC wouldn't be an appropriate tool. However, since you're interested in inferring phylogenies, you may be able to prune the OrthoFinder gene trees if the paleotetraploid homoeologs come out sister to each other, or use methods that can work with multiply-labeled gene trees, such as GRAMPA (https://academic.oup.com/sysbio/article/66/6/1007/3610602). Such methods can incorporate orthogroups that aren't strictly 1:1, which is probably the most robust approach for a question like yours, and GRAMPA is explicitly designed to differentiate allo- from autopolyploid events.
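To make the pruning idea concrete, here is a minimal sketch that collapses sister copies from the same species into one leaf and then checks whether the tree has become single-copy. It rests on several assumptions: trees are represented as nested Python tuples rather than read from OrthoFinder output, leaf names follow a hypothetical `species|gene` convention, and which copy is kept is arbitrary. It only handles the case where homoeologs are sister to each other; other trees are left unchanged.

```python
def species_of(leaf):
    """Species label of a leaf named 'species|gene' (assumed naming convention)."""
    return leaf.split("|", 1)[0]


def collapse_sister_copies(tree):
    """Collapse clades made up only of copies from one species into a single leaf.

    A tree is either a leaf (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        return tree
    children = [collapse_sister_copies(child) for child in tree]
    all_leaves = all(isinstance(child, str) for child in children)
    if all_leaves and len({species_of(child) for child in children}) == 1:
        # Same-species sister copies: keep one (alphabetically first, arbitrary).
        return sorted(children)[0]
    return tuple(children)


def leaves(tree):
    """List all leaf names of a tree."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def is_single_copy(tree):
    """True if every species appears at most once among the leaves."""
    species = [species_of(leaf) for leaf in leaves(tree)]
    return len(species) == len(set(species))


# Homoeologs sister within each tetraploid species: prunable.
sister = ((("tetA|g1", "tetA|g2"), ("tetB|g1", "tetB|g2")), "dip|g1")
pruned = collapse_sister_copies(sister)
print(pruned, is_single_copy(pruned))

# Homoeologs grouped by subgenome instead: left unchanged, still multi-copy.
by_subgenome = ((("tetA|g1", "tetB|g1"), ("tetA|g2", "tetB|g2")), "dip|g1")
print(is_single_copy(collapse_sister_copies(by_subgenome)))
```

Trees that stay multi-copy after this step, like the second example, are the ones where a method that accepts multiply-labeled gene trees would be the better fit.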
Justin
Hello,
I was wondering if it makes sense to use pSONIC for inferring single-copy orthologs from non-contiguous genome assemblies, or if you have tried this in the past.
I have a dataset with an ancient hybridization event that led to ancient tetraploidy for a few genomes, but the dataset also includes the earliest diverging lineages (before hybridization), which should be diploid. As a result, the number of single-copy orthologs inferred by OrthoFinder (when using the haploid genomes) is really small, because there is a large number of orthogroups with a duplication at the common ancestor of the paleotetraploid species.
The problem is that most of the genome assemblies are fragmented rather than chromosome-level (they are based on Illumina data), so the synteny information should be limited. Nevertheless, pSONIC should still be able to infer more single-copy orthologs for the small blocks inferred with MCScanX. What do you think?
Thanks