Closed bshrestha0 closed 6 months ago
Hi @bshrestha0,
The tree is rooted at midpoint before running the algorithm. Check the same tree in the nexus folder to see the rooted tree.
I see, the HGT call based on the rooted tree makes sense. I was looking at the wrong tree file. Thanks for pointing it out!
It looks like blastp returned multiple eukaryotic hits, including Dinophyceae, Streptophyta, Chlorophyta, and Haptophyta (multiple hits), but during the AvP prepare step only Dinophyceae members and one Streptophyta sequence were selected. For example, here's the partial output of diamond blastp:
and the output of calculate_ai.py:
Both Haptophyta and Chlorophyta were excluded from the processed fasta file; perhaps they failed to meet the criteria or were removed during clustering. I was wondering if there's a way to include these sequences in the fasta group, maybe by changing some parameters. I am using the default parameters in the config file:
Thanks!
You should check the cutoffextend parameter. With n=20, it keeps hits from the blast output until 20 hits after the first ingroup hit. In your example, it will keep the next 20 hits after YP_009033831.1.
I have 16 eukaryotic hits out of 302 total hits, and the rest are all bacterial. With n=20, shouldn't all 16 eukaryotes be included in the fasta file? Or when you said next 20 hits, do you mean the 20 subsequent hits right after the first eukaryotic hit, regardless of ingroup or outgroup?
Sorry for the confusion, the algorithm will keep the 20 subsequent hits right after the first ingroup hit, regardless of ingroup or outgroup.
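To make the rule concrete, here is a minimal Python sketch of the selection as described in this thread. It is not AvP's actual code: the function name `select_hits`, the toy accessions, and the choice to keep everything when no ingroup hit exists are assumptions made for illustration.

```python
def select_hits(hits, is_ingroup, cutoffextend=20):
    """Keep hits through the first ingroup hit, plus the next `cutoffextend` hits.

    `hits` is a list ordered as in the blast output (best hit first).
    The extra hits are kept regardless of whether they are ingroup or outgroup.
    """
    for i, hit in enumerate(hits):
        if is_ingroup(hit):
            # i + 1 includes the first ingroup hit itself.
            return hits[: i + 1 + cutoffextend]
    # Assumption for this sketch: with no ingroup hit, nothing is trimmed.
    return list(hits)


# Toy ranked hit list of (accession, is_ingroup) pairs; accessions are made up.
hits = [("bact_%d" % i, False) for i in range(3)]
hits.append(("euk_first", True))                      # first ingroup hit
hits += [("bact_%d" % i, False) for i in range(3, 28)]
hits.append(("euk_late", True))                       # ingroup hit far down the list

kept = select_hits(hits, lambda h: h[1], cutoffextend=20)
kept_ids = [h[0] for h in kept]
```

In this toy list, `euk_late` sits more than 20 hits below `euk_first`, so it is dropped with `cutoffextend=20` and only retained once the value is raised far enough to reach it. That matches the situation above, where lower-ranked eukaryotic hits fall outside the window.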
No worries, I will tweak the cutoffextend parameter then and see how it goes.
Thank you for helping me out!
Hi,
I am interested in identifying HGT events in Dinophyceae, specifically non-eukaryotic gene transfers in Dinophyceae.
So, I set up groups.yaml as:

    Ingroup:
      2759: Eukaryota
    EGP:
      2864: Dinophyceae
While looking at the putative HGTs identified by AvP, I wasn't sure how AvP tagged a protein as HGT. I have included a screenshot of the tree generated by AvP.

![NR gp4504 fa](https://github.com/GDKO/AvP/assets/35469867/2ac3c7ee-0504-486e-bf41-fdf2b77eeafd)

As you can see in the tree, the query protein sequence "50_DN47782_c0_g1_i1" is in a clade with other Dinophyceae members, and the sister taxon to that clade is another eukaryote (YP_009033831), which is a streptophyte alga. All other members are bacterial species. So, since a homolog of that gene is already present in another eukaryote, how can it be an HGT specific to Dinophyceae? I am a little confused here.
Best regards, Bikash