JLSteenwyk / orthosnap

a tree splitting and pruning algorithm for retrieving single-copy orthologs from gene family trees
https://jlsteenwyk.com/orthosnap/
MIT License

Orthosnap output #9

Closed evelinepinseel closed 7 months ago

evelinepinseel commented 7 months ago

Hello,

OrthoSNAP is a great tool - thanks for developing it!

I have a general question. Is there a way for orthoSNAP to output the removed in-paralogs together with the orthologs they correspond to? For my downstream ortholog-level analyses, I am looking for a tool that can identify orthologs and also tell me which in-paralogs correspond to each of them.

If this is not possible, can orthoSNAP output singleton orthologs (= a 'lonely' sequence in the gene tree)? I noticed in orthoSNAP's test example that these singletons are not assigned to their own ortholog. However, if it would be possible to do this, I could come up with a workaround for my first issue.

Thanks already!

Best wishes, Eveline

JLSteenwyk commented 7 months ago

Hi Eveline,

Firstly, thank you for choosing to use OrthoSNAP! Your usage and feedback are greatly appreciated!

Regarding both of your questions, I believe I understand them correctly, but I want to make sure. Would it be possible to provide a figure, or draw on top of the one from the test example, to clarify your question?

I apologize for any inconvenience this may cause you, but I value your question and want to make sure I appropriately address it.

Thanks again for using OrthoSNAP!

Best,

Jacob

evelinepinseel commented 7 months ago

Hi Jacob,

Many thanks for the quick reply and interest in my question - absolutely no inconvenience on my part!

Let me explain it using the example tree from the orthoSNAP paper.

Question 1: orthoSNAP will output one sequence per species in each ortholog. This means that in the test example, only one of the sequences with the label 'copy' is retained per species. In my case, I would need to figure out which of the removed 'copy' sequences corresponds to which ortholog. Given that I am dealing with 9000+ gene trees, I would like to do this automatically. Is it possible for orthoSNAP to output which in-paralogs were removed? For example, say that the test example retains species2|gene2-copy0 and species4|gene2-copy1 in ortholog1, would it be possible for orthoSNAP to inform me that species2|gene2-copy1 and species4|gene2-copy0 were pruned from ortholog1?

Question 2 (if there is a solution to question 1, this question becomes obsolete - it's more of a workaround): I would like to run orthoSNAP with an occupancy threshold of 1 taxon, instead of the default 50%. I tried doing this on the test example and noticed that orthoSNAP outputs all orthologs, except for the singleton (species0|gene3 in the test example). Is there a way for orthoSNAP to output this singleton also as its own ortholog?

[image: orthosnap]

Thanks again!

JLSteenwyk commented 7 months ago

Hi Eveline,

Just wanted to let you know that I am working on this and will get back to you.

evelinepinseel commented 7 months ago

Thank you!


JLSteenwyk commented 7 months ago

Hi Eveline,

I hope you are doing well.

Please see the new -rih argument, which stands for report inparalog handling. This argument addresses your request by outputting a three-column file. The first column is the relevant SNAP-OG file, the second column is the inparalog that was kept, and the third column lists the inparalog(s) that were removed. See here for docs: https://jlsteenwyk.com/orthosnap/usage/index.html#report-inparalog-handling
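For downstream use across many gene trees, the three-column report described above could be loaded into a lookup table. The following is only a minimal sketch, not part of OrthoSNAP: the function name `parse_inparalog_report` is hypothetical, and the tab column delimiter and the `;` separator between multiple removed inparalogs are assumptions that should be checked against the linked docs and an actual report file. The example rows are illustrative, built from the sequence names used in the question above.

```python
from collections import defaultdict


def parse_inparalog_report(lines, col_sep="\t", removed_sep=";"):
    """Map each SNAP-OG to {kept inparalog: [removed inparalogs]}.

    ASSUMPTIONS (not confirmed in this thread): columns are separated by
    `col_sep` and multiple removed inparalogs by `removed_sep`.
    """
    report = defaultdict(dict)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Column 1: SNAP-OG file, column 2: kept, column 3: removed
        snap_og, kept, removed = line.split(col_sep, 2)
        report[snap_og][kept] = [
            name.strip() for name in removed.split(removed_sep) if name.strip()
        ]
    return dict(report)


# Illustrative rows only; real reports come from running orthosnap with -rih.
example = [
    "og.orthosnap.1\tspecies2|gene2-copy0\tspecies2|gene2-copy1\n",
    "og.orthosnap.1\tspecies4|gene2-copy1\tspecies4|gene2-copy0\n",
]
report = parse_inparalog_report(example)
for snap_og, kept_to_removed in report.items():
    for kept, removed in kept_to_removed.items():
        print(snap_og, kept, removed)
```

With a real report, the same function could be fed an open file handle instead of the example list.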

Thank you for choosing to use OrthoSNAP. You may find some of the other software I have developed (see here: https://jlsteenwyk.com/software.html) helpful for your studies.

Happy coding and happy holidays!

Best,

Jacob

P.S., Congrats on your contributions to the field of diatom evolution. I thought Pinseel (2022) ISME was particularly cool and underscores the importance of considering strain heterogeneity in 'omic studies.

evelinepinseel commented 7 months ago

Hi Jacob,

Amazing - thank you!

I checked the performance of the new option on the test example, and I believe there might be a glitch. It looks like orthoSNAP assigned all the in-paralogs to the same SNAP-OG in the report file (fake_orthologous_group_of_genes.faa.orthosnap.1), even though one of them (the last line in the screenshot) is in reality assigned to SNAP-OG fake_orthologous_group_of_genes.faa.orthosnap.0. Is this error reproducible for you?

I used the following command: orthosnap -f "fake_orthologous_group_of_genes.faa" -t "fake_orthologous_group_of_genes_tree.tre" -s 80 -o 1 -rih
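For thousands of gene trees, the same call could be generated per file pair from a script. This is a rough sketch under stated assumptions: the helper names are hypothetical, and the pairing rule (a tree named like the FASTA file with `.faa` replaced by `_tree.tre`) is inferred only from the test file names above. Only the flags that appear in the command above are used.

```python
from pathlib import Path


def build_orthosnap_command(fasta, tree, support=80, occupancy=1):
    """Return the argument list for the orthosnap call used in this thread."""
    return [
        "orthosnap",
        "-f", str(fasta),
        "-t", str(tree),
        "-s", str(support),
        "-o", str(occupancy),
        "-rih",
    ]


def commands_for_directory(directory, fasta_suffix=".faa", tree_suffix="_tree.tre"):
    """Pair each FASTA file with its tree (ASSUMED naming rule) and build commands."""
    commands = []
    for fasta in sorted(Path(directory).glob("*" + fasta_suffix)):
        base = fasta.name[: -len(fasta_suffix)]
        tree = fasta.with_name(base + tree_suffix)
        if tree.exists():  # skip FASTA files without a matching tree
            commands.append(build_orthosnap_command(fasta, tree))
    return commands


# Each command could then be executed with, for example:
#   subprocess.run(command, check=True)
print(build_orthosnap_command(
    "fake_orthologous_group_of_genes.faa",
    "fake_orthologous_group_of_genes_tree.tre",
))
```

Building argument lists rather than shell strings avoids quoting problems with sequence file names.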

[image: Screen Shot 2023-12-19 at 10.33.39 AM.png]

In addition, I noticed that orthoSNAP uses monophyletic pruning only. Are you planning on introducing a paraphyletic pruning option in the future (similar to the Yang & Smith pipeline)? I think this would be particularly useful for researchers working on transcriptome assemblies, which will inherently be more 'messy'. Paraphyletic pruning might prevent over-splitting of orthologs in such cases.

Thanks again - I love your software. Especially ClipKIT has already been very useful to me.

Best wishes, Eveline


JLSteenwyk commented 7 months ago

Hi Eveline,

Sorry about that.

Would you be willing to share your test files and expected output?

best,

Jacob

evelinepinseel commented 7 months ago

Hi Jacob,

See appendix - I used the test files of OrthoSNAP, and the expected output would follow the orthologs in Fig. 2b of the OrthoSNAP paper. Because I used an occupancy threshold of 1, I expect OrthoSNAP to detect the 'species0|gene0-copy_1' cluster as a separate ortholog. In this case, it identifies this cluster as a separate ortholog, but it assigns the removed duplicates to another ortholog (see the screenshot I sent in the previous message).

Thanks again! Eveline


JLSteenwyk commented 7 months ago

Hi Eveline,

Sorry about that error. It should be fixed as of 1.3.0. Regarding monophyletic pruning versus paraphyletic pruning, I don't think OrthoSNAP should support paraphyletic pruning. It would be too difficult to determine which paralog to keep.

One thing I was pleasantly surprised to see when developing OrthoSNAP is that SC-OGs and SNAP-OGs have the same phylogenetic information content (see Fig. 3 from the manuscript). I think this, in part, stems from our strict definition of inparalog pruning. More importantly, this observation suggests SNAP-OGs are useful for downstream phylogenomic analyses, such as genome-wide scans of selection, species tree inference, and calculations of gene-gene coevolution.

Thank you again for choosing OrthoSNAP! Please let me know if there are any other features you may want, including for other software like ClipKIT and PhyKIT!

best,

Jacob

evelinepinseel commented 7 months ago

Hi Jacob,

Brilliant, it works! Thanks a lot for adding in this feature.

Best wishes, Eveline


JLSteenwyk commented 7 months ago

Yay - so glad I could help your research program!

I look forward to seeing what you do with OrthoSNAP!

Wishing you happy holidays and a strong start to 2024!

best,

Jacob