PoonLab / Kaphi

Kernel-embedded ABC-SMC for phylodynamic inference
GNU Affero General Public License v3.0

Reproducing Figure 7 from the Avino et al. paper and grouping the cophylogeny analysis as high or low #146

Open Jigyasa3 opened 3 years ago

Jigyasa3 commented 3 years ago

Dear @ArtPoon lab

Thank you so much for such an amazing R package and a great paper (https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.5185). I have a few questions regarding the paper:

a) I am trying to reproduce Figure 7 in the paper, but I cannot get the "groups" argument of the ggbiplot function to work:

```r
g <- ggbiplot(p, groups=temp$Group, labels=rownames(temp), labels.size=3, var.col=rgb(0,0,0,0.4))
g <- g + scale_color_manual(name="Group", values=c('firebrick', 'cadetblue'))
g <- g + theme(legend.position='none')
print(g)  ## where is the "Group" column?
```

b) I am interested in performing a similar analysis on my dataset and was wondering why the following normalize function was used to normalize the different distance measures:

```r
normalize <- function(x) { (x-min(x)) / (max(x)-min(x)) }
```

c) Are the data used in Figure 7 ("https://github.com/PoonLab/cophylo/edit/master/data/TotalandKernelS1.csv") raw data? (i.e., generated by running each pair of host-symbiont trees in Kaphi and then normalized with the function above)

d) How were "high" and "low" cophylogeny determined for each dataset in the paper? Is there a specific cut-off, or a relative value after normalization?

Looking forward to your reply!

ArtPoon commented 3 years ago

Hi @Jigyasa3, sorry for the delayed response; it's a busy term. I am going to ping the lead author @mavino, who should be able to help you with the R code and data files.

mavino commented 3 years ago

Thank you @ArtPoon. Hi @Jigyasa3, and thank you so much for your interest. Give me some time to answer your questions point by point, since I have not worked on this project for a long time. I will start replying soon. Thank you very much again...

mavino commented 3 years ago

Regarding your point a), there was a problem in the code, which I just fixed in mavino/cophylo@3409bab6. "Group" refers to the last column of the file "TotalandKernelfig7.csv", which tells you whether that host-parasite pair has a high or low degree of cophylogeny. This is what gets coloured in the resulting biplot.
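A minimal sketch of how that last column can serve as the grouping variable, assuming the CSV has one row per host-parasite pair, numeric distance columns, and a final `Group` column with labels such as "high"/"low". The toy data frame stands in for `TotalandKernelfig7.csv`; its column names and values are made up for illustration, and the fixed script in mavino/cophylo may be organized differently.

```r
# Toy stand-in for TotalandKernelfig7.csv (hypothetical columns and values).
# With the real file, something like read.csv("TotalandKernelfig7.csv") would
# replace this; its exact layout is not assumed here.
temp <- data.frame(
  kLn   = c(0.91, 0.85, 0.40, 0.35, 0.88, 0.30),
  Align = c(0.80, 0.75, 0.20, 0.25, 0.70, 0.15),
  MAST  = c(0.60, 0.65, 0.30, 0.20, 0.55, 0.25),
  Group = c("high", "high", "low", "low", "high", "low"),
  row.names = paste0("pair", 1:6)
)

# The last column holds the high/low label; everything before it is numeric
group <- factor(temp[[ncol(temp)]])
dists <- temp[, -ncol(temp)]

# PCA on the distance columns only; this object plays the role of `p`
p <- prcomp(dists, center = TRUE, scale. = TRUE)

# ggbiplot(p, groups = group, labels = rownames(temp)) would then colour
# each pair by its high/low label
summary(p)
```

The key point is that the label column must be removed before the PCA (prcomp() cannot handle a character column) and passed separately through `groups`.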

mavino commented 3 years ago

Regarding your point b), we needed to make the different distances comparable because they have different scales: some distances take values between 0 and 1, others from 0 to plus infinity. Thus we performed a min-max normalization to put them on the same scale. I am not sure why we did not use a Z-score normalization; maybe we noticed that we did not have many outliers.
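To make the trade-off concrete, here is a small sketch comparing the min-max function quoted in the question with a Z-score alternative on made-up values for a single measure that is unbounded above. It shows why outliers matter for min-max: one large value squeezes all the others toward 0.

```r
# Min-max normalization as quoted in the question: rescales x onto [0, 1]
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Z-score alternative: mean 0 and standard deviation 1, but not bounded
zscore <- function(x) {
  (x - mean(x)) / sd(x)
}

# Made-up raw values for one distance measure, with one large outlier
d <- c(2, 4, 6, 8, 100)

round(normalize(d), 3)  # the first four values end up crowded near 0
round(zscore(d), 3)

# Caveat: min-max is undefined when all values are equal (0 / 0 gives NaN)
normalize(c(3, 3, 3))
```

Min-max has the advantage that every measure ends up on exactly the same [0, 1] range, which suits a biplot. The price is sensitivity to the extreme values in each column.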

mavino commented 3 years ago

Regarding your point c), yes, they are raw data, which were then normalized with the min-max function.
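If you are applying the same approach to your own raw table, each distance measure is normalized separately across the host-symbiont pairs. A minimal sketch, assuming a data frame with one row per pair, one numeric column per measure, and a non-numeric label column (all names and values below are hypothetical):

```r
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Hypothetical raw scores: one row per host-symbiont pair, one column per measure
raw <- data.frame(
  kLn   = c(0.12, 0.47, 0.33, 0.90),
  MAST  = c(5, 12, 8, 20),
  Group = c("low", "high", "low", "high"),
  stringsAsFactors = FALSE
)

# Normalize each numeric column on its own; leave the label column untouched
num_cols <- vapply(raw, is.numeric, logical(1))
scaled <- raw
scaled[num_cols] <- lapply(raw[num_cols], normalize)
scaled
```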

mavino commented 3 years ago

Regarding your point d), as specified in the paper, the high and low degree of cophylogeny was based only on the authors' assessment in the specific paper each dataset comes from. We did not specify any further cut-off.

Jigyasa3 commented 3 years ago

Hey @mavino

Thank you so much for replying and answering all my questions. I have two follow-up questions. I do not come from a statistical background, so it's possible that my understanding of the kernel method discussed in the paper is wrong; please correct me if so (sorry!).

Question 1: The paper mentions that the kernel method accounts for differences in the branch lengths of the host and symbiont (or parasite) trees, and in the number of nodes in each tree. Does that mean I can compare host-symbiont tree pairs of different sizes (number of nodes) and rates of evolution with each other (as in point (c) above) and draw conclusions about how much coevolution is taking place (with the kLn method)?
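For context on how a kernel score can be made comparable across trees of different sizes: a standard technique for kernel methods in general is cosine normalization, k_n(T1, T2) = k(T1, T2) / sqrt(k(T1, T1) * k(T2, T2)). For a positive semi-definite kernel this bounds the score to [-1, 1] (here [0, 1], since the entries are non-negative) and gives exactly 1 for a tree compared with itself. This is a generic sketch on a made-up Gram matrix. That kLn uses exactly this normalization is an assumption not confirmed in this thread, so check the paper and the package source.

```r
# Made-up raw kernel scores among three trees (symmetric, positive definite)
K <- matrix(c(50, 12, 30,
              12, 10,  6,
              30,  6, 40), nrow = 3, byrow = TRUE)

# Cosine normalization: divide k(i, j) by sqrt(k(i, i) * k(j, j)),
# so large trees with large self-similarity do not dominate the scores
normalize_kernel <- function(K) {
  d <- sqrt(diag(K))
  K / outer(d, d)
}

Kn <- normalize_kernel(K)
round(Kn, 3)  # diagonal is 1; off-diagonal entries lie in [0, 1]
```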

Question 2: To examine how much coevolution is taking place, I am interested in comparing different microbial groups against the same host tree. Instead of grouping them as parasitic, symbiotic or mutualistic (as done in the papers referred to in your study), I want to examine whether one microbe-host tree pair is more coevolving than another. So, if I use the normalized values of the different distance measures, can I say that higher values of kLn and Align mean that the microbe1-host tree pair is more coevolving than the microbe2-host tree pair?

The host tree is the same for all the microbes, but the number of nodes will vary depending on how each microbe interacts with the host. Please let me know if my questions make sense; I can explain them in more detail.
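One caveat when comparing microbes this way: min-max normalization is computed over whichever set of host-microbe pairs is normalized together. A normalized score therefore says where a pair sits relative to the others in that set, not an absolute degree of coevolution. The sketch below uses made-up raw scores for hypothetical microbes. Adding one more pair changes every other pair's normalized value, while their ranking stays the same.

```r
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Made-up raw scores of one measure for three microbe-host pairs
raw <- c(microbe1 = 0.60, microbe2 = 0.40, microbe3 = 0.20)
round(normalize(raw), 3)

# Adding a fourth pair shifts the normalized values of the original three...
raw4 <- c(raw, microbe4 = 1.00)
round(normalize(raw4), 3)

# ...but the ordering among the original three pairs is unchanged
identical(order(normalize(raw)), order(normalize(raw4)[names(raw)]))
```

So rankings within one analysis are meaningful, but normalized values from separate analyses (different sets of pairs) should not be compared directly.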

Thanks again for all the help!

mavino commented 3 years ago

No worries, it is actually our pleasure to help! I would say yes to both questions. Your rationale is correct, and that is the way I would use the distances. Be careful about how the kLn software matches the labels of hosts and parasites.
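Mismatched tip labels are an easy source of silent errors, so a sanity check before computing kLn can help. The thread does not say how the kLn software matches labels. This sketch therefore assumes only that you have each tree's tip labels as a character vector (for an ape `phylo` object, `tree$tip.label`) and a two-column table of host-parasite links. All labels and the helper `check_labels()` are hypothetical.

```r
# Hypothetical tip labels for the host and parasite trees
host_tips <- c("H1", "H2", "H3")
para_tips <- c("P1", "P2", "P3", "P4")

# Hypothetical association table: one row per host-parasite link
assoc <- data.frame(host     = c("H1", "H2", "H3", "H3"),
                    parasite = c("P1", "P2", "P3", "P5"),
                    stringsAsFactors = FALSE)

# Report labels in the table missing from a tree, and tree tips never linked
check_labels <- function(host_tips, para_tips, assoc) {
  list(
    hosts_not_in_tree      = setdiff(assoc$host, host_tips),
    parasites_not_in_tree  = setdiff(assoc$parasite, para_tips),
    host_tips_unlinked     = setdiff(host_tips, assoc$host),
    parasite_tips_unlinked = setdiff(para_tips, assoc$parasite)
  )
}

res <- check_labels(host_tips, para_tips, assoc)
res  # flags "P5" (no such tip) and "P4" (tip with no host link)
```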

Jigyasa3 commented 3 years ago

Thank you so much!

Jigyasa3 commented 3 years ago

One last query: I understand that the different distance methods have different requirements for the tree type. The Align() documentation says a phylo object is required, while the TripL() documentation requires both trees to be rooted. Just to confirm: Align(), MAST(), and Sim() do not require the symbiont tree to be rooted, right?