daijiang / phyr

Functions for phylogenetic analyses
https://daijiang.github.io/phyr/
GNU General Public License v3.0

Predicted values for Poisson distributed data #72

Open ammoncorl opened 2 years ago

ammoncorl commented 2 years ago

I was hoping that you could help me understand how best to generate predictions from data modeled by phyr. I am trying to take the coefficients from a Poisson distributed model and apply them to generate predictions for new data. In case it helps, my model is of the form: pglmm_compare(y ~ x1 + x2, family = "Poisson", phy = tree, data = datafile), and I have been looking at pglmm_predicted_values. I have read the following about the predicted values from phyr: "re.form: (formula, NULL, or NA) specify which random effects to condition on when predicting. If NULL, include all random effects (i.e., Xb + Zu); if NA or ~0, include no random effects (i.e., Xb)" from https://daijiang.github.io/phyr/reference/pglmm-predicted-values.html

When I set re.form = NA, then it seems like the predictions are just based on the coefficients of the model. However, the predictions with the random effects included are a much better fit for the modeled data. Thus, I am wondering how to obtain the estimate of the random effects from the phyr output. I am also wondering if the random effects in question are related to the phylogeny. If that is the case, perhaps it does not make sense to include them for generating predictions on new data where the phylogenetic placement is not known? Any additional information about the Zu term that you referred to would be greatly appreciated!
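For the Poisson model above, the two re.form settings differ only in whether the estimated random effects are added to the linear predictor before the inverse link (exp) is applied. A minimal base-R sketch, assuming a log link; the coefficients, covariates, and random-effect values are made up for illustration, do not come from phyr, and this is not phyr's internal code:

```r
# Minimal sketch: linear predictor with and without random effects for a
# Poisson model with a log link. All numbers are hypothetical, not phyr output.

# Hypothetical fixed-effect estimates for y ~ x1 + x2: intercept, x1, x2
b <- c(0.5, 0.8, -0.3)

# Hypothetical covariate values for three species
newdata <- data.frame(x1 = c(0.0, 1.0, 2.0), x2 = c(1.0, 0.0, 0.5))
X <- model.matrix(~ x1 + x2, data = newdata)

# Hypothetical phylogenetic random effects, one per species (Z = identity)
u <- c(0.2, -0.1, 0.4)

eta_fixed <- drop(X %*% b)  # Xb, the analogue of re.form = NA
eta_full  <- eta_fixed + u  # Xb + Zu, the analogue of re.form = NULL

# Invert the log link to get expected counts on the data scale
mu_fixed <- exp(eta_fixed)
mu_full  <- exp(eta_full)

print(round(cbind(mu_fixed, mu_full), 3))
```

The predictions with Zu fit the modeled species better simply because u absorbs each fitted species' phylogenetic deviation; for a species outside the fit, u has to come from somewhere else.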

Thank you very much for your work on the phyr program. It has been very helpful to me!

arives commented 2 years ago

ammoncorl,

This is a complicated topic. For community data and pglmm(), the random effects involve repeated observations from the same species and sites, but for species data and pglmm_compare(), the random effects depend only on the phylogeny. If you want some of the gory detail, see https://academic.oup.com/sysbio/article/68/2/234/5098616?login=true. With pglmm_compare(), for making predictions from new data when you don’t know the phylogenetic location of the new species, the random effects (based only on the phylogenetic information) don’t help, as you note in your second paragraph.

So, I think the simple answer is the one that you gave, that you can’t really use the random effects for prediction.
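For pglmm_compare(), the random effects can be pictured as one Gaussian effect per species whose covariance is set by the tree. A minimal sketch, assuming a Brownian-motion covariance scaled by a variance s2 (the tree, s2, and the number of draws are all hypothetical), showing that the correlation between species' effects comes entirely from shared branch lengths:

```r
# Minimal sketch: phylogenetic random effects drawn from a Brownian-motion
# covariance. Tree, variance, and sample size are hypothetical.

# Shared branch lengths (root to most recent common ancestor) for the
# hypothetical ultrametric tree ((A:1,B:1):1,(C:0.5,D:0.5):1.5);
# ape::vcv() should give this matrix for that tree.
C <- matrix(c(2.0, 1.0, 0.0, 0.0,
              1.0, 2.0, 0.0, 0.0,
              0.0, 0.0, 2.0, 1.5,
              0.0, 0.0, 1.5, 2.0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D")))
s2 <- 0.5  # hypothetical phylogenetic variance

# Draw u ~ N(0, s2 * C) many times via the Cholesky factor of C
set.seed(1)
n_draws <- 1e5
L <- t(chol(C))                        # lower triangular, L %*% t(L) equals C
Z <- matrix(rnorm(4 * n_draws), nrow = 4)
U <- sqrt(s2) * (L %*% Z)              # each column is one draw of u

# Empirical covariance of the draws (the mean is 0 by construction)
emp_cov <- tcrossprod(U) / n_draws
print(round(emp_cov, 2))
```

Species in the same clade (A and B, or C and D) have correlated effects, and species in different clades have none. A new species whose position in the tree is unknown has no row or column in C, so nothing links its effect to the fitted ones; the best available value is the mean, 0, which leaves only Xb.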

Please, let me know if this answer makes sense.

Cheers, Tony

From: ammoncorl Date: Wednesday, June 15, 2022 at 4:09 AM To: daijiang/phyr Subject: [daijiang/phyr] Predicted values for Poisson distributed data (Issue #72)


ammoncorl commented 2 years ago

Dear Tony,

Thank you very much for your quick reply. I will be sure to read the article that you referenced to learn more. I had one quick follow-up question. Is it possible to make predictions for new data that do have associated phylogenetic information? If so, can the predict function within the phyr package take a phylogeny and a set of trait data to generate predictions for species outside of the ones in the original model? This is ideally what I would like to do for the subset of species in my dataset that have phylogenetic information, because including the phylogenetic random effects significantly improves the fit of the predictions for the modeled data. I have read the Garland and Ives 2000 American Naturalist paper that discussed predicting new data points while accounting for phylogenetic relationships. However, I am not sure whether such a thing has been or can be implemented within the phyr modeling framework.

Thank you again for your help with this!

Best wishes,

Ammon Corl


arives commented 2 years ago

Ammon,

You can certainly do this, but I would start with the fitted phylogenetic covariance matrix. You could then use the GLS approach as in Garland and Ives for predictions:

$$\hat{Y}_i = b_0 + b_1 X_i + V_{i,-i} V_{-i,-i}^{-1} \left( Y_{-i} - (b_0 + b_1 X_{-i}) \right)$$

where $V_{i,-i}$ is row i of the fitted covariance matrix V with column i removed, $V_{-i,-i}^{-1}$ is the inverse of V after row i and column i are removed, $b_0$ and $b_1$ are the estimated coefficients, and $Y_{-i}$ and $X_{-i}$ denote Y and X with element i removed. The term $Y_{-i} - (b_0 + b_1 X_{-i})$ gives the residual errors for the observations other than i. You'd do this on the covariance matrix in the transformed Gaussian space (i.e., where the errors epsilon are Gaussian), and then use the inverse of the link function (exp() for the Poisson log link) to transform into the data space. This procedure is pretty much independent of phyr or the way in which you fit the phylogenetic model. There aren't any built-in facilities in phyr to do this, although we have thought about adding them if people want. It is always hard in situations like this, because users often need specific things that are hard to anticipate. Still, the method I've outlined above should be easy to implement.
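The GLS prediction formula can be sketched in base R. This is a minimal sketch under several assumptions: the covariance matrix V already includes the new species (so its placement in the tree is known), V is purely phylogenetic, and the residuals Y[-i] - (b0 + b1 X[-i]) are available on the link (log) scale. How to extract those latent-scale quantities from a phyr fit is not shown, every number is hypothetical, and the function name predict_latent_effect is illustrative, not part of phyr.

```r
# Minimal sketch of a Garland and Ives style GLS prediction on the link
# scale for a Poisson model. All inputs are hypothetical.

# Conditional mean of the latent effect for species i given the latent
# residuals of the other species: V[i,-i] %*% solve(V[-i,-i]) %*% r[-i]
predict_latent_effect <- function(V, i, r_other) {
  # solve(A, b) avoids forming the inverse of V[-i, -i] explicitly
  drop(V[i, -i, drop = FALSE] %*% solve(V[-i, -i], r_other))
}

# Hypothetical phylogenetic covariance: s2 times shared branch lengths of
# the tree ((A:1,B:1):1,(C:0.5,D:0.5):1.5); species D (i = 4) is predicted
C <- matrix(c(2.0, 1.0, 0.0, 0.0,
              1.0, 2.0, 0.0, 0.0,
              0.0, 0.0, 2.0, 1.5,
              0.0, 0.0, 1.5, 2.0), nrow = 4, byrow = TRUE)
s2 <- 0.5
V <- s2 * C

# Hypothetical link-scale residuals Y[-i] - (b0 + b1 X[-i]) for A, B, C
r_other <- c(0.3, 0.1, -0.4)

# Hypothetical coefficients (b0, b1) and predictor value for species D
b <- c(0.5, 0.8)
x_D <- 1.0

u_D   <- predict_latent_effect(V, i = 4, r_other = r_other)
eta_D <- b[1] + b[2] * x_D + u_D   # prediction on the link (log) scale
mu_D  <- exp(eta_D)                # inverse log link: expected count
```

In this toy tree D shares history only with C, so only C's residual contributes to D's latent effect. With a purely phylogenetic V = s2 C the factor s2 cancels in V[i,-i] V[-i,-i]^-1, so the correction depends only on the tree and the other species' residuals.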

Cheers, Tony

From: ammoncorl Date: Thursday, June 16, 2022 at 3:18 AM To: daijiang/phyr Cc: Anthony R. Ives Subject: Re: [daijiang/phyr] Predicted values for Poisson distributed data (Issue #72)
