PoonLab / vindels

Developing an empirical model of sequence insertion and deletion in virus genomes

Indel rate analysis using standard R GLM #97

jpalmer37 closed this issue 3 years ago

jpalmer37 commented 4 years ago

Looking to estimate indel rates in the five gp120 variable loops (V1-V5) using a Poisson GLM.

jpalmer37 commented 4 years ago

I reconfigured an old script to accommodate the new data (containing both internal and terminal branches of the tree). It is set to fit a Poisson GLM to each of the 200 trees within each patient as follows:

glm(count ~ 1, offset=log(length), family="poisson")

where count is the number of insertions or deletions on each tree branch and length is the length of that branch in days.
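
To make the model concrete, here is a minimal sketch of the same call on simulated data. The data frame `branches`, the true rate of 0.002 events per day, and the branch-length range are all made up for illustration and are not taken from the vindels data. For an intercept-only Poisson model with a log offset, the fitted rate should match the closed-form estimate (total events divided by total branch length).

```r
set.seed(1)

# Hypothetical toy data: 50 branches with lengths in days
branches <- data.frame(length = runif(50, min = 10, max = 400))

# Simulate indel counts with an assumed rate of 0.002 events per day
branches$count <- rpois(50, lambda = 0.002 * branches$length)

# Intercept-only Poisson GLM; the offset makes the intercept a log rate per day
fit <- glm(count ~ 1, offset = log(length), family = "poisson", data = branches)

# Back-transform the intercept to get the rate in events per day
rate <- unname(exp(coef(fit)[1]))

# Closed-form maximum likelihood estimate for this model
closed_form <- sum(branches$count) / sum(branches$length)

print(c(glm = rate, closed_form = closed_form))
```

The two values should agree up to the GLM's convergence tolerance, which gives a quick sanity check on each tree.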

After a few trial runs, I found invalid results scattered throughout the resulting indel rate estimates. They occur when the input count vector is all zeros: the GLM cannot converge on a meaningful estimate and reports a rate of around 1E-13. I am still looking into this and was wondering if you had any advice on how to approach it. Thanks!
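
One possible way to handle the all-zero case, sketched here as an assumption rather than as the approach the project adopted: with no events, the maximum likelihood rate is exactly zero, so the log-scale intercept has no finite optimum and the fit stops at an arbitrary tiny value. The hypothetical helper `estimate_rate` below returns 0 directly in that case and falls back to the GLM otherwise. The second helper, `zero_count_upper_bound`, is also hypothetical; it gives the one-sided upper bound on the rate implied by observing zero events over the total branch length.

```r
# Hypothetical helper: indel rate per day for one tree
estimate_rate <- function(count, branch_days) {
  stopifnot(length(count) == length(branch_days), all(branch_days > 0))

  # With all-zero counts the MLE is 0 and the log-link intercept diverges,
  # so skip the GLM and return the boundary estimate directly
  if (sum(count) == 0) {
    return(0)
  }

  fit <- glm(count ~ 1, offset = log(branch_days), family = "poisson")
  unname(exp(coef(fit)[1]))
}

# Hypothetical helper: upper bound on the rate when no events were seen,
# from solving exp(-rate * total_days) = alpha for rate
zero_count_upper_bound <- function(branch_days, alpha = 0.05) {
  -log(alpha) / sum(branch_days)
}

# Usage on made-up inputs
estimate_rate(c(0, 0, 0), c(10, 20, 30))
estimate_rate(c(1, 0, 2), c(10, 20, 30))
zero_count_upper_bound(c(10, 20, 30))
```

Returning 0 keeps the all-zero trees in the summary across the 200 trees instead of replacing them with a numerical artifact; whether they should be kept or reported separately is a modelling decision.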