flyaflya / causact

causact: R package to accelerate computational Bayesian inference workflows in R through interactive visualization of models and their output.
http://causact.com
Other
43 stars 12 forks source link

Dev #19

Closed Danidapena closed 5 years ago

Danidapena commented 5 years ago

Logistic Function

flyaflya commented 5 years ago

Sounds and looks interesting, but I am not sure what this pull requests does or why I should merge it with the causact package. How would it be used?

Danidapena commented 5 years ago

`

Add a logistic function that can be used with greta

EXAMPLE

install_github("rmcelreath/rethinking") #Install rethinking packages to obtain the data

library(tidyverse) library(causact) library(greta) library(bayesplot) library(rethinking) library(dplyr) library(ggplot2) library(psych) library(stringr)

data(reedfrogs) d = reedfrogs

d=d%>% mutate(tank=row_number()) #make every observation a tank

graph = dag_create() %>% dag_node("Tadpole Mortality","s", rhs = binomial(n,p), #Give the node a distribution data = d$surv) %>% # Add data to the node dag_node("Number of frogs","n", data = d$density, child = "s")%>% # Indicate the childs of the node dag_node("Probability of survive","p", rhs = Logistic(alpha), child = "s") %>% dag_node("logaritmic odds","alpha", child = "p", rhs = normal(a,b))%>% dag_node("Average","a", rhs = normal(0,1), child = ("alpha"))%>% dag_node("Deviation","b", rhs = cauchy(0,1,truncation = c(0,Inf)), child = ("alpha")) %>% dag_plate("tank", "i", data = d$tank, nodeLabels = c("alpha")) #Indicate the number of tanks

graph %>% dag_render(shortLabel = FALSE) # Plot the model graph %>% dag_greta(mcmc=TRUE) # Run greta to get the the posteroirs

numextract <- function(string){ str_extract(string, "\-\d+\.\d*") } # Function to extract the number from a string

Post=tidyDrawsDF%>% filter(key !="b" , key!="a")%>% # Get just the posterior values for the alpha tank group_by(key)%>% summarise(alpha_tank = median(value))%>% # Get the median of the posterior for each tank mutate(Posterior =Logistic(alpha_tank))%>% # Get the probabilities of every tank mutate(tank = numextract(key))

average= tidyDrawsDF%>% filter(key !="b" , key!="a")%>% mutate(prob=Logistic(value))%>% # Probability of every tank summarise(prob = median(prob)) # The median value of all probabilities

data=merge(d,Post,by="tank") # Merge probabilities of posterior with the proportion of survival in the original data

data=data%>% mutate(Data=propsurv)%>% gather(Method,Probability,Data,Posterior) # Get Tidy data

Make a nice plot with the posterior probabilities and the original proportions divide as small tanks, median tanks and large tanks

data%>%ggplot(aes(x=tank,y=Probability, colour="Method"))+ geom_point(aes(color=Method))+ xlab("Tank") + ylab("Probability of survive") + xlim(c(0,48))+ ylim(c(0,1))+ geom_vline(xintercept = 16.5)+ geom_vline(xintercept = 32.5)+ geom_hline(yintercept = average$prob,linetype="dashed")+ annotate(geom="text", x=7, y=0,label="Small Tanks")+ annotate(geom="text", x=16+8, y=0,label="medium tanks")+ annotate(geom="text", x=32+8, y=0,label="large tanks")+ ggtitle( "Probability of survival in each tank")

plot(NULL, xlim=c(-6,6), ylim=c(0,0.32), xlab="log-odds of survive", yla="Density",main = ("Infered populations of survival across tanks")) # Set the enviroment for the plot for (i in 1:100) curve(dnorm(x,drawsDF$a[i] , drawsDF$b[i]), add = TRUE,col = col.alpha("black",0.2)) # Plot the first 100 Gaussians for different mean and sigma values

sim_tanks = rnorm(8000,drawsDF$a,drawsDF$b) # Generate 8000 simulated tank p = Logistic(sim_tanks)

Plot the probability of survials based in the density

tibble(x = p) %>% ggplot(aes(x = x)) + geom_density(fill = "cadetblue1",color ="cadetblue1") + xlab("probability of survival ")+ ggtitle( "Probability of survival for 8000 new simulated")

`

flyaflya commented 5 years ago

Please package as a vignette. See http://r-pkgs.had.co.nz/vignettes.html.

Thanks!