turing_model part 1: fixed-effects

TuringLang / TuringGLM.jl

Bayesian Generalized Linear models using `@formula` syntax.

https://turinglang.org/TuringGLM.jl/dev

MIT License


turing_model part 1: fixed-effects #17

Closed: storopoli closed this 2 years ago

storopoli commented 2 years ago

Ok, this is a big PR. 50% of the package was done in this PR.

Mainly, it implements everything needed to specify and sample models with the `@formula` macro that do not have random effects, i.e. models without `(1 | group)`, `(x1 | group)` or `(1 + x1 | group)` terms inside the `@formula`.
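To make the distinction concrete: a random-effects term is any `lhs | group` expression inside the formula. The sketch below is not how TuringGLM.jl parses formulas. It only uses Julia's built-in expression quoting, and `has_random_effects` is a hypothetical helper introduced for illustration.

```julia
# Hypothetical helper (not part of TuringGLM.jl): walk a quoted formula and
# report whether it contains a `lhs | rhs` grouping term.
has_random_effects(x) = false  # symbols and literals carry no grouping term

function has_random_effects(ex::Expr)
    # `1 | group` is parsed as a call to the `|` operator
    if ex.head == :call && ex.args[1] == :|
        return true
    end
    # Otherwise recurse into every sub-expression
    return any(has_random_effects, ex.args)
end

fixed_only = :(y ~ x1 + x2)                 # covered by this PR
with_group = :(y ~ x1 + (1 + x1 | group))   # random effects, out of scope here

println(has_random_effects(fixed_only))
println(has_random_effects(with_group))
```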

Implemented likelihoods:

normal
student-t
bernoulli
poisson
negative binomial

Datasets in `data/`

I am also adding 3 datasets from stan-dev/rstanarm:

kidiq
wells
roaches

Their license is GPL-3. They are used extensively in tutorials (see storopoli/Bayesian-Julia and also Gelman & Hill (2007), Gelman et al. (2013) (BDA), and Gelman et al. (2020) (RoS)).

I know that @rikhuijzer hates data being hard-coded into a package, but these datasets are very small and are used for tests and tutorials (not implemented yet, but on the roadmap).

The turing_model docstring deserves extra attention.

Relates to #2.

@yebai feel free to review or ask for others to review.

rikhuijzer commented 2 years ago

> I know that @rikhuijzer hates data being hard-coded into a package, but these datasets are very small and are used for tests and tutorials (not implemented yet, but on the roadmap).

But the drawbacks from Artifacts are negligible. It could be something like

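using Artifacts  # stdlib that provides the artifact"..." string macro

function dataset(name::AbstractString)
    path = ""
    if name == "roaches"
        # artifact"roaches" resolves to the artifact's installation directory
        path = joinpath(artifact"roaches", "roaches.csv")
    elseif ...
    end
    # read_data is a placeholder for whatever reader the package uses
    return read_data(path)
end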

It's easier than it looks: https://pkgdocs.julialang.org/v1/artifacts/. It clutters the diff less and makes switching to a newer version of the dataset easier. You even have to handle paths slightly less yourself, because you can just say joinpath(artifact"roaches", "roaches.csv"). It also makes it possible to add larger datasets in the same way later.

storopoli commented 2 years ago

Things to do:

[x] Break up the huge turing_model into smaller functions. Maybe make family a struct instead of a string? Yes: Normal() is the default, and it is a Distribution type.
[ ] Have turing_model call turing_code and eval it. turing_code should be exposed also.
[x] Testing with priors instead of my_prior
[x] Testing with f = @formula(...) instead of inside the turing_model
[x] Testing Chains stuff with only one chn variable
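The open item about having turing_model call turing_code and eval it describes a generate-then-evaluate pattern. Here is a minimal sketch of that pattern in plain Julia, assuming nothing about the real implementation: `toy_code` and `linear_predictor` are hypothetical names, and the generated function is only a linear predictor, not a Turing model.

```julia
# Hypothetical stand-in for `turing_code`: return the generated code as an `Expr`
# so it can be both displayed to the user and evaluated.
function toy_code(predictors::Vector{Symbol})
    # Build the terms β[1] * x1, β[2] * x2, ...
    terms = [:(β[$i] * $p) for (i, p) in enumerate(predictors)]
    # Fold them into a single expression: α + β[1] * x1 + β[2] * x2 + ...
    μ = foldl((acc, t) -> :($acc + $t), terms; init=:α)
    return :(function linear_predictor(α, β, $(predictors...))
        return $μ
    end)
end

code = toy_code([:x1, :x2])
println(code)    # the generated code can be inspected before it runs
f = eval(code)   # evaluating the definition returns the new function

println(f(1.0, [2.0, 3.0], 1.0, 1.0))
```

Keeping the code generator separate from the evaluation step is what allows the same expression to be exposed to users and reused internally.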

storopoli commented 2 years ago

Ready for review again. I could not easily implement the turing_code function because I still need to figure out how to parse the Prior structs (DefaultPrior, CustomPrior) so that they can be displayed by turing_code. I think we should leave this for the 0.2.0 release or a later one.

The whole turing_model API for non-hierarchical models is done. I've created a custom type for the likelihood called Model. I had to be creative with the naming to avoid conflicts with the Distributions.jl types because we need them in the namespace for users to specify custom priors.
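As a rough illustration of why a type works better than a string for selecting the likelihood, here is a sketch using multiple dispatch. All names (`ToyLikelihood`, `ToyGaussian`, `ToyBernoulli`, `ToyPoisson`, `inverse_link`) are hypothetical and chosen to avoid clashing with Distributions.jl. They are not the types defined in this PR, and the inverse links shown are just the conventional ones for each likelihood.

```julia
# Hypothetical likelihood types; the names are illustrative, not TuringGLM's.
abstract type ToyLikelihood end
struct ToyGaussian <: ToyLikelihood end
struct ToyBernoulli <: ToyLikelihood end
struct ToyPoisson <: ToyLikelihood end

# Each likelihood selects its inverse link through dispatch instead of
# an if/elseif chain over strings.
inverse_link(::ToyGaussian, η) = η                    # identity
inverse_link(::ToyBernoulli, η) = 1 / (1 + exp(-η))   # logistic
inverse_link(::ToyPoisson, η) = exp(η)                # exponential

for lik in (ToyGaussian(), ToyBernoulli(), ToyPoisson())
    println(typeof(lik), " => ", inverse_link(lik, 0.0))
end
```

With types, a misspelled likelihood fails immediately with an UndefVarError or MethodError rather than falling through a string comparison.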
