TuringLang / TuringGLM.jl

Bayesian Generalized Linear models using `@formula` syntax.
https://turinglang.org/TuringGLM.jl/dev
MIT License

turing_model part 1: fixed-effects #17

Closed storopoli closed 2 years ago

storopoli commented 2 years ago

Ok, this is a big PR. 50% of the package was done in this PR.

Mainly, it implements everything needed to specify and sample models with the `@formula` macro that do not have random effects, i.e. models without `(1 | group)`, `(x1 | group)` or `(1 + x1 | group)` terms inside the `@formula`.
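To make the fixed-effects vs. random-effects distinction concrete, here is a minimal sketch in plain Julia that checks whether a quoted formula expression contains a `|` term. It is not how this PR (or StatsModels.jl) handles formulas; `has_random_effects` and the column names `y`, `x1`, `x2`, `group` are hypothetical and used only for illustration.

```julia
# Hypothetical helper (not part of TuringGLM): walk a quoted formula
# expression and report whether it contains a call to `|`, which is how a
# random-effects term such as (1 | group) appears in the parsed expression.
has_random_effects(ex) = false  # leaves (symbols, numbers) contain no `|`

function has_random_effects(ex::Expr)
    if ex.head == :call && ex.args[1] == :(|)
        return true
    end
    # Recurse into the sub-expressions
    return any(has_random_effects, ex.args)
end

# Quoted expressions stand in for @formula(...) so that no package is needed
fixed_only = :(y ~ x1 + x2)                  # in scope for this PR
with_group = :(y ~ 1 + x1 + (1 + x1 | group)) # out of scope for this PR

println(has_random_effects(fixed_only))
println(has_random_effects(with_group))
```

Only formulas of the first kind are covered here; anything with a `|` term is left for the random-effects part.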

Implemented likelihoods:

  1. normal
  2. student-t
  3. bernoulli
  4. poisson
  5. negative binomial
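As a rough guide to how these likelihoods differ, the sketch below maps a linear predictor `η` to the mean of the response under each family. The inverse link functions are the conventional choices (identity, logistic, exp) and are an assumption here; the PR description does not state which links are used. The `inverse_link` dictionary is a hypothetical name, not part of the package.

```julia
# Assumed, conventional inverse link functions for each likelihood.
# These are illustrative and not taken from the PR itself.
inverse_link = Dict(
    "normal"            => identity,
    "student-t"         => identity,
    "bernoulli"         => η -> 1 / (1 + exp(-η)),  # logistic
    "poisson"           => exp,
    "negative binomial" => exp,
)

# Linear predictor for one observation: intercept + coefficient * covariate
α, β, x = 0.5, -0.25, 2.0
η = α + β * x   # 0.5 - 0.5 = 0.0

# Expected response under each likelihood for the same linear predictor
means = Dict(name => f(η) for (name, f) in inverse_link)

for name in sort(collect(keys(means)))
    println(rpad(name, 18), means[name])
end
```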

Datasets in data/

I am also adding 3 datasets from stan-dev/rstanarm:

  1. kidiq
  2. wells
  3. roaches

Their license is GPL-3. They are used extensively in tutorials (see storopoli/Bayesian-Julia and also Gelman & Hill (2007), Gelman et al. (2013) (BDA) and Gelman et al. (2020) (RoS)).

I know that @rikhuijzer hates data being hard-coded into a package, but these datasets are very small and are used for tests and tutorials (not implemented yet, but on the roadmap).

The turing_model docstring deserves extra attention.

Relates to #2.

@yebai feel free to review or ask for others to review.

rikhuijzer commented 2 years ago

> I know that @rikhuijzer hates data being hard-coded into a package but they are very small and are used for tests and tutorials (not implemented yet, but on the roadmap)

But the drawbacks from Artifacts are negligible. It could be something like

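using Pkg.Artifacts  # provides the artifact"..." string macro

function dataset(name::AbstractString)
    path = ""
    if name == "roaches"
        # artifact"roaches" resolves to the artifact's directory on disk
        path = joinpath(artifact"roaches", "roaches.csv")
    elseif ...
    end
    return read_data(path)
end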

It's easier than it looks: https://pkgdocs.julialang.org/v1/artifacts/. It clutters the diff less and makes switching to a newer version of the dataset easier. You even have to handle paths slightly less yourself, because you can just say `joinpath(artifact"roaches", "roaches.csv")`. It also makes it possible to add larger datasets in the same way later on.

storopoli commented 2 years ago

Things to do:

storopoli commented 2 years ago

Ready to review again. I could not easily implement the turing_code function, because I still need to figure out how to parse the Prior structs (DefaultPrior and CustomPrior) so they can be displayed by turing_code. I think we should leave this for the 0.2.0 release or a later one.

The whole turing_model API for non-hierarchical models is done. I've created a custom type for the likelihood called Model. I had to be creative with the naming to avoid conflicts with the Distributions.jl types because we need them in the namespace for users to specify custom priors.