Feature Request [enhancement]: Support for Multiple Response Variables using Syntax in TuringGLM.jl

kiante-fernandez commented 1 year ago

I would like to inquire if any ongoing work exists for a feature in TuringGLM.jl that supports a syntax for specifying multiple response variables within a single model. Based on my understanding, the current formula syntax in TuringGLM.jl only allows modeling of a single response variable. However, having the capability to model multiple response variables would significantly enhance the usability and convenience of TuringGLM.jl.

Requested Feature

Compact Syntax for Multiple Response Variables: Implement a compact syntax in TuringGLM.jl that enables users to specify multiple response variables within a single model specification. This would allow users to model the relationships between multiple dependent variables and independent variables.

Proposed Formula Syntax

A compact syntax for multiple response variables could look similar to that of brms:

formula = @formula(y ~ condition + (condition | id),
                   w ~ condition + (condition | id),
                   v ~ condition + (condition | id))

In this proposed syntax, each response variable (y, w, v) is specified on a separate line, followed by the fixed effects (condition) and the random effects ((condition | id)).

If someone could provide guidance on where to start with these modifications, I would be happy to contribute to the implementation.

storopoli commented 1 year ago

No ongoing work, but PRs are welcome.

DominiqueMakowski commented 3 weeks ago

Related to this, we (also tagging @itsdfish) would indeed like to see if it's possible to provide a TuringGLM interface for the reaction time models implemented in SequentialSamplingModels.jl.

If I understand correctly, most of the heavy lifting is done in turing_model(), which defines the model, priors, etc., so my guess is that we should implement a _model() function for the distributions we are interested in.

Could you perhaps guide us (or add a section to the documentation) on which methods one needs to implement in order to add new model families, such as _model(), _prior(), etc.?

Perhaps the new package extension system would be useful?

From there, we could see how to extend the formula macro to work with multi-parameter formulas.

Thanks a lot!

storopoli commented 3 weeks ago

You would need to extract the multiple responses from the @formula macro from StatsModels.jl: https://github.com/TuringLang/TuringGLM.jl/blob/864bfe23cb533d164bc7fcf548aa034e9d50ef80/src/TuringGLM.jl#L29

Then, indeed, create _model() and _prior() functions.

Of course, you would also need to add docs and tests.
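
As a rough starting point for the extraction step, here is a minimal sketch of where the response sits in a single-response formula built with StatsModels.jl. The variable names y and condition are placeholders, and how a multi-response formula would be represented is left open here; nothing below is existing TuringGLM.jl code.

```julia
using StatsModels

# A single-response formula, the only kind discussed as supported in this thread.
f = @formula(y ~ 1 + condition)

# The formula is a FormulaTerm with a left-hand side and a right-hand side.
response = f.lhs        # the response term
predictors = f.rhs      # the predictor terms

# The response name as a Symbol.
response_name = response.sym
```

A multi-response version would presumably need to collect one such left-hand side per response before handing them to the model-building functions.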

storopoli commented 3 weeks ago

> Perhaps the new package extension system would be useful?

I don't mind reviewing PRs. If you want to implement inside TuringGLM.jl let me know.

itsdfish commented 3 weeks ago

Last year we had a discussion and tentatively converged on the following syntax:

@formula((c,rt) ~ LBA,
         drift ~ 1 + Condition,
         threshold ~ 1 + Condition,
         ndt ~ 1 + Condition)

where drift is an unbounded vector, and threshold and ndt are non-negative scalars. Before I dig into the package more, I was hoping to get an idea about the feasibility of implementing the macro for these types of models. If you don't mind, can you please tell me whether the following are possible?

1. Can we support vector parameters? For example, can we broadcast ~ normal(0, 1) over the vector, and can we assign specific priors to each element, e.g., drift[1] ~ normal(0, 1), drift[2] ~ normal(1, 2)?
2. Is there a way to enforce bounds on parameters, e.g., ndt ~ beta0 + x1 * beta1 ... + ... xn * betan >= 0?

Edit

Maybe the solution to item 2 is as simple as using a truncated normal?

storopoli commented 2 weeks ago

> Is there a way to enforce bounds on parameters, e.g., ndt ~ beta0 + x1 * beta1 ... + ... xn * betan >= 0?

Yes, with truncated(d; lower, upper) from Distributions.jl.
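
For illustration, a minimal sketch of a lower-bounded prior using truncated. The choice of Normal(0, 1) and the name ndt_prior are assumptions made for the example, not something TuringGLM.jl prescribes.

```julia
using Distributions

# A standard normal prior restricted to non-negative values.
ndt_prior = truncated(Normal(0, 1); lower=0)

# Every draw respects the lower bound.
draws = rand(ndt_prior, 1000)
all_nonnegative = all(draws .>= 0)
```

Note that this bounds the parameter the prior is placed on. Whether a whole linear predictor such as beta0 + x1 * beta1 stays non-negative also depends on the predictor values.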

> Can we support vector parameters? For example, can we broadcast ~ normal(0, 1) over the vector, and can we assign specific priors to each element e.g., drift[1] ~ normal(0, 1), drift[2] ~ normal(1, 2)?

Maybe with filldist and arraydist, though there may be performance concerns.
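
A minimal sketch of both options in a plain Turing model, assuming a two-element drift vector. The model name drift_priors and the variable names are made up for the example, and this says nothing about how TuringGLM.jl would generate such a model from a formula.

```julia
using Turing  # exports filldist and arraydist

@model function drift_priors()
    # One prior shared by both elements of the vector.
    drift_shared ~ filldist(Normal(0, 1), 2)
    # A separate prior for each element.
    drift_each ~ arraydist([Normal(0, 1), Normal(1, 2)])
end

# Draw from the priors only, to check that the model is well formed.
chain = sample(drift_priors(), Prior(), 100)
```

Here filldist covers the broadcast case and arraydist the per-element case from the question above.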

TuringLang / TuringGLM.jl

Feature Request [enhancement]: Support for Multiple Response Variables using Syntax in TuringGLM.jl #93