Should GlmResp be a Table?

JuliaStats / GLM.jl

Generalized linear models in Julia

Should GlmResp be a Table? #313

Open dmbates opened 5 years ago

dmbates commented 5 years ago

A GlmResp contains several parallel vectors, and a distribution D. Most of the functions that apply to these objects iterate over these vectors in parallel. Would it make sense to store these vectors in a Table in the sense of the https://github.com/JuliaData/Tables.jl package? The update operations are like iterating over a rowtable.
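As a rough illustration of the idea, here is a minimal Base-only sketch. `ToyResp`, `update_mu!`, and `rows` are hypothetical names invented for this example and are not GLM.jl's actual types or functions; the inverse-logit update only stands in for whatever per-element work the real code does. It shows an index-based update over parallel vectors next to a row-style view of the same data.

```julia
# Hypothetical stand-in for a response object holding parallel vectors.
# This is NOT GLM.jl's GlmResp; field names are chosen for illustration only.
struct ToyResp
    y::Vector{Float64}    # observed response
    eta::Vector{Float64}  # linear predictor
    mu::Vector{Float64}   # mean response
end

# Index-based update: walk the parallel vectors together and
# overwrite mu with the inverse logit of eta.
function update_mu!(r::ToyResp)
    for i in eachindex(r.y, r.eta, r.mu)  # also checks that the indices agree
        r.mu[i] = inv(1 + exp(-r.eta[i]))
    end
    return r
end

# Row-style view: lazily yields one NamedTuple per observation,
# similar in spirit to iterating over a row table.
rows(r::ToyResp) =
    ((y = y, eta = eta, mu = mu) for (y, eta, mu) in zip(r.y, r.eta, r.mu))

r = ToyResp([0.0, 1.0], [0.0, 0.0], [0.0, 0.0])
update_mu!(r)
first_row = first(rows(r))  # (y = 0.0, eta = 0.0, mu = 0.5)
```

The row-style view is convenient for reading, while in-place updates still go through the underlying vectors, which is part of what a table-like wrapper would have to handle.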

andreasnoack commented 5 years ago

I think Tables.jl might be more general than needed for GlmResp. I like https://github.com/piever/StructArrays.jl which is conceptually a bit simpler and would give us row iteration.

quinnj commented 5 years ago

That sounds good to me.

bkamins commented 3 years ago

@dmbates I think we need some change here for both GlmResp and LmResp. The reason is that if your y is a view or, e.g., an Arrow.Primitive, fitting will not work, because you are unable to ensure that all V<:FPVector have the same type. I am not sure how best to fix it, as I do not know the internals of GLM.jl well enough, but the current specification is certainly too restrictive.
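A minimal sketch of the kind of restriction meant here, using only Base Julia. `StrictResp` and `RelaxedResp` are hypothetical structs made up for this example, not GLM.jl's actual definitions, and giving each field its own type parameter is just one possible relaxation, not a proposed fix.

```julia
# Hypothetical struct where every field must share ONE vector type V.
struct StrictResp{V<:AbstractVector{Float64}}
    y::V
    mu::V
end

# Hypothetical relaxed variant: each field gets its own type parameter.
struct RelaxedResp{Y<:AbstractVector{Float64},M<:AbstractVector{Float64}}
    y::Y
    mu::M
end

data = collect(1.0:10.0)
y = view(data, 1:5)  # a SubArray, not a Vector{Float64}
mu = zeros(5)        # a plain Vector{Float64}

# The strict constructor has no method for two different vector types,
# so this call throws a MethodError.
strict_failed = try
    StrictResp(y, mu)
    false
catch err
    err isa MethodError
end

# The relaxed variant accepts the view and the plain vector together.
relaxed = RelaxedResp(y, mu)
```

The trade-off is that more type parameters make the type signature longer, but each field stays concretely typed, so the compiler can still specialize on it.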

dmbates commented 3 years ago

GlmResp and LmResp were formulated a long time ago and a redesign may be warranted. There are several competing goals (generality, ease of use, efficiency for the most common cases) that would need to be balanced. The original design was based on the glm function in R, which was also formulated early in the history of the language and based on an even earlier implementation in S. It would be worthwhile to discuss the overall structure and how these different objectives should be balanced. I don't see a Discussions area in this repository. Is it possible to add one? I looked but didn't see an obvious way to do so.

bkamins commented 3 years ago

I think it is OK just to discuss it in this issue.

dmbates commented 3 years ago

Having done some more benchmarking, it looks like I am trying to solve a non-problem here. I keep convincing myself that the bottleneck in fitting a glm must be the call to updateμ!, because it has to evaluate so many intermediates, but it turns out that is not the case.

What I will do instead is update the perf/glm.jl file to modern Julia so that others can run benchmarks and profiles.