Open dmbates opened 5 years ago
I think Tables.jl might be more general than needed for GlmResp
. I like https://github.com/piever/StructArrays.jl which is conceptually a bit simpler and would give us row iteration.
That sounds good to me.
@dmbates I think we need some change here both for GlmResp
and LmResp
. The reason is that if your y
is a view
or e.g. Arrow.Primitive
fitting will not work as you are unable to ensure that all V<:FPVector
have the same type. I am not sure how to best fix it as I do not know internals of GLM.jl well enough, but for sure the current specification is too restrictive.
GlmResp
and LmResp
were formulated a long time ago and a redesign may be warranted. There are several competing goals - generality, ease-of-use, efficiency for the most common cases - that would need to be balanced. The original design was based on the glm function in R which also was formulated early on in the history of the language and based on an even earlier implementation in S. It would be worthwhile discussing the overall structure and how these different objectives should be balanced. I don't see a Discussions area in this repository. Is it possible to add it? I looked but didn't see an obvious way to do so.
I think it is OK just to discuss it in this issue.
Having done some more benchmarking it looks like I am trying to solve a non-problem here. I keep convincing myself that the bottleneck in fitting a glm must be the call to updateμ!
because it has to evaluate so many intermediates but it turns out that is not the case.
What I will do instead is update the perf/glm.jl file to modern Julia so that others can run benchmarks and profiles.
A
GlmResp
contains several parallel vectors, and a distributionD
. Most of the functions that apply to these objects iterate over these vectors in parallel. Would it make sense to store these vectors in aTable
in the sense of the https://github.com/JuliaData/Tables.jl package? The update operations are like iterating over a rowtable.