Closed compleathorseplayer closed 2 years ago
Hi @compleathorseplayer, sorry to hear you're running into trouble!
Note though that the DataFrames library is not actually required for using GLM; in the "fitting GLM models" section of the documentation, it states that any structure that is compatible with the interface specified by the Tables library can be used. This includes things like vectors of named tuples, which can be constructed without any other package dependencies. It also includes DataFrame objects, which are available when the DataFrames library is installed and loaded.
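As a minimal sketch of the vector-of-named-tuples case (the data values and the names `rows` and `model` are made up for illustration), something like the following should run with only GLM installed:

```julia
using GLM  # also makes the @formula macro available

# A row-oriented table: a plain Vector of NamedTuples, no DataFrames involved.
# The numbers are invented purely for illustration.
rows = [
    (y = 1.0, x = 0.0),
    (y = 3.1, x = 1.0),
    (y = 4.9, x = 2.0),
    (y = 7.2, x = 3.0),
]

# Fit y ~ 1 + x; the intercept is added by the formula machinery.
model = lm(@formula(y ~ x), rows)

# Estimated intercept and slope, in that order.
coef(model)
```

Each element of `rows` is one observation, and the field names are what the formula refers to.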
Do you have an example of some code that isn't working as expected? I think that would help pinpoint the issue.
Thanks. It seems to have to do with @formula() - the lm(x,y) syntax seems to work regardless.
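For comparison, here is a small sketch of the matrix-based call that skips `@formula` entirely. The data and variable names are invented, and the intercept column is built by hand because this form uses the design matrix exactly as given:

```julia
using GLM

# Invented data for illustration.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.1, 4.9, 7.2]

# Design matrix with an explicit column of ones for the intercept;
# the matrix method does not add one automatically.
X = hcat(ones(length(x)), x)

# No formula and no table here, just a matrix and a response vector.
model = lm(X, y)

# Coefficients in the column order of X: intercept, then slope.
coef(model)
```

Since no table is involved in this call, it does not depend on what kind of container the data lives in.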
> It seems to have to do with @formula()
Hm, interesting. It'd be good to see an example of some code you have that doesn't work if you can provide one. In the meantime, @kleinschmidt, have you seen anything like this before?
I can't reproduce (see below for a working example with JUST GLM). My hunch is that StatsModels/@formula needs some kind of Tables.jl table (e.g., a named tuple of vectors like below), but it doesn't have to be a dataframe. If you want to provide input as a DataFrame, you need DataFrames.jl as a dependency.
julia> using Pkg; Pkg.add("GLM")
julia> using GLM
julia> my_table = (; y=rand(10), x1=rand(10), x2=rand('a':'b', 10))
(y = [0.5682488636362382, 0.17197789596062807, 0.3506216334793084, 0.8072853497852225, 0.5012640861462717, 0.8900214619075134, 0.5315620660933361, 0.3094426146385296, 0.20359501557647441, 0.3161968669038068], x1 = [0.12550017823736537, 0.14895836178426625, 0.7314434538141096, 0.5441900453308146, 0.4189847366481383, 0.28566682522788844, 0.3849599719979039, 0.8120194120664842, 0.9961705212901149, 0.9210058138271773], x2 = ['a', 'b', 'a', 'a', 'a', 'a', 'b', 'a', 'a', 'a'])
julia> lm(@formula(y ~ x1 * x2), my_table)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}
y ~ 1 + x1 + x2 + x1 & x2
Coefficients:
──────────────────────────────────────────────────────────────────────────
                 Coef.  Std. Error      t  Pr(>|t|)  Lower 95%   Upper 95%
──────────────────────────────────────────────────────────────────────────
(Intercept)   0.855467     0.14322   5.97    0.0010   0.505021   1.20591
x1           -0.599188     0.21338  -2.81    0.0308  -1.12131   -0.077066
x2: b        -0.91045      0.33984  -2.68    0.0366  -1.74201   -0.0788915
x1 & x2: b    2.12284      1.07723   1.97    0.0963  -0.513047   4.75872
──────────────────────────────────────────────────────────────────────────
OK, thanks all. I am responding to student queries that were resolved by loading DataFrames. I am sorry I don't have the specific example, which occurred for me a couple of weeks ago. Perhaps the only issue was that the error messages did not mention the unmet dependency on Tables or DataFrames. Thanks.
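One way to make this kind of failure easier to diagnose is to check up front whether the data is a Tables.jl-compatible table before fitting. The sketch below assumes Tables.jl has been added to the active environment explicitly; `checked_lm` is a hypothetical helper written for this example, not part of GLM, and the data are invented:

```julia
using GLM
import Tables  # assumption: Tables.jl was added to the environment explicitly

# A named tuple of vectors is a valid column table.
column_table = (; y = [1.0, 3.1, 4.9, 7.2], x = [0.0, 1.0, 2.0, 3.0])

# A bare vector of numbers is not a table.
not_a_table = [1.0, 3.1, 4.9, 7.2]

# Hypothetical helper: refuse non-table input with an explicit message
# before handing the data to the formula-based lm method.
function checked_lm(formula, data)
    Tables.istable(data) ||
        throw(ArgumentError("expected a Tables.jl-compatible table, got $(typeof(data))"))
    return lm(formula, data)
end

model = checked_lm(@formula(y ~ x), column_table)
```

Calling `checked_lm(@formula(y ~ x), not_a_table)` would then raise the `ArgumentError` above instead of a less direct error from deeper in the fitting code.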
Closing then, feel free to reopen if you have a specific case.
The following has come up in my own work and in the classes that I teach.
Whenever the GLM library is used, the DataFrames library is required, though it is possible to load the GLM library with 'import' or 'using' without any warnings or error messages, even if DataFrames is not installed.
The routines will not run until DataFrames is imported, though the error messages do not state that the issue is this unmet dependency. (It took me a long time to figure out why the same code worked for me once and not later.)