Closed Nosferican closed 4 years ago
Dear @Nosferican, thank you very much for your help. For the first issue, please run using GlobalSearchRegression after addprocs(#), i.e.:
using Distributed
addprocs(2)
using Test, RDatasets, GlobalSearchRegression
data = RDatasets.dataset("Ecdat", "Crime")
model = gsreg("CRMRTE ~ Prb*", data, criteria = [:aicc, :bic], modelavg = true)
For the second one, you are right: we cannot handle categorical variables directly yet. Your solution is very flexible, and we will try to include some functionality for this ASAP.
As for documentation, we are improving our io page right now. Thanks again
Dear @Nosferican, regarding the last two issues: 1) Perfect collinearity currently returns an error, but this will be changed. Instead of discarding some random variable (as Stata does), we will provide the user with a list of perfectly correlated variables ASAP. 2) Thank you very much, we will replace print with show for the result.
Hey @Nosferican, I'm working on the show vs print fix. Can you check this line? There we're using the Base.show function with io as a parameter. Is this enough, or are we missing something?
@Nosferican: One more clarification about your advice "While I enjoy the flexible inputs for formula, using StatsModels.@formula would be nice for poly(x, 2), interactions, and handling categorical variables." In order to deal with combinatorial issues and parallelism, we need to implement our own interaction function. This is included in our next package (https://github.com/ParallelGSReg/ModelSelection.jl).
Try,
Base.show(io::IO, result::GSRegResult) = print(io, to_string(result))
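To illustrate why defining the two-argument show is usually enough, here is a minimal sketch. DemoResult and this to_string are hypothetical stand-ins, not the package's GSRegResult or its actual to_string; the point is only that print, string, and the REPL display all fall back to show(io, x) by default.

```julia
# Hypothetical stand-in for a result type; NOT the package's GSRegResult.
struct DemoResult
    nobs::Int
    criteria::Vector{Symbol}
end

# Hypothetical formatter playing the role of to_string in the suggestion above.
to_string(r::DemoResult) =
    "DemoResult with $(r.nobs) observations; criteria: $(join(r.criteria, ", "))"

# Defining show on an IO is what makes print, string, and the REPL agree.
Base.show(io::IO, r::DemoResult) = print(io, to_string(r))

r = DemoResult(630, [:aicc, :bic])
shown = sprint(show, r)    # what the REPL falls back to
printed = sprint(print, r) # print delegates to show for custom types
```

With this in place there is no need to overload print separately; writing to the io argument (rather than to stdout) is the part that matters.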
I haven't had much experience with Distributed. It seems I ended up hitting https://github.com/JuliaLang/julia/issues/28781 and a couple of other issues, but it seems fine now with the clarification. It might be good to err on the side of including a line or two on the Distributed framework for Julia. Maybe Andreas can shed some light on it (especially as it relates to performance).
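As a rough idea of what such a line or two could look like, here is a small sketch that uses only the Distributed standard library. The function square is a made-up example; the only point is the ordering: workers are added first, and code is made available to them afterwards.

```julia
using Distributed

# Add the workers first; anything defined or loaded before this point
# is not automatically available on them.
addprocs(2)

# Define the (made-up) function on every process, including the new workers.
@everywhere square(x) = x^2

# Run the work on the workers and collect the results in order.
results = pmap(square, 1:4)
worker_count = nworkers()
```

The same ordering is what the maintainers' reply recommends for the package itself: call addprocs before using GlobalSearchRegression.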
As for the handling of categorical variables, using StatsModels.modelcols should give you a matrix that is easy to plug in. For interactions, you might want to check how to handle those, for example by overloading the default.
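A minimal sketch of that route, assuming DataFrames and StatsModels are installed. The data frame is toy data made up for illustration, and nothing here touches GlobalSearchRegression itself; it only shows how a formula with a categorical column becomes a numeric matrix.

```julia
using DataFrames, StatsModels

# Toy data (made up): one numeric and one categorical regressor.
df = DataFrame(y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
               x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
               g = ["a", "b", "c", "a", "b", "c"])

f = @formula(y ~ 1 + x + g)

# The schema records which columns are continuous and which are categorical.
f = apply_schema(f, schema(f, df))

# modelcols returns the response vector and the numeric design matrix,
# with the categorical column expanded into indicator columns.
y, X = modelcols(f, df)

# Column labels for the design matrix, in the same order as its columns.
names_X = coefnames(f.rhs)
```

The resulting X could then be passed to whatever routine expects a plain numeric matrix.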
For the multicollinearity, I thought it was currently silently dropping the linearly dependent features,
data = DataFrame(y = rand(100), x1 = rand(100))
data[!,:x2] .= 2 * data.x1
data[!,:x3] .= 0.5 * data.x1
gsreg("y x1 x2 x3", data)
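For the list of perfectly correlated variables mentioned earlier in the thread, one possible approach (a sketch, not what the package does) is to scan the pairwise correlation matrix. The helper collinear_pairs and its tol keyword are hypothetical names; only the Statistics and LinearAlgebra standard libraries are used. Note that this only catches pairwise collinearity; a rank check flags the general case without saying which columns are involved.

```julia
using Statistics, LinearAlgebra

# Hypothetical helper: report every pair of columns whose absolute
# correlation is 1 up to a tolerance.
function collinear_pairs(X::AbstractMatrix, names::Vector{Symbol}; tol = 1e-8)
    C = cor(X)
    pairs = Tuple{Symbol,Symbol}[]
    for j in 2:size(X, 2), i in 1:(j - 1)
        if abs(C[i, j]) >= 1 - tol
            push!(pairs, (names[i], names[j]))
        end
    end
    return pairs
end

# Deterministic version of the example above, plus one non-collinear column.
x1 = collect(1.0:10.0)
X = hcat(x1, 2 .* x1, 0.5 .* x1, x1 .^ 2)

pairs = collinear_pairs(X, [:x1, :x2, :x3, :x4])

# A rank-deficient matrix signals linear dependence in general.
rank_deficient = rank(X) < size(X, 2)
```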
In the following case using the data from the main example in the issue,
model = gsreg("CRMRTE ~ Prb*",
data,
criteria = [:aicc, :bic],
modelavg = true)
propertynames(model)
model.nobs # 630
model.time # nothing
model.parallel # nothing
model.criteria # [:aicc, :bic, :r2adj]
I am not sure whether the model criteria are leaking r2adj.
Dear @Nosferican , Thank you very much for all your suggestions. GlobalSearchRegression.jl has been updated. There is a new release (v1.0.4). We have:
Thank you very much for all your help. Nicolas
Great! I will take a look at it when I get a chance in the next couple days. Happy to be of help.
There seems to still be an issue with this,
using RDatasets, GlobalSearchRegression
data = RDatasets.dataset("Ecdat", "Crime")
model = gsreg("CRMRTE ~ Prb*", data, criteria = [:aicc, :bic], modelavg = true)
KeyError: key :r2adj not found
The docs are good. I only noticed that gsreg is still without a docstring. It would be good to add examples to it through jldoctest. I believe v1.0.4 didn't make it to the registry. Does the repo have Registrator.jl / TagBot enabled?
@Nosferican, we have incorporated all your advice in a new release (v1.0.6).
the r2adj error in the model average option has been fixed.
the gsreg function now has docstring documentation.
the Registrator tagbot is now enabled.
We are only waiting for human interaction to register our v1.0.4 release first, but v1.0.6 is already available using "add https://github.com/ParallelGSReg/GlobalSearchRegression.jl". We hope it will also be registered in a few days! Thanks again, José.
Thanks for the update. Will check the changes early this week.
Thanks to you José
Seems good. Hope the feedback was useful.
@Nosferican José, the latest release is now registered. The updated package can be installed with: pkg> add GlobalSearchRegression
I am opening this issue to track some comments and issues that are part of the review process going on in https://github.com/JuliaCon/proceedings-review/issues/53.
Issues
Suggestions
gsreg
@autodocs for the API documentation
data = RDatasets.dataset("Ecdat", "Crime")
categorical!(data, findall([!(typeof(col) <: AbstractVector{<:Number}) for col in eachcol(data)]))
@test_throws(MethodError, gsreg(join(names(data)[3:end], ' '), data))