JuliaStats / RDatasets.jl

Julia package for loading many of the data sets available in R
GNU General Public License v3.0

clean_colnames!() #14

Closed stewartwatts closed 10 years ago

stewartwatts commented 11 years ago

Would it make sense to clean_colnames!() by default in data.jl?

Creating a Formula with a "." in a colname causes a somewhat cryptic error.

using DataFrames
using RDatasets
using GLM

swiss = data("datasets", "swiss")
fit = lm(:(Fertility ~ Agriculture + Infant.Mortality), swiss)

ERROR: Non-call expression encountered
 in dospecials at /home/stewart/.julia/DataFrames/src/formula.jl:68
 in map at cell.jl:19
 in dospecials at /home/stewart/.julia/DataFrames/src/formula.jl:72
 in Terms at /home/stewart/.julia/DataFrames/src/formula.jl:128
 in ModelFrame at /home/stewart/.julia/DataFrames/src/formula.jl:172
 in lm at /home/stewart/.julia/GLM/src/lm.jl:37
 in lm at /home/stewart/.julia/GLM/src/lm.jl:42

clean_colnames!(swiss)
fit = lm(:(Fertility ~ Agriculture + Infant_Mortality), swiss)

Formula: Fertility ~ :(+(Agriculture,Infant_Mortality))
Coefficients:
3x4 DataFrame:
        Estimate  Std.Error  t value  Pr(>|t|)
[1,]    21.9546   11.5285    1.90437  0.0634125
[2,]    0.208919  0.0686417  3.04362  0.00393547
[3,]    1.88563   0.535221   3.52308  0.00100803
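
For context, a minimal sketch of why the dotted name trips up the formula code. It assumes a recent Julia 1.x parser, which may represent `~` differently from the version used in this thread, and it only inspects the quoted expression; it does not call DataFrames or GLM. Julia parses `Infant.Mortality` as a field access on `Infant`, so the formula receives a nested `.` expression where it expects a plain column symbol.

```julia
# Quote the two formulas from the report and inspect the parsed terms.
dotted_formula = :(Fertility ~ Agriculture + Infant.Mortality)
clean_formula  = :(Fertility ~ Agriculture + Infant_Mortality)

# args[1] is the operator, so the right-hand side of `~` is args[3],
# and the second summand of `+` is again args[3].
dotted_term = dotted_formula.args[3].args[3]
clean_term  = clean_formula.args[3].args[3]

println(typeof(dotted_term))   # an Expr: field access, not a column name
println(dotted_term.head)      # the `.` head of a field-access expression
println(dotted_term.args[1])   # the object being accessed: Infant
println(typeof(clean_term))    # a Symbol, which can name a column
```

This is why the second report fails with "Petal not defined": the code ends up trying to evaluate `Petal` as a variable and then read a `Length` field from it.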

johnmyleswhite commented 11 years ago

Yeah, we should finally clean up all of these data sets to remove any remaining row names. I'd also like to gzip everything to save on bandwidth when downloading the package.

diegozea commented 10 years ago

I want to report the same issue: a dot in a column name makes Formula fail:

julia> lm(:(Petal.Length ~ Species), iris)
ERROR: Petal not defined
 in anonymous at /home/dzea/.julia/v0.2/DataFrames/src/dataframe.jl:1504
 in with at /home/dzea/.julia/v0.2/DataFrames/src/dataframe.jl:1505
 in anonymous at /home/dzea/.julia/v0.2/DataFrames/src/formula.jl:173
 in map at cell.jl:19
 in ModelFrame at /home/dzea/.julia/v0.2/DataFrames/src/formula.jl:173
 in lm at /home/dzea/.julia/v0.2/GLM/src/lm.jl:37
 in lm at /home/dzea/.julia/v0.2/GLM/src/lm.jl:42

johnmyleswhite commented 10 years ago

I just haven't had time to get to this. Would definitely appreciate help on the simplest version: calling clean_colnames and removing every column whose name is just ``.
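
A rough sketch of what that simplest version could look like, written against a plain vector of names using only Base Julia. `sanitize_name` and `sanitized_names` are hypothetical helpers for illustration, not the DataFrames `clean_colnames!` implementation, and the sketch assumes the unwanted columns are the ones with an empty name.

```julia
# Hypothetical helper: replace every character that is not a letter,
# digit, or underscore with an underscore.
sanitize_name(name::AbstractString) = replace(name, r"[^A-Za-z0-9_]" => "_")

# Hypothetical helper: drop empty names (assumed to be leftover row-name
# columns), then sanitize the rest.
function sanitized_names(names::Vector{String})
    kept = filter(!isempty, names)
    return map(sanitize_name, kept)
end

cleaned = sanitized_names(["", "Fertility", "Infant.Mortality"])
println(cleaned)
```

In a real data set the column belonging to a dropped name would have to be removed as well; the sketch only covers the renaming rule.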

garborg commented 10 years ago

Can be closed, I think.