IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Handle labelled data better in R-Instat #8903

Open rdstern opened 3 months ago

rdstern commented 3 months ago

I am getting a series of problems that seem to be caused by the way we handle labelled data in R-Instat. I think resolving them will need @lilyclements and @volloholic, and maybe Danny as well. But there could be a good payoff!

Earlier, I thought we just needed to add facilities for multiple missing values. I am now of the view that these will be accommodated as part of our improved treatment of labelled data.

I note the following points. Some may help, while others could be "red herrings":

a) rio uses haven for importing SPSS, SAS and Stata files. Haven is maintained by Hadley Wickham and is part of the tidyverse.
b) The labelled package is useful - I assume.
c) The labelled package has some datasets that give an error with our current Import from library. Should they be readable?
d) The questionr package has 3 datasets from a fertility survey, namely children, households and women. They each import easily. They have haven_labelled variables.
1) Our summary (and skim) dialogs fail with these datasets.
2) Making them character is bad for partially labelled variables. It seems ok for the fully labelled ones.
3) Making them numeric also introduces an oddity, and so on.
4) The dialog Prepare > Check Data > Delete Value Labels works fine in simply deleting the value labels - maybe in the whole data frame. But it still leaves the variables with class haven_labelled, and the summary command still doesn't work.
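
To make points 2) to 4) concrete, here is a minimal sketch using haven directly. It assumes the haven package is installed and uses a small made-up, partially labelled vector rather than the questionr datasets. It is not how R-Instat's dialogs are implemented, only an illustration of two conversions that might be worth considering: `zap_labels()` drops both the value labels and the `haven_labelled` class, while `as_factor()` with its default `levels` setting uses a label where one exists and the underlying value otherwise.

```r
library(haven)

# Hypothetical partially labelled variable: only the codes 1 and 9 carry labels.
x <- labelled(c(1, 2, 3, 9), labels = c("Yes" = 1, "Missing" = 9))

# The vector carries the haven_labelled class that the summary dialogs trip over.
print(class(x))

# Option A: remove the value labels AND the haven_labelled class,
# leaving a plain numeric vector of the original codes.
x_num <- zap_labels(x)
print(inherits(x_num, "haven_labelled"))
print(summary(x_num))

# Option B: convert to a factor. With the default `levels` setting,
# labelled codes become their labels and unlabelled codes keep their values,
# so partially labelled variables do not lose information.
x_fac <- as_factor(x)
print(levels(x_fac))
```

If something along these lines suits, the Delete Value Labels dialog could perhaps also strip the class (as Option A does), so that summary works afterwards. That is a suggestion to be checked, not a tested fix.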