Open arunsrinivasan opened 10 years ago
I second this feature and in relation to this, some of the functionality at Morpho (http://bit.ly/1Tzc7Nj) looks interesting and very related to what Griffith mentions above, I guess.
I think this issue needs a use case MRE.
If this were implemented, nicely written variable names (with spaces and special characters) could appear in tables and plots that are output straight to latex, without post-processing.
This seems to be quite a deep request and probably better suited to an add-on package as it will likely require S3 or S4 methods for columns to auto-replace their names with their labels.
without spending hours hand coding variable abbreviations to complicated variable names
I'm not seeing why this is the case. In the example, there weren't "hours spent hand coding" (unless that was already done upstream, in which case it is moot anyway) -- we simply copy the labels attribute onto the data.table object -- either the object itself, or onto the columns individually.
This is and has always been possible (though, I agree, quite poorly documented) in base R and hence data.table. So, barring a more specific example of the anticipated workflow/API, I vote to close.
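A minimal sketch of the two options mentioned above, using setattr from data.table. The attribute names ("labels" on the table, "label" on each column, the latter following the Hmisc convention) and the label text are assumptions for illustration, not an agreed API:

```r
library(data.table)

d <- data.table(celsius = 20, fahrenheit = 68)

# Option 1: a single attribute on the table itself, one label per column.
# The attribute name "labels" is an assumption.
setattr(d, "labels",
        c(celsius = "degrees Celsius", fahrenheit = "degrees Fahrenheit"))

# Option 2: one attribute per column, set by reference on each column vector.
# The attribute name "label" is an assumption.
setattr(d$celsius, "label", "degrees Celsius")
setattr(d$fahrenheit, "label", "degrees Fahrenheit")

# Read the labels back.
table_labels <- attr(d, "labels", exact = TRUE)
col_labels <- vapply(d, function(col) attr(col, "label", exact = TRUE),
                     character(1))
```

Whether either attribute is kept through later operations on the table is a separate question and is not shown here.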
Today, beyond Hmisc, other packages like haven, labelled, or sjlabelled help manage labels in the tidyverse package family.
I am starting to learn data.table, but not having the possibility of managing labels could discourage me from continuing with it. This may be the case for many other people, since variable labels and value labels for categorical variables are very useful.
Thank you anyway for the great package.
It is even more important because non-native encoding in column names cannot be reliably handled everywhere, and it seems that we will have to force users to change their column names in some cases. In such cases, labels could still carry the required column names in any encoding.
@iago-pssjd could you maybe link a manual page that describes usage of those in some of the mentioned packages?
Yes, here are two pages, one each for labelled and sjlabelled, although the second overlaps the first a bit:
https://cran.r-project.org/web/packages/labelled/vignettes/intro_labelled.html
https://strengejacke.github.io/sjlabelled/articles/intro_sjlabelled.html
Thank you!
I had a brief look at the links, and it seems to be a much broader approach. AFAIU what we really need is just an extra attribute that has to be retained/handled during common operations:
library(data.table)
d = data.table(celsius = 20, fahrenheit = 68)
# set the labels by reference on the table passed in as x
setlabels = function(x, labels) {
  setattr(x, "labels", labels)
}
setlabels(d, labels = c("°C","°F"))
and then handle that nicely in print.data.table, fwrite(yaml=TRUE)
print(d)
# °C °F
# celsius fahrenheit
#1: 20 68
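Until something like this is handled in print.data.table, a rough approximation is possible with a wrapper. This is only a sketch: print_labelled is a hypothetical helper, not part of data.table, and it assumes the labels are stored in a "labels" attribute as in the snippet above (plain ASCII labels are used here to keep the example portable):

```r
library(data.table)

# Hypothetical helper, not part of data.table: writes the labels stored in
# the "labels" attribute as a comment line, then prints the table as usual.
print_labelled <- function(x) {
  labels <- attr(x, "labels", exact = TRUE)
  if (!is.null(labels)) {
    stopifnot(length(labels) == ncol(x))
    cat("#", paste(labels, collapse = " "), "\n")
  }
  print(x)
  invisible(x)
}

d <- data.table(celsius = 20, fahrenheit = 68)
setattr(d, "labels", c("degC", "degF"))
print_labelled(d)
```

The label line is not aligned with the columns; doing that properly would need support inside the print method itself.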
Submitted by: Griffith Rees; Assigned to: Nobody; R-Forge link
One data management feature of Stata which R lacks is descriptions of variables within the standard data frame. The Hmisc package deals with this in a simple way: http://www.statmethods.net/input/variablelables.html. While this seems like a very trivial change, it allows large social science datasets with opaque variable names (have a look at the US Census) to actually be manageable within R without spending hours hand coding variable abbreviations to complicated variable names. If this were implemented, nicely written variable names (with spaces and special characters) could appear in tables and plots that are output straight to LaTeX, without post-processing.
An example of how this could be used with the existing Stata importer:
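dta2data.table <- function(path) {
  dta <- read.dta(path)  # read.dta() from the foreign package
  d <- data.table(dta)
  # setlabel() is the function proposed in this request; it does not exist yet.
  # read.dta() stores the variable descriptions in the "var.labels" attribute.
  setlabel(d, attr(dta, "var.labels"))
  return(d)
}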
Thanks again for an excellent and supremely useful project :)