endrebak closed this 6 years ago
Where in the book was this comment? Could you provide a link to it?
From ?factor:
In earlier versions of R, storing character data as a factor was more space efficient if there is even a small proportion of repeats. However, identical character strings now share storage, so the difference is small in most cases. (Integer values are stored in 4 bytes whereas each reference to a character string needs a pointer of 4 or 8 bytes.)
Also
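# object_size() is not base R; pryr is assumed here (lobstr also exports one)
library(pryr)
y <- c("AAA", "BBB")
y1 <- sample(x = y, size = 1e6, replace = TRUE)
object_size(y1)          # character vector: one 8-byte pointer per element
#> 8 MB
object_size(factor(y1))  # factor: one 4-byte integer per element, plus the levels
#> 4 MB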
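The saving above comes from the vector having only two distinct values. A minimal sketch of how that changes with the number of distinct strings, using base R's `object.size()` instead of `object_size()` so no extra package is needed. The names `low_card`, `high_card` and `sizes` are made up for illustration, and the comments assume a 64-bit build of R:

```r
# Sketch: character vs factor storage for few and for many distinct values.
# Uses utils::object.size(), which ships with base R.
n <- 1e5

# Few distinct values: the case from the example above.
low_card <- sample(c("AAA", "BBB"), size = n, replace = TRUE)

# Many distinct values: every element is a different string.
high_card <- as.character(seq_len(n))

# Sizes in bytes, stored as plain numbers so they are easy to compare.
sizes <- c(
  low_chr  = as.numeric(object.size(low_card)),
  low_fct  = as.numeric(object.size(factor(low_card))),
  high_chr = as.numeric(object.size(high_card)),
  high_fct = as.numeric(object.size(factor(high_card)))
)
print(sizes)

# TRUE when the factor is the smaller representation.
print(sizes["low_fct"] < sizes["low_chr"])
print(sizes["high_fct"] < sizes["high_chr"])
```

With few distinct values the factor should come out smaller, since it stores 4-byte integer codes where the character vector stores 8-byte pointers. With all-distinct values the factor has to keep every string in its levels as well as the codes, so it should not be smaller.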
https://csgillespie.github.io/efficientR/7-1-data-types.html#factors
Ah, so it does take less memory. Thanks :)
Is it correct that factors no longer use less memory? They save an immense amount of RAM in pandas...