csgillespie / efficientR

Efficient R programming: a book
https://csgillespie.github.io/efficientR/

Last sentence of Chapter 7.1.2: "the space gain in factors is now space." #242

Closed: endrebak closed this issue 6 years ago

endrebak commented 6 years ago

The book says: "In early versions of R, storing character data as a factor was more space efficient. However since identical character strings now share storage, the space gain in factors is now space."

Is it correct that factors no longer use less memory? In pandas, categoricals save an enormous amount of RAM.

csgillespie commented 6 years ago

Where in the book is this sentence? Could you provide a link to it?

From ?factor:

In earlier versions of R, storing character data as a factor was more space efficient if there is even a small proportion of repeats.  However, identical character strings now share storage, so the difference is small in most cases.  (Integer values are stored in 4 bytes whereas each reference to a character string needs a pointer of 4 or 8 bytes.)
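A minimal base-R sketch of what the ?factor text describes. The vector x below is a hypothetical three-element example, not from the thread. The point is that a factor is stored as an integer vector of codes plus a "levels" attribute holding each distinct string once, whereas a character vector holds one pointer per element.

```r
# Hypothetical example data: three elements, two distinct strings
x <- c("AAA", "BBB", "AAA")
f <- factor(x)

# The factor's underlying storage is an integer vector of codes...
typeof(f)        # returns "integer"
as.integer(f)    # returns 1 2 1
# ...plus a "levels" attribute that stores each distinct string once
levels(f)        # returns "AAA" "BBB"

# A character vector, by contrast, keeps one string pointer per element
typeof(x)        # returns "character"

# Indexing the levels by the codes recovers the original strings
identical(levels(f)[as.integer(f)], x)   # returns TRUE
```

This is why the saving per element is the difference between a 4-byte integer code and a 4- or 8-byte string pointer, not the size of the strings themselves, since identical strings are already shared.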

Also, compare the sizes directly:

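library("pryr")  # object_size() comes from the pryr package
set.seed(1)      # make the random sample reproducible
y = c("AAA", "BBB")
y1 = sample(x = y, size = 1e6, replace = TRUE)
object_size(y1)
#> 8 MB
object_size(factor(y1))
#> 4 MB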
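The 8 MB versus 4 MB result follows from the per-element costs quoted from ?factor, assuming a 64-bit build where each string pointer takes 8 bytes and counting MB as 10^6 bytes. Object headers, the two shared strings and the levels attribute add only a small fixed overhead on top of this.

```latex
\begin{aligned}
\text{character vector: } & 10^6 \times 8\ \text{bytes} = 8 \times 10^6\ \text{bytes} \approx 8\ \text{MB} \\
\text{factor (integer codes): } & 10^6 \times 4\ \text{bytes} = 4 \times 10^6\ \text{bytes} \approx 4\ \text{MB}
\end{aligned}
```

On a 32-bit build, where pointers are 4 bytes, the two sizes would be roughly equal, which matches the "difference is small in most cases" wording.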
endrebak commented 6 years ago

https://csgillespie.github.io/efficientR/7-1-data-types.html#factors

Ah, so it does take less memory. Thanks :)