A small caveat to consider when working out the solution:
x_max <- 798424101
x <- 1:x_max
max_id <- ((2^31 - 1) / 4) - 1
tail(x[536870911:length(x)])
tail(x[(max_id + 1L):length(x)])
tail(x[seq.int(from = max_id + 1L, to = length(x), by = 1L)])
You'd expect the last three lines to have the same result. But ...
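One possible reason the results can differ, shown as a small sketch that avoids allocating the full vector: max_id as computed above is not a whole number, so a sequence built from it carries a fractional part, and subsetting truncates fractional indices toward zero. The values 20 and 10.75 below are hypothetical and chosen only for illustration.

```r
# Same formula as in the snippet above
max_id <- ((2^31 - 1) / 4) - 1
frac <- max_id %% 1             # fractional part of max_id: 0.75

# Small-scale analogue with hypothetical values
x <- 1:20
from <- 10.75                   # stands in for max_id + 1L
idx <- from:length(x)           # 10.75, 11.75, ..., 19.75 (never reaches 20)

# Fractional indices are truncated, so the last element of x is missed
last_fractional <- tail(x[idx], 3)                     # 17 18 19
last_integer <- tail(x[floor(from):length(x)], 3)      # 18 19 20
```

Wrapping the threshold in floor() or as.integer() before building the index would avoid the mismatch in this sketch.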
Starting with R version 4.0.0, writeBin() supports long vectors, see https://cran.r-project.org/doc/manuals/r-devel/NEWS.html
Something like this could be inserted in the code:
if (getRversion() < R_system_version("4.0.0")){
  max_id <- ((2^31 - 1) / 4) - 1
  if (length(x) > max_id) warning("writing will fail, update to R 4.0.0 or higher")
}
The p_attribute_encode() function now checks the R version and stops with an informative message if the version is below R 4.0.0 and the token stream of a large corpus cannot be written to disk.
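A minimal sketch of what such a check could look like. The helper name check_token_stream_length() and its arguments are hypothetical, and this is not the actual code in p_attribute_encode(); the threshold reuses the formula from the snippet above.

```r
# Hypothetical helper for illustration only; not the package's implementation.
# n: length of the token stream; r_version: version to test against.
check_token_stream_length <- function(n, r_version = getRversion()) {
  # Threshold taken from the snippet earlier in this thread
  max_id <- ((2^31 - 1) / 4) - 1
  if (as.numeric_version(r_version) < "4.0.0" && n > max_id) {
    stop(
      sprintf(
        "Token stream has %.0f elements, which cannot be written to disk with R %s. ",
        n, as.character(r_version)
      ),
      "Please update to R 4.0.0 or higher."
    )
  }
  invisible(TRUE)
}
```

Calling check_token_stream_length(length(x)) before writing would fail early instead of partway through encoding.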
There is a limitation of writeBin() I had not anticipated when writing a large vector to disk: