Stephen-McDaniel / rpostgresql

This repository is an export of the final version from the retired Google Code system (code.google.com/p/rpostgresql).

Could not allocate memory in C function 'R_AllocStringBuffer' Error #63

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hello,
I am trying to save R objects into a PostgreSQL table by using 
rawToChar(serialize(obj, NULL, TRUE)) to save an ASCII representation of the 
object, and unserialize(charToRaw(txt)) to convert the ASCII text back into an R 
object. Unfortunately, for sufficiently large objects, I am getting the error 
"Could not allocate memory (2246 Mb) in C function 'R_AllocStringBuffer'". This 
error seems incorrect to me as I am running a 64-bit version of R and 
PostgreSQL on a server with 256GB of (mostly free) RAM. There should not be any 
difficulty allocating memory. Moreover, I only encounter the error if I call 
dbWriteTable from within another function. Calling it directly inside a script 
works fine. This leads me to believe there is perhaps a subtle bug here. I am 
running RPostgreSQL 0.4 on R 3.0.3 / Linux x86_64. Please let me know if there 
is a problem on my end or a better way for me to do this.
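
For reference, the round trip described above can be sketched as follows. This is a minimal illustration on a small object, not the failing case; the names obj, txt, and back are placeholders, and the object deliberately contains only integers and strings so that the comparison at the end is exact.

```r
# Small stand-in for the real object (integers and strings only).
obj <- list(ints = 1:10, chars = c("A", "B", "C"))

# Serialize to a raw vector in ASCII mode, then view the bytes as one string.
txt <- rawToChar(serialize(obj, NULL, ascii = TRUE))

# Reverse the steps: string -> raw bytes -> R object.
back <- unserialize(charToRaw(txt))

identical(back, obj)
```

Note the order on the way back: charToRaw() runs first and unserialize() is applied to its result.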

Here is a simple script to reproduce the error:

library(RPostgreSQL)

## This function creates a large object, serializes it, and saves it to the
## table '_test' with an associated unique 'id' column.
saveTest <- function(id) {
    ## Make a large object
    obj <- list(1:3e7, sample(LETTERS, 3e7, replace=TRUE), a~b)
    txt <- rawToChar(serialize(obj, NULL, TRUE))
    x <- data.frame(id=id, txt=txt)
    con <- dbConnect(dbDriver("PostgreSQL"), host=Sys.getenv("PGHOST"),
                     user=Sys.getenv("PGUSER"),
                     password=Sys.getenv("PGPWD"),
                     dbname="template1")
    dbSendQuery(con, "SET CLIENT_ENCODING TO 'UTF8';")
    dbWriteTable(con, "_test", x, append=TRUE, row.names=FALSE)
    dbDisconnect(con)
}

## Create the _test table
> con <- dbConnect(dbDriver("PostgreSQL"), host=Sys.getenv("PGHOST"),
                 user=Sys.getenv("PGUSER"),
                 password=Sys.getenv("PGPWD"),
                 dbname="template1")
> dbSendQuery(con, "CREATE TABLE _test (id INTEGER PRIMARY KEY, txt TEXT);")
> dbDisconnect(con)

> saveTest(1) # Wait ~30 seconds
Error in postgresqlCopyInDataframe(new.con, value) (from 
20140509.production.R@32#10) : 
  could not allocate memory (2246 Mb) in C function 'R_AllocStringBuffer'

> R.version
               _                           
platform       x86_64-pc-linux-gnu         
arch           x86_64                      
os             linux-gnu                   
system         x86_64, linux-gnu           
status                                     
major          3                           
minor          0.3                         
year           2014                        
month          03                          
day            06                          
svn rev        65126                       
language       R                           
version.string R version 3.0.3 (2014-03-06)
nickname       Warm Puppy                  

Thanks,
Robert McGehee

Original issue reported on code.google.com by rmcge...@gmail.com on 16 Jul 2014 at 6:31

GoogleCodeExporter commented 9 years ago
I just checked out the most recent version of the RPostgreSQL code from svn and 
confirmed that the behavior is the same in the devel version of the code as 
well (0.5-1 30-Jan-2014).

Original comment by rmcge...@gmail.com on 16 Jul 2014 at 7:01

GoogleCodeExporter commented 9 years ago
Ok, I now suspect that this may not be a bug, but caused by strangeness (or my 
lack of understanding) in how R environments work. Read on if you want details, 
but no action is required.

It seems that a formula created inside a function keeps a reference to the 
environment in which it was created (the function's evaluation environment), 
whereas a formula created at top level refers only to the global environment. 
If the function's environment holds a lot of other objects, I suspect that they 
get serialized along with the formula when serialize is run on an object inside 
a function, causing the serialized version to be (much?) bigger than it would 
otherwise be, and thus causing the memory error.

Notice the different behavior of returning list(a~b) either in the global 
environment or inside a function. In the second case, the function's 
environment is printed along with the formula (it is held as a reference, I 
believe). I suspect that this causes objects in that environment to be pulled 
into the serialized output inside a function, but not in the global environment.

> list(a~b)
[[1]]
a ~ b

> test <- function() list(a~b)
> test()
[[1]]
a ~ b
<environment: 0x1a788f80>

## Note that the character representation is about 10% larger when the object
## is returned inside a function.
> nchar(rawToChar(serialize(list(a~b), NULL, TRUE)))
[1] 183
> ncharTest <- function() nchar(rawToChar(serialize(list(a~b), NULL, TRUE)))
> ncharTest()
[1] 199
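
The size difference becomes much more visible when the function also holds a large local object. The sketch below is only an illustration of that effect, not the original script: make_formula and big are hypothetical names, and re-homing the formula with environment<- is one possible workaround that is reasonable only when the formula does not need to look anything up in the function's environment later.

```r
# Hypothetical helper: creates a formula next to a large local object.
make_formula <- function() {
  big <- numeric(1e5)  # kept alive because the formula references this frame
  a ~ b
}

f <- make_formula()

# Names of the objects living in the environment the formula carries.
captured <- ls(environment(f))

# Serialized size in bytes, including the captured environment's contents.
size_with_env <- length(serialize(f, NULL))

# Point the formula at the global environment instead; serialize() then
# writes a reference to it rather than the environment's contents.
g <- f
environment(g) <- globalenv()
size_without_env <- length(serialize(g, NULL))

c(with_env = size_with_env, without_env = size_without_env)
```

Here captured contains "big", and the first size is dominated by the 1e5 doubles that were never part of the formula itself.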

Thus, a possible explanation for the "bug" is that serialize is returning both 
the large object plus a large environment of other objects that is legitimately 
larger than my available contiguous memory or larger than some SQL 
configuration parameter. Thus, I withdraw the previous report.
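
One way to make an oversized result fail early with a clearer message is to check the serialized length before turning it into a string. R documents that a single character string is limited to 2^31 - 1 bytes; whether that limit is what produced the error reported here is an assumption, not something confirmed in this thread. The function name serialize_to_text is hypothetical, and the sketch assumes the raw vector itself can be produced.

```r
# Hypothetical helper: serialize to ASCII text, refusing results that cannot
# fit in a single R character string (limit: 2^31 - 1 bytes).
serialize_to_text <- function(obj) {
  bytes <- serialize(obj, NULL, ascii = TRUE)
  if (length(bytes) > .Machine$integer.max) {
    stop(sprintf("serialized object is %.0f bytes; too large for one string",
                 length(bytes)))
  }
  rawToChar(bytes)
}

# Usage on a small object.
txt <- serialize_to_text(list(1:3, c("x", "y")))
restored <- unserialize(charToRaw(txt))
```

Calling such a helper just before building the data frame would show the serialized size at the point where it is created rather than deep inside the database write.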

Thanks, Robert

Original comment by rmcge...@gmail.com on 16 Jul 2014 at 10:30