dselivanov / rmongodb

R driver for MongoDB

mongo.insert.batch fails in introduction.Rmd & script suggestion to speed up #32

Closed gregorbj closed 10 years ago

gregorbj commented 10 years ago

This issue concerns the introduction.Rmd script in the vignettes directory.

The zips dataset has several documents with identical _id values. The mongo.insert.batch(mongo, "rmongodb.zips", res) statement fails with an error message in the server window regarding a duplicate _id. I got it to run by inserting the following line:

myzips <- zips[ !duplicated( zips[,"_id"]), ]

and substituting myzips for zips.
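To show what that filter does, here is a minimal base-R sketch on a toy list matrix. The three rows are made up for illustration and are not taken from the real zips data. The sketch also flattens the _id column with unlist() before calling duplicated(), which is a small variation on the line above rather than a required change.

```r
# Toy stand-in for the zips list matrix (made-up rows, not the real dataset).
zips <- matrix(
  list("01001", "AGAWAM",
       "01002", "CUSHMAN",
       "01001", "AGAWAM"),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("_id", "city"))
)

# Each cell of a list matrix is a list element, so flatten the _id column
# to a plain character vector before looking for repeats.
ids <- unlist(zips[, "_id"])

# Keep only the first row seen for each _id; drop = FALSE preserves the
# matrix shape even if a single row remains.
myzips <- zips[!duplicated(ids), , drop = FALSE]

print(nrow(zips))
print(nrow(myzips))
```

Only the first occurrence of each _id is kept, so if the duplicated documents differ in other fields, the later versions are silently discarded.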

Also, using a for loop is a relatively slow way to create all the bson values stored in the variable res. It takes 5.55 seconds on my laptop (including the mongo.insert.batch function call). The process can be sped up by first using apply to convert the list matrix into a list with one element per row:

myziplist <- list()
myziplist <- apply( myzips, 1, function(x) c( myziplist, x ) )

and then using lapply to create res:

res <- lapply( myziplist, function(x) mongo.bson.from.list(x) )

This takes 1.28 seconds on my laptop (including the mongo.insert.batch function call).
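For readers who want to see the intermediate structure, the sketch below builds the same kind of per-row list in base R without needing a MongoDB server. It uses lapply over row indices, which is an alternative to the apply call above, not the code from the vignette. The toy rows and the name rows are assumptions made for illustration.

```r
# Toy list matrix with made-up values (not the real zips data).
myzips <- matrix(
  list("01001", "AGAWAM", 15338,
       "01002", "CUSHMAN", 36963),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("_id", "city", "pop"))
)

# Indexing a single row of a list matrix drops it to a named list,
# which is the shape a list-to-bson converter expects for one document.
rows <- lapply(seq_len(nrow(myzips)), function(i) myzips[i, ])

# Inspect the first document-to-be.
str(rows[[1]])
```

With rmongodb loaded, the function can be passed to lapply directly, as in lapply(rows, mongo.bson.from.list), which avoids the anonymous wrapper function.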

One more small point. It is unnecessary to check for the MongoDB connection using

if(mongo.is.connected(mongo) == TRUE)

since mongo.is.connected(mongo) already returns the value TRUE if there is a connection. The shorter form

if(mongo.is.connected(mongo))

is sufficient.
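A small sketch of why the two forms behave the same for a logical result. Here is_connected is a hypothetical stand-in for mongo.is.connected(mongo), so the snippet runs without a server.

```r
# Hypothetical stand-in for mongo.is.connected(mongo): returns TRUE or FALSE.
is_connected <- function(flag) flag

# Verbose form: compares the logical result against TRUE.
check_verbose <- function(flag) {
  if (is_connected(flag) == TRUE) "connected" else "not connected"
}

# Short form: uses the logical result directly as the condition.
check_short <- function(flag) {
  if (is_connected(flag)) "connected" else "not connected"
}

print(check_verbose(TRUE))
print(check_short(TRUE))
print(check_short(FALSE))
```

Both forms raise an error if the condition is NA, so dropping the comparison changes nothing in that respect.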

Thanks much for working on this library. I think it may offer a database solution for an R application I'm developing. I'll be testing the potential over the next few weeks.

schmidb commented 10 years ago

Thanks a lot for the feedback. I fixed all the issues, and the fixes are online in version 1.6.2 on GitHub.