Closed (ajayram198 closed 9 years ago)
Hi, Ajay.
Of course, when you construct a data.frame and the same field holds different types, mongo.find.all coerces them to the highest type. This is correct behaviour, and mongo.find.all does the best it can.
This is definitely not an rmongodb bug. This is a problem with how you store your data. MongoDB provides great flexibility, so you can keep different objects (with different types!) in fields with the same name. But this flexibility also requires correct data handling. So if you keep data in such a messy way, you should:
1. get a list using mongo.cursor.to.list
2. convert "null"s to NA_real_ (for example using lapply)
3. construct the data.frame from the list manually.
Please see the source code for mongo.find.all.
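The steps above can be sketched roughly as follows in base R. This is only an illustration, not the code of mongo.find.all: the `docs` list is a hand-written, hypothetical stand-in for what mongo.cursor.to.list would return (so the example runs without a MongoDB connection), the field names `price` and `qty` are made up, and it assumes every document contains every field.

```r
# Hypothetical stand-in for the list returned by mongo.cursor.to.list():
# one named list per document, with the string "null" where a value is missing.
docs <- list(
  list(price = "null", qty = "null"),
  list(price = 10.5,   qty = 3),
  list(price = 7.25,   qty = "null")
)

# Replace the string "null" (or a real NULL) with NA_real_ in one document.
null_to_na <- function(doc) {
  lapply(doc, function(x) {
    if (is.null(x) || identical(x, "null")) NA_real_ else x
  })
}

clean <- lapply(docs, null_to_na)

# Build the data.frame column by column so each column stays numeric.
# Assumes all documents share the fields of the first one.
fields <- names(clean[[1]])
df <- as.data.frame(
  lapply(setNames(fields, fields), function(f) {
    vapply(clean, function(doc) as.numeric(doc[[f]]), numeric(1))
  })
)

str(df)  # inspect the resulting column types
```

Because the "null" strings are replaced before the columns are assembled, nothing forces a column to character. Fields that are not numeric would need their own conversion in place of `as.numeric`.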
When we import data into R from MongoDB using the mongo.find.all or mongo.find.batch function of the rmongodb package, the original data types of fields defined in MongoDB are not preserved if a field takes a character value (e.g. "null") in one or more documents. After importing a MongoDB collection containing such fields and converting it into an R data frame, R treats those fields as character variables instead of keeping their original MongoDB data types. To preserve the original data types, we first have to replace the "null" values with NAs. How can we replace these "null" values with NAs while importing the MongoDB collection itself, so that the original data types are preserved?
This feature is already available in R when we import data from CSV files: we just need to pass the na.strings = "null" argument to the read.csv function.
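A minimal sketch of that read.csv behaviour, using an inline CSV string instead of a file so it is self-contained. The column names and values are invented for illustration.

```r
# Made-up CSV content with the string "null" in place of missing values.
csv_text <- "id,price,qty
1,null,null
2,10.5,3
3,7.25,null"

# Without na.strings, "null" is ordinary text, so price and qty are not numeric.
raw <- read.csv(text = csv_text)

# With na.strings = "null", those cells become NA and the columns are parsed as numbers.
with_na <- read.csv(text = csv_text, na.strings = "null")

sapply(raw, class)      # column classes without na.strings
sapply(with_na, class)  # column classes with na.strings = "null"
```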
Even though particular variables contain "null" values in the Excel sheet, read.csv with this argument replaces those values with NA and assigns each variable its appropriate data type.
To illustrate the problem, consider a sample collection with 5 fields that take null values in one or more documents. Let's import the data into R using the mongo.find.all function and convert it into an R data frame. Following is the screenshot of the R data frame for the sample collection.
If we look at the values in this data frame, all columns appear to be numeric, with a null value in the first document. But if we check the classes of the individual fields, they are reported as character, even though the data type was defined as Double in MongoDB. Following is the screenshot for the same:
Ideally, these fields should be numeric after importing. So we see that when a field has null values, its original data type is lost and the whole field is converted to character.
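The effect can be reproduced in base R without MongoDB. The sketch below only shows R's general coercion rule for atomic vectors plus one possible after-the-fact repair of an already imported column. It does not claim to reproduce what mongo.find.all does internally, and the values are made up.

```r
# Mixing the string "null" with numbers: c() coerces every element to character.
mixed <- c("null", 12.5, 3)
class(mixed)

# One possible repair: mark "null" as missing first, then convert to numeric.
col <- mixed
col[col == "null"] <- NA
col <- as.numeric(col)
class(col)
```

Setting the "null" entries to NA before calling as.numeric avoids the "NAs introduced by coercion" warning that a direct conversion would raise.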
Looking forward to an early response.