Ahmed-Wagdy opened this issue 8 years ago
Trying to save a DataFrame into MongoDB:

```scala
val event = """{"Dev":[{"a":3},{"b":3}],"hr":[{"a":6}]}"""
val events = sc.parallelize(event :: Nil)
val df = sqlc.read.json(events)
val saveConfig = MongodbConfigBuilder(Map(
  Host -> List("localhost:27017"), Database -> "test", Collection -> "test",
  SamplingRatio -> 1.0, WriteConcern -> "normal", SplitSize -> 8, SplitKey -> "_id"))
df.saveToMongodb(saveConfig.build)
```

and this is what actually got saved:

```json
{ "_id" : ObjectId("57cedf4bd244c56e8e783a45"), "Dev" : [ { "a" : NumberLong(3), "b" : null }, { "a" : null, "b" : NumberLong(3) } ], "hr" : [ { "a" : NumberLong(6) } ] }
```

Each element of `Dev` has picked up a null field that was not in the original JSON.
That's the correct behaviour, and it has to do with Spark rather than Spark-MongoDB. Reading the JSON into a DataFrame (`val df = sqlc.read.json(events)`) forces a single valid schema onto the data. Here `Dev` is inferred as an array of structs with fields `a: Long` and `b: Long`, so Spark fills the field each element is missing with null. Run `df.show` to see what I mean.