I'm using the following code to read data with a split key on pid, but Spark is fetching all the data from the MongoDB collection. Can someone tell me where I'm going wrong?
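import com.stratio.datasource.mongodb._
import com.stratio.datasource.mongodb.config._
import com.stratio.datasource.mongodb.config.MongodbConfig._

val mongoConfig = MongodbConfigBuilder(
  Map(
    Host -> mongoHost,
    Database -> mongoDatabase,
    Collection -> mongoCollection,
    SamplingRatio -> 1.0,
    WriteConcern -> "normal",
    SplitSize -> "5",
    // Split the read on pid, restricted to the range 357..368
    SplitKey -> "pid",
    SplitKeyMin -> "357",
    SplitKeyMax -> "368",
    SplitKeyType -> "int"
  )
).build()
val df = spark.sqlContext.fromMongoDB(mongoConfig)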
I found the reason for this problem: there is no index on the "pid" field in my mongoCollection. It would be very helpful if this requirement were mentioned in the documentation.
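For anyone hitting the same issue, here is a minimal sketch of creating the missing index from Scala. It assumes the MongoDB Java sync driver (mongodb-driver-sync 3.7 or later) is on the classpath and a server is reachable; the URI, database and collection names are hypothetical placeholders for whatever sits behind mongoHost, mongoDatabase and mongoCollection.

```scala
import com.mongodb.client.MongoClients
import com.mongodb.client.model.Indexes

object CreatePidIndex {
  // Hypothetical connection settings: replace with your own
  // values for mongoHost, mongoDatabase and mongoCollection.
  val uri = "mongodb://localhost:27017"
  val databaseName = "mydb"
  val collectionName = "mycollection"

  /** Creates an ascending index on "pid" (if missing) and returns its name. */
  def ensurePidIndex(): String = {
    val client = MongoClients.create(uri)
    try {
      val collection = client.getDatabase(databaseName).getCollection(collectionName)
      // createIndex does not duplicate an identical existing index;
      // either way it returns the index name, "pid_1" by default.
      collection.createIndex(Indexes.ascending("pid"))
    } finally {
      client.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val name = ensurePidIndex()
    println(s"Index ready: $name")
  }
}
```

Run it once before building the Spark config; with the index on pid in place, the SplitKey settings above should take effect.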