When testing the connector yesterday (great job guys!) I noticed that, depending on the size of the MongoDB collection, the Spark task list can get quite big.
For example, this simple query against a temporary view:
import com.mongodb.spark.sql._  // provides the loadFromMongoDB implicit
val mongoRDD = spark.sqlContext.loadFromMongoDB(readConfig)
mongoRDD.createTempView("testCOLL")
val dataFrame = spark.sql("SELECT key FROM testCOLL LIMIT 10")
dataFrame.show()
when run from the IntelliJ IDE generates 486 tasks in Spark, which is far too slow. Has anyone seen anything similar?
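In case it helps with diagnosis: as far as I understand, the task count is driven by how the connector partitions the collection, not by the LIMIT in the SQL, so a big collection means many partitions and therefore many tasks regardless of the query. A sketch of what I think can be tuned via ReadConfig (the partitioner name and option keys below are my assumptions from the partitioner docs; please verify against your connector version):

```scala
import com.mongodb.spark.config.ReadConfig

// Assumption: the partitioner splits the collection into fixed-size chunks,
// so a large collection yields hundreds of partitions (= Spark tasks).
// Raising partitionSizeMB should yield fewer, larger partitions.
val readConfig = ReadConfig(Map(
  "uri" -> "mongodb://localhost:27017/test.testCOLL",     // hypothetical URI
  "partitioner" -> "MongoSamplePartitioner",              // assumed partitioner name
  "partitionerOptions.partitionSizeMB" -> "512"           // assumed option key
))
```

If that is right, the trade-off is fewer tasks versus less parallelism per executor, so it would only mask the underlying cost rather than push the LIMIT down to MongoDB.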