Stratio / Spark-MongoDB

Spark library for easy MongoDB access
http://www.stratio.com
Apache License 2.0

Can I configure the task calculator function for the connector? #160

Open alexDeCastroAtGit opened 8 years ago

alexDeCastroAtGit commented 8 years ago

When testing the connector yesterday (great job, guys!), I noticed that, depending on the collection size in MongoDB, the number of Spark tasks can get quite large.

For example, a simple query against a temp view:

    val mongoRDD = spark.sqlContext.fromMongoDB(readConfig)
    mongoRDD.createTempView("testCOLL")
    val dataFrame = spark.sql("SELECT key FROM testCOLL LIMIT 10")
    dataFrame.show()

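When run from the IntelliJ IDE, this generates 486 Spark tasks, which is far too slow for a LIMIT 10 query. Is there a way to configure how the connector calculates the number of tasks? Has anyone seen anything similar?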
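For anyone trying to reproduce this, here is a minimal sketch of how the partition count can be inspected and reduced after the fact. It does not use the connector: `spark.range` with 486 partitions is only a stand-in for the MongoDB-backed DataFrame, the local session settings are assumptions, and the target of 8 partitions is arbitrary.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("partition-count-check")
  .getOrCreate()

// Stand-in for the MongoDB-backed DataFrame: 486 partitions,
// mirroring the task count reported above.
val df = spark.range(0L, 1000L, 1L, 486).toDF("key")

// A full scan runs one task per partition, so this number drives the task count.
val before = df.rdd.getNumPartitions

// coalesce merges partitions without a full shuffle; 8 is an illustrative target.
val after = df.coalesce(8).rdd.getNumPartitions

println(s"partitions before: $before, after coalesce: $after")
```

This only merges partitions after the source has already been split; it does not change how the connector itself computes its splits, which is what the question is really about.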