Stratio / Spark-MongoDB

Spark library for easy MongoDB access
http://www.stratio.com
Apache License 2.0
307 stars, 96 forks

How to increase the performance. #148

Open swarooppallapothu opened 8 years ago

swarooppallapothu commented 8 years ago

I have 1 billion rows (50 GB for one column) in an RDBMS. I am analyzing that data with Spark DataFrames and persisting the result to MongoDB. The output DataFrame may grow to about 5 times the input size: since the one column holds 50 GB, the result may reach 5 * 50 = 250 GB. Persisting the data after the analysis currently takes 10 hours.

Please suggest steps to improve performance. I need to save that DataFrame in less than 1 hour.
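To put the numbers in perspective, here is a rough back-of-the-envelope sketch of the average write rate these figures imply. It only restates the sizes and times given in this post; treating 1 GB as 1000 MB and the helper name `required_mb_per_s` are assumptions made for illustration, and the real rate will depend on the cluster, the network, and the MongoDB deployment.

```python
# Figures from the post. Treating 1 GB as 1000 MB is an assumption.
INPUT_GB = 50        # size of the single column in the RDBMS
GROWTH_FACTOR = 5    # the output may be about 5 times the input
CURRENT_HOURS = 10   # time the save takes today
TARGET_HOURS = 1     # time the save should take


def required_mb_per_s(total_gb: float, hours: float) -> float:
    """Average sustained write rate (MB/s) needed to move total_gb in the given hours."""
    return total_gb * 1000 / (hours * 3600)


output_gb = INPUT_GB * GROWTH_FACTOR
current_rate = required_mb_per_s(output_gb, CURRENT_HOURS)
target_rate = required_mb_per_s(output_gb, TARGET_HOURS)
speedup = target_rate / current_rate

print(f"output size: {output_gb} GB")
print(f"current average rate: {current_rate:.1f} MB/s")
print(f"target average rate: {target_rate:.1f} MB/s")
print(f"speedup needed: {speedup:.0f}x")
```

In other words, the write currently averages roughly 7 MB/s and would need to sustain roughly 70 MB/s, about a 10x improvement, to finish within the hour.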

Thank you.