I have 1 billion rows (50 GB in a single column) in an RDBMS. I am analyzing that data with Spark DataFrames and persisting the results to MongoDB. The output DataFrame may grow to 5 times the input size: the 50 GB column becomes roughly 5 × 50 = 250 GB after analysis.
Persisting the output currently takes about 10 hours.
Please suggest steps to improve performance; I need to save that DataFrame in under 1 hour.
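For context, here is a hedged sketch of how such a pipeline is commonly structured so that both the JDBC read and the MongoDB write run in parallel rather than through a single connection. Everything here is an assumption, not my actual code: the table name, the `id` partition column, the URIs, and the 128 MB per-partition rule of thumb are all placeholders, and it presumes the MongoDB Spark connector is on the classpath.

```python
# Rule of thumb (assumption): ~128 MB per Spark task keeps ~250 GB of
# output spread across ~2000 concurrent writers instead of a handful.
def suggest_partitions(total_bytes, target_bytes=128 * 1024 * 1024):
    """Ceiling division: how many partitions to use for a data volume."""
    return max(1, -(-total_bytes // target_bytes))


def run_pipeline(spark, jdbc_url, table, mongo_uri):
    """Partitioned JDBC read -> analysis -> parallel MongoDB write.

    `table`, the `id` column, and both URIs are hypothetical; the
    "mongo" format short name matches the v2/v3 MongoDB Spark
    connector ("mongodb" in v10+).
    """
    df = (spark.read.format("jdbc")
          .option("url", jdbc_url)
          .option("dbtable", table)
          .option("partitionColumn", "id")     # numeric, indexed column
          .option("lowerBound", "1")
          .option("upperBound", "1000000000")  # ~1 billion rows
          .option("numPartitions", "200")      # parallel read connections
          .load())

    result = df  # ...analysis transformations go here...

    # Spread the ~250 GB output over many concurrent Mongo writers.
    (result.repartition(suggest_partitions(250 * 1024**3))
     .write.format("mongo")
     .option("uri", mongo_uri)
     .mode("append")
     .save())
```

Without `partitionColumn`/`numPartitions` on the read, Spark pulls all rows through one connection, and without a `repartition` before the write, a few huge tasks can dominate the save time, which is one common cause of a 10-hour write.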
Thank you.