Issue: A DynamoDB export job runs for more than 5 days because of data skew, which causes the Data Pipeline to time out.
Configuration: 20 × r5.24xlarge nodes, 400k RCU, table size ~80 TB, 2,000 maps.
About 70 TB is exported in roughly 9 hours, but the rest of the data is scanned at under 10k RCU, so the job runs much longer.
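The arithmetic behind the slow tail can be sketched in a few lines of Python. This is a minimal model that assumes the 400k RCU budget is split evenly across the 2,000 maps once, at job start; the split the EMR DynamoDB connector actually uses may differ. The helper names and the figure of 40 straggler maps are hypothetical, chosen only for illustration.

```python
def static_share(total_rcu: float, num_maps: int) -> float:
    """RCU each map receives when the budget is split once, at job start (assumed model)."""
    return total_rcu / num_maps


def consumed_rcu(share_per_map: float, running_maps: int) -> float:
    """Aggregate RCU consumed when only `running_maps` maps are still active."""
    return share_per_map * running_maps


if __name__ == "__main__":
    share = static_share(400_000, 2000)
    print(share)                      # 200.0
    print(consumed_rcu(share, 2000))  # 400000.0 (all maps running)
    print(consumed_rcu(share, 40))    # 8000.0 (hypothetical 40 stragglers)
```

Under this model, once only a handful of maps remain, aggregate consumption falls to a small fraction of the provisioned 400k RCU, even though the table could serve far more.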
I have also tried increasing the YARN map memory and reducing the number of nodes to increase the RCU per map. However, this is a trial-and-error method that takes time and increases EMR cost.
Solution: This could be mitigated if the RCU allocation were refreshed at a fixed interval based on the number of running containers. RCU is assigned at the start of the job, but only a few containers run, for a long time, at the end of the job.
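The proposed fix can be sketched as a periodic rebalance: at each refresh tick, divide the total RCU budget by the number of containers still running. This is a hypothetical model of the idea, not an existing EMR or connector feature; the function names, the tick values, and the optional per-container cap are all assumptions.

```python
from typing import Optional, Sequence


def refreshed_share(total_rcu: float, running: int, cap: Optional[float] = None) -> float:
    """Per-container RCU budget recomputed from the current number of running containers."""
    if running <= 0:
        return 0.0  # nothing running, nothing to allocate
    share = total_rcu / running
    # Optional hypothetical ceiling on what a single container may be given.
    return min(share, cap) if cap is not None else share


def refresh_schedule(
    total_rcu: float, running_per_tick: Sequence[int], cap: Optional[float] = None
) -> list[float]:
    """Budget each container would receive at every refresh tick."""
    return [refreshed_share(total_rcu, n, cap) for n in running_per_tick]


if __name__ == "__main__":
    ticks = [2000, 500, 40]  # hypothetical running-container counts per tick
    print(refresh_schedule(400_000, ticks))              # [200.0, 800.0, 10000.0]
    print(refresh_schedule(400_000, ticks, cap=3000.0))  # [200.0, 800.0, 3000.0]
```

Without a cap, the last 40 containers would each be allowed 10,000 RCU, restoring the full 400k aggregate; in practice a single container may not be able to consume that much, which is why the sketch accepts a ceiling.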
Any other suggestions?