ExpediaGroup / apiary-data-lake

Terraform scripts for deploying Apiary Data Lake
https://github.com/ExpediaGroup/apiary
Apache License 2.0
19 stars, 31 forks

Turn on performance_schema for apiary meta store #84

Open RongQiao opened 6 years ago

RongQiao commented 6 years ago

I am from the Expedia Data Engineering team. We run some AWS RDS MySQL instances as Hive metastores, and we have suffered metastore performance issues in the past when users ran statements such as 'alter table recover partitions' on big datasets. Without performance_schema, we have little insight into what is going on. The Apiary metastore has more complicated use cases, so I suggest making performance_schema available for it.

massdosage commented 4 years ago

@rpoluri do you understand the request here? Is this some part of the underlying Hive metastore DB schema that for some reason we haven't activated?

mroark1m commented 4 years ago

It's an extra component of MySQL (it seems to be in Aurora as well: https://aws.amazon.com/blogs/database/analyze-amazon-aurora-mysql-workloads-with-performance-insights/) that gives you more stats on who and what is abusing the database when you have performance problems. See https://dev.mysql.com/doc/refman/8.0/en/performance-schema.html. It's one of those things you don't need until the database is breaking, but I think it also has a nontrivial performance impact of its own.

massdosage commented 4 years ago

Wouldn't the owners of the RDS be able to set that up themselves then? i.e. it doesn't need to be part of Apiary? Feels more like a MySQL/Aurora DB admin task?

rpoluri commented 4 years ago

We manage RDS as part of Apiary Data Lake. Maybe this is the corresponding option in RDS: https://www.terraform.io/docs/providers/aws/r/rds_cluster_instance.html#performance_insights_enabled. I will check and close the issue accordingly.
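
For illustration, the option linked above could be wired up roughly as follows. This is a minimal sketch, not Apiary's actual configuration: the resource names, identifiers, instance class, and parameter group family are all hypothetical placeholders. It assumes an Aurora MySQL cluster and shows both the `performance_insights_enabled` argument and an explicit `performance_schema` parameter, which is a static parameter and so only takes effect after a reboot.

```hcl
# Hypothetical input: the identifier of an existing Aurora cluster.
variable "cluster_identifier" {
  type = string
}

# Instance-level parameter group that turns performance_schema on.
# The family is an assumption; it must match the engine version in use.
resource "aws_db_parameter_group" "metastore" {
  name   = "example-metastore-params" # hypothetical name
  family = "aurora-mysql5.7"

  parameter {
    name         = "performance_schema"
    value        = "1"
    apply_method = "pending-reboot" # static parameter, applied on reboot
  }
}

resource "aws_rds_cluster_instance" "metastore" {
  identifier         = "example-metastore-0" # hypothetical name
  cluster_identifier = var.cluster_identifier
  engine             = "aurora-mysql"
  instance_class     = "db.r5.large" # assumed; not every class supports Performance Insights

  db_parameter_group_name      = aws_db_parameter_group.metastore.name
  performance_insights_enabled = true
}
```

Whether the explicit parameter group is needed in addition to Performance Insights depends on the engine and on how the existing parameter groups are managed, so both are shown here only as options to check.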