MI-DPLA / combine

Combine /kämˌbīn/ - Metadata Aggregator Platform
MIT License

add limited Livy/Spark tuning from Combine localsettings #298

Closed ghukill closed 5 years ago

ghukill commented 5 years ago

Now that a Spark cluster is in use, I confirmed that some overrides for Spark application parameters can be passed when initializing a Livy session, e.g.:

'driverMemory':'4g',
'driverCores':4,
'executorMemory':'4g',
'executorCores':4,
'numExecutors':4
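As a rough sketch of how the overrides above might be passed, the snippet below builds the JSON body for Livy's `POST /sessions` call. The field names match the ones listed above; the URL, the `pyspark` session kind, and the helper name `build_session_request` are assumptions for illustration, not Combine's actual code.

```python
import json
from urllib.request import Request

# Hypothetical Livy endpoint; adjust host and port for your deployment.
LIVY_SESSIONS_URL = "http://localhost:8998/sessions"

# Overrides taken from the comment above.
SPARK_OVERRIDES = {
    "driverMemory": "4g",
    "driverCores": 4,
    "executorMemory": "4g",
    "executorCores": 4,
    "numExecutors": 4,
}


def build_session_request(overrides=None, kind="pyspark", url=LIVY_SESSIONS_URL):
    """Build (but do not send) a POST request that would start a Livy session."""
    payload = {"kind": kind}
    # With no overrides, only the session kind is sent and Spark's own defaults apply.
    payload.update(overrides or {})
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_session_request(SPARK_OVERRIDES)
# To actually start the session: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect before anything reaches the cluster.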

It probably makes sense to set these automatically during the build, with instructions in the documentation on how to tune them. Understood that a) this is a somewhat unusual Spark deployment, with only a single instance (driver and executor on the same machine), and b) Spark tuning is tightly tied to the kind of work being done. Still, these can provide some limited configuration options for users.

Or, leave these overrides commented out by default and continue with the very conservative 2 GB RAM defaults.

ghukill commented 5 years ago

Done. Added possible configs for large and small servers, but left the absence of settings as the default, so a new build picks up defaults from Spark's spark-defaults.conf file.
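One way the "absence of settings as default" behavior could look is sketched below. The setting names `LIVY_TUNING_PRESETS` and `LIVY_TUNING_PRESET`, the helper `resolve_overrides`, and all memory and core values are hypothetical placeholders, not the settings actually added to Combine and not tuning recommendations.

```python
# Hypothetical presets; the values are placeholders, not recommendations.
LIVY_TUNING_PRESETS = {
    "small": {
        "driverMemory": "1g",
        "executorMemory": "1g",
        "executorCores": 1,
        "numExecutors": 1,
    },
    "large": {
        "driverMemory": "4g",
        "driverCores": 4,
        "executorMemory": "4g",
        "executorCores": 4,
        "numExecutors": 4,
    },
}

# None means no overrides are sent, so Spark falls back to spark-defaults.conf.
LIVY_TUNING_PRESET = None


def resolve_overrides(preset=LIVY_TUNING_PRESET, presets=LIVY_TUNING_PRESETS):
    """Return the override dict for a preset name, or {} when none is selected."""
    if preset is None:
        return {}
    if preset not in presets:
        raise ValueError(f"unknown tuning preset: {preset!r}")
    # Copy so callers can modify the result without changing the preset itself.
    return dict(presets[preset])
```

With the default of `None`, `resolve_overrides()` returns an empty dict, so the session is created without tuning parameters unless an operator opts in to a preset.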

Thinking about this more: the tunings are complex and highly tailored to a server. The newly added settings provide a nice way to demonstrate settings for a small server (where even the defaults are too high), but not much more.