Azure-Samples / modern-data-warehouse-dataops

DataOps for Microsoft Data Platform technologies. https://aka.ms/dataops-repo
MIT License

Spike: Can we define Spark Pool as "code" / config similar to how databricks does it: #368

Closed: devlace closed this issue 3 years ago

devlace commented 3 years ago

https://github.com/Azure-Samples/modern-data-warehouse-dataops/blob/main/e2e_samples/parking_sensors/databricks/config/cluster.config.json

Assigned to Anuj

promisinganuj commented 3 years ago

The short answer is yes. Essentially, a Spark pool can be deployed as code, for example from a Bicep template plus a parameters file:

$ az deployment group create --name SparkPoolDeployment --resource-group rg-learning-synapse --template-file ./sparkpool.bicep --parameters @sparkpool.parameters.json
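For illustration, a `sparkpool.bicep` along these lines could back the command above. This is only a minimal sketch, not the template actually used here: the parameter names, their defaults, and the `2021-06-01` API version are assumptions, and the property values simply mirror the pool settings reported in the CLI output later in this thread.

```bicep
// sparkpool.bicep: a minimal sketch under stated assumptions.
// Parameter names and defaults are hypothetical; adjust them to your environment.

@description('Name of an existing Synapse workspace in the target resource group')
param workspaceName string

@description('Name of the Spark pool to create or update')
param sparkPoolName string

param location string = resourceGroup().location

@allowed([
  'Small'
  'Medium'
  'Large'
])
param nodeSize string = 'Small'

param minNodeCount int = 3
param maxNodeCount int = 10
param autoPauseDelayInMinutes int = 15
param sparkVersion string = '2.4'

// Reference the workspace that already exists; it is not created by this template.
resource workspace 'Microsoft.Synapse/workspaces@2021-06-01' existing = {
  name: workspaceName
}

// The Spark pool itself is a child resource of the workspace.
resource sparkPool 'Microsoft.Synapse/workspaces/bigDataPools@2021-06-01' = {
  parent: workspace
  name: sparkPoolName
  location: location
  properties: {
    nodeSize: nodeSize
    nodeSizeFamily: 'MemoryOptimized'
    sparkVersion: sparkVersion
    autoScale: {
      enabled: true
      minNodeCount: minNodeCount
      maxNodeCount: maxNodeCount
    }
    autoPause: {
      enabled: true
      delayInMinutes: autoPauseDelayInMinutes
    }
  }
}

output sparkPoolId string = sparkPool.id
```

With a template shaped like this, `sparkpool.parameters.json` would need to supply at least `workspaceName` and `sparkPoolName`; the remaining parameters fall back to their defaults.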

An existing Spark pool can also be updated, for example to add external package dependencies from a requirements.txt or environment.yml file, as shown below:

$ az synapse spark pool update --name SparkPoolNew --workspace-name ws-learning-synapse --resource-group rg-learning-synapse --library-requirements environment.yml

Command group 'synapse' is in preview and under development. Reference and support levels: https://aka.ms/CLI_refstatus
{
  "autoPause": {
    "delayInMinutes": 15,
    "enabled": true
  },
  "autoScale": {
    "enabled": true,
    "maxNodeCount": 10,
    "minNodeCount": 3
  },
  "cacheSize": 0,
  "creationDate": "2021-07-05T01:05:44.843333+00:00",
  "customLibraries": null,
  "defaultSparkLogFolder": null,
  "dynamicExecutorAllocation": {
    "enabled": false
  },
  "id": "/subscriptions/xxxxxxx/resourceGroups/rg-learning-synapse/providers/Microsoft.Synapse/workspaces/ws-learning-synapse/bigDataPools/SparkPoolNew",
  "isComputeIsolationEnabled": false,
  "lastSucceededTimestamp": "2021-07-06T03:49:19.320000+00:00",
  "libraryRequirements": {
    "content": "name: stats2\nchannels:\n- defaults\ndependencies:\n- bokeh\n- numpy\n- pip:\n - matplotlib\n - koalas==1.7.0",
    "filename": "environment.yml",
    "time": "2021-07-06T03:49:17.336629+00:00"
  },
  "location": "australiaeast",
  "name": "SparkPoolNew",
  "nodeCount": 10,
  "nodeSize": "Small",
  "nodeSizeFamily": "MemoryOptimized",
  "provisioningState": "Succeeded",
  "resourceGroup": "rg-learning-synapse",
  "sessionLevelPackagesEnabled": true,
  "sparkConfigProperties": null,
  "sparkEventsFolder": null,
  "sparkVersion": "2.4",
  "tags": {},
  "type": "Microsoft.Synapse/workspaces/bigDataPools"
}
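Since the output above reports `libraryRequirements` as a property of the pool resource, the dependency file could presumably be declared in the template as well, instead of being applied with a separate update. The following is a hedged sketch only: it assumes the `2021-06-01` API version accepts `libraryRequirements` with inline `content` and `filename`, that `environment.yml` sits next to the Bicep file, and that the parameter names (which are hypothetical) suit your setup. It has not been validated against a real deployment.

```bicep
// Sketch only: declare the pool's package dependencies in the template itself.
// Parameter names are hypothetical; pool settings mirror the CLI output above.
param workspaceName string
param sparkPoolName string
param location string = resourceGroup().location

// Existing workspace that owns the pool.
resource workspace 'Microsoft.Synapse/workspaces@2021-06-01' existing = {
  name: workspaceName
}

resource sparkPool 'Microsoft.Synapse/workspaces/bigDataPools@2021-06-01' = {
  parent: workspace
  name: sparkPoolName
  location: location
  properties: {
    nodeSize: 'Small'
    nodeSizeFamily: 'MemoryOptimized'
    sparkVersion: '2.4'
    autoScale: {
      enabled: true
      minNodeCount: 3
      maxNodeCount: 10
    }
    autoPause: {
      enabled: true
      delayInMinutes: 15
    }
    // loadTextContent reads the file at compile time, relative to this Bicep file.
    libraryRequirements: {
      filename: 'environment.yml'
      content: loadTextContent('environment.yml')
    }
  }
}
```

Keeping the dependency file in the template means a single `az deployment group create` carries both the pool definition and its packages, at the cost of redeploying the pool resource whenever the file changes.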