Mimetis opened this issue 3 years ago
I spent some time looking into the manual deployment of Synapse in this context. Here are a few findings that might be useful regarding the Databricks notebooks (main.ipynb and common.ipynb) that need to be transferred to Synapse:
In Databricks/common.ipynb you are getting the secret for the service principal:
client_secret = dbutils.secrets.get(keyvault, "clientsecret")
In Synapse, you need to add your Key Vault as a linked service; afterwards, in Synapse/common.ipynb you can do the same with:
client_secret = TokenLibrary.getSecret("kvengzxq4fl", "clientsecret")
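Since common.ipynb has to work on one runtime or the other, one way to sketch the difference is a small dispatch helper. This is purely illustrative: the `runtime` names and the helper itself are this sketch's own convention, and `secrets_api` stands in for the runtime object (`dbutils.secrets` on Databricks, `TokenLibrary` on Synapse) so the shape of the two calls is visible side by side.

```python
def get_client_secret(vault, secret_name, runtime, secrets_api):
    """Fetch a Key Vault secret on either runtime.

    'secrets_api' is the runtime's secrets object (dbutils.secrets on
    Databricks, TokenLibrary on Synapse), injected here so the helper
    can be exercised outside either environment. The runtime labels
    are this sketch's own convention, not an official API.
    """
    if runtime == "databricks":
        return secrets_api.get(vault, secret_name)        # dbutils.secrets.get(...)
    if runtime == "synapse":
        return secrets_api.getSecret(vault, secret_name)  # TokenLibrary.getSecret(...)
    raise ValueError(f"unknown runtime: {runtime!r}")
```

On Synapse this assumes the Key Vault has already been registered as a linked service, as noted above.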
The following lines in Databricks/common.ipynb should be obsolete in Synapse/common.ipynb, because in the Synapse case you want to use the ADLS account that is the Synapse workspace's default storage.
accountName = engine["storageName"] # from engine.storageName
accountKey = "dsLake-" + engine["storageName"] # from engine.storageName
# Get the secret value
accountKeyValue = dbutils.secrets.get(keyvault, accountKey)
# set the token for accessing input and output path
spark.conf.set("fs.azure.account.key." + accountName + ".dfs.core.windows.net", accountKeyValue)
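For reference, the Spark conf key set above follows a fixed pattern, which can be sketched as a tiny helper (a hypothetical convenience, not part of either notebook):

```python
def adls_account_key_conf(account_name):
    """Build the Spark conf key that grants key-based access to an ADLS
    Gen2 account, as set in the Databricks notebook above. Per the note
    above, this should not be needed on Synapse, where the workspace's
    default storage is reachable without it."""
    return f"fs.azure.account.key.{account_name}.dfs.core.windows.net"
```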
In Databricks/main.ipynb you run the common notebook with:
%run "./common"
According to the documentation (https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-development-using-notebooks?tabs=preview#notebook-reference), you should be able to use %run in Synapse as well; however, the documentation also includes a caveat, which leads me to believe %run will not work when used in a pipeline (which is desired).
Running a notebook from another notebook in Synapse does work when you use:
mssparkutils.notebook.run("common")
and to check that it ran, you could use the following in Synapse/main.ipynb:
exitVal = mssparkutils.notebook.run("common")
print(exitVal)
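mssparkutils.notebook.run also accepts an optional timeout (in seconds) and an arguments dict. A hedged sketch of a wrapper around it (the wrapper and its parameter names are hypothetical; `runner` stands in for mssparkutils.notebook.run so the sketch can run outside Synapse):

```python
def run_child_notebook(runner, path, timeout_seconds=90, arguments=None):
    """Run a child notebook and report its exit value.

    'runner' stands in for mssparkutils.notebook.run, which takes the
    notebook path, a timeout in seconds, and a dict of arguments; it is
    injected here so the wrapper can be exercised outside Synapse.
    """
    exit_val = runner(path, timeout_seconds, arguments or {})
    print(f"{path} finished with exit value: {exit_val}")
    return exit_val
```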
when adding something like:
mssparkutils.notebook.exit("Execution of common notebook is finished")
to the last cell in Synapse/common.ipynb, you can see that the notebook was executed. However, the functions defined in Synapse/common.ipynb are not available from Synapse/main.ipynb, so it seems we don't get that notebook's context back.
Proposed workaround:
Idea
Adding the option to deploy an engine using Synapse instead of Databricks / ADF.
Today
For now, we only have the option to deploy an engine using Databricks.
Expectation
Having the same level of integration as with Databricks, but using Synapse.