Should be fixed - we had to change the default machines too. @syed-tw, can you check if this suits your needs?
We had to do a few things to get this to work.
1. Running the notebook threw:

   ```
   py4j.security.Py4JSecurityException: Method public scala.collection.immutable.Map com.databricks.backend.common.rpc.CommandContext.tags() is not whitelisted on class class com.databricks.backend.common.rpc.CommandContext
   ```
   ...which is apparently typical for "high-throughput"/shared clusters, as a safety precaution. After a lot of poking around, including trying to override `spark.databricks.pyspark.enablePy4JSecurity` to `false` (didn't work: it is a static value and cannot be changed; see the first sketch after this list), we decided to stick with a "Single User" cluster.
   As a result, we have created a new SINGLE USER policy just for this exercise (a sketch of it follows this list), and the trainers have made it clear that they are OK with handling the switch between the two clusters. New documentation will be created to list the settings required for this exercise.
2. The ML runtime requirement turned out to come from a global init script that installs a number of ML libraries we consider unnecessary. We commented those out, and we can now re-use our 11.3 LTS runtime.
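For anyone retracing point 1: the override we attempted was, roughly, putting the flag into the cluster's Spark config, sketched below in Clusters-API-style JSON (illustrative, not our exact spec). On shared clusters it simply has no effect, because the value is static:

```json
{
  "spark_conf": {
    "spark.databricks.pyspark.enablePy4JSecurity": "false"
  }
}
```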
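And a minimal sketch of what the new SINGLE USER policy pins, assuming the standard Databricks cluster-policy JSON format: `data_security_mode` is the attribute that forces single-user mode, and the runtime pin just mirrors point 2. Both values here are illustrative, not the exact policy:

```json
{
  "data_security_mode": {
    "type": "fixed",
    "value": "SINGLE_USER"
  },
  "spark_version": {
    "type": "fixed",
    "value": "11.3.x-scala2.12"
  }
}
```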
Here are the cluster configs which I used to run the Delta Lake Optimizations Notebook.
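A spec along these lines, sketched in Clusters-API JSON and consistent with the points above; the cluster name, node type, worker count, and user are placeholders rather than the actual values:

```json
{
  "cluster_name": "delta-lake-optimizations",
  "spark_version": "11.3.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "data_security_mode": "SINGLE_USER",
  "single_user_name": "student@example.com"
}
```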