databricks-demos / dbdemos

Demos to implement your Databricks Lakehouse

IoT Platform demo job task register_ml_model fails trying to display confusion matrix #29

Closed: tlaur closed this issue 1 year ago

tlaur commented 1 year ago

Running on Azure Databricks Premium, I am able to import the lakehouse-iot-platform demo and its related assets successfully.

The dbdemos_lakehouse_iot_turbine_init job run starts automatically as it should, but fails on the register_ml_model task with the following error: FileNotFoundError: [Errno 2] No such file or directory: '/local_disk0/tmp/d5ae264a/confusion_matrix.png'

The job runs on a job cluster by default. Switching the job to run on the automatically created all-purpose cluster fixes the problem. As an alternative workaround, commenting out the code that displays the confusion matrix also works.
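A softer form of the second workaround is to skip the display only when the image is missing, rather than commenting it out. This is a minimal sketch using only the standard library; `read_artifact_or_none` is a hypothetical helper, not code from the demo, and you would pass the returned bytes to whatever display call the notebook already uses.

```python
from typing import Optional


def read_artifact_or_none(path: str) -> Optional[bytes]:
    """Return the file's bytes, or None if it was never written.

    Hypothetical helper: guards the confusion-matrix display so a missing
    image no longer raises FileNotFoundError and fails the whole task.
    """
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None


# Path taken from the error in this issue; it only exists where it was written.
png = read_artifact_or_none("/local_disk0/tmp/d5ae264a/confusion_matrix.png")
if png is None:
    print("confusion_matrix.png not found; skipping display")
```

This only masks the symptom; it does not explain why the image was never produced.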

QuentinAmbard commented 1 year ago

Hi, it's working well on my end. Are you sure the AutoML run was successful? Another possibility is that you ran AutoML on a cluster with a runtime version other than 12.2.

Can you also try running the notebook 04.1-automl-iot-turbine-predictive-maintenance with force refresh=True?
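Since the suspected cause is a runtime mismatch, one way to fail fast is to check the runtime at the top of the notebook before AutoML runs. This is a minimal sketch, assuming the `DATABRICKS_RUNTIME_VERSION` environment variable holds a string such as `12.2`; `REQUIRED_RUNTIME`, `parse_runtime`, and `check_runtime` are hypothetical names, and 12.2 is simply the version named in this thread.

```python
import os
from typing import Optional, Tuple

# Runtime the demo expects, per this thread (hypothetical constant name).
REQUIRED_RUNTIME: Tuple[int, int] = (12, 2)


def parse_runtime(version: str) -> Tuple[int, int]:
    """Turn '12.2' (or a key like '12.2.x-cpu-ml-scala2.12') into (12, 2)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def check_runtime(version: Optional[str] = None,
                  required: Tuple[int, int] = REQUIRED_RUNTIME) -> str:
    """Raise early if the cluster is not on the required runtime.

    Assumes DATABRICKS_RUNTIME_VERSION is set on Databricks clusters.
    """
    if version is None:
        version = os.environ.get("DATABRICKS_RUNTIME_VERSION")
    if version is None:
        raise RuntimeError("DATABRICKS_RUNTIME_VERSION is not set; is this a Databricks cluster?")
    if parse_runtime(version) != required:
        raise RuntimeError(
            f"Expected runtime {required[0]}.{required[1]}, found {version}"
        )
    return version
```

Calling `check_runtime()` on the 11.3 job cluster described in this issue would raise a `RuntimeError` naming 11.3 instead of failing later on a missing image. The equality check (rather than `>=`) mirrors the thread, which only reports 12.2 as working; loosen it if newer runtimes are confirmed.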

tlaur commented 1 year ago

Indeed, the default job cluster is on runtime version 11.3. After changing the cluster configuration to 12.2, everything worked.

Here is what I did on Azure:

  1. Create a new Azure Databricks resource choosing the premium tier and otherwise default settings.
  2. Go to Manage Account and attach the newly created workspace to an existing Unity Catalog metastore.
  3. Go to Admin Settings and enable serverless SQL warehouses.
  4. Open a blank Python notebook and run the following to import the demo.
    %pip install dbdemos
    import dbdemos
    dbdemos.install('lakehouse-iot-platform')
  5. Accept the default prompts to create a new personal cluster, since at this point there are no clusters to run the notebook on.

The import completes successfully, and the dbdemos_lakehouse_iot_turbine_init job starts. The Compute page shows the job running on a cluster called "Shared_job_cluster" with the 11.3 LTS ML runtime. The job fails but, as mentioned, works as it should after bumping the runtime to 12.2.
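Bumping the runtime amounts to changing the `spark_version` of the job cluster. The sketch below edits a plain dict assumed to follow the Jobs API 2.1 job-settings layout (`job_clusters` entries with a `job_cluster_key` and a `new_cluster`). The helper name, node type, and worker count are placeholders, and the runtime keys (`11.3.x-cpu-ml-scala2.12`, `12.2.x-cpu-ml-scala2.12`) are assumptions to check against the runtime list in your workspace. It does not call any Databricks API; applying the change would still go through the Jobs UI or API.

```python
import copy
from typing import Any, Dict


def set_job_cluster_runtime(settings: Dict[str, Any], cluster_key: str,
                            spark_version: str) -> Dict[str, Any]:
    """Return a copy of job settings with one job cluster's runtime replaced.

    Hypothetical helper; assumes the Jobs API 2.1 layout:
    {"job_clusters": [{"job_cluster_key": ..., "new_cluster": {...}}]}.
    """
    patched = copy.deepcopy(settings)  # leave the caller's dict untouched
    for cluster in patched.get("job_clusters", []):
        if cluster["job_cluster_key"] == cluster_key:
            cluster["new_cluster"]["spark_version"] = spark_version
            return patched
    raise KeyError(f"no job cluster named {cluster_key!r}")


# Placeholder settings mirroring this issue: the shared job cluster on 11.3 LTS ML.
settings = {
    "job_clusters": [{
        "job_cluster_key": "Shared_job_cluster",
        "new_cluster": {
            "spark_version": "11.3.x-cpu-ml-scala2.12",
            "node_type_id": "Standard_DS3_v2",  # placeholder Azure node type
            "num_workers": 1,                   # placeholder size
        },
    }]
}

fixed = set_job_cluster_runtime(settings, "Shared_job_cluster", "12.2.x-cpu-ml-scala2.12")
print(fixed["job_clusters"][0]["new_cluster"]["spark_version"])
```

Returning a patched copy keeps the original settings available for comparison before anything is pushed back to the workspace.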

QuentinAmbard commented 1 year ago

Got it, that was a typo in the cluster configuration. I'm updating it, and it'll be fixed in the next release later today. Thanks!