Open willcorb opened 3 months ago
Hi @willcorb, thank you for filing this issue.
I'm following up with the backend team responsible for this service. The Terraform provider can be extended to wait for these tables to be created on quality monitor creation and to delete them on quality monitor deletion. However, I want to confirm whether this should instead be done directly in the API's create/delete paths; otherwise, any API users who create or delete quality monitors will hit the same issues.
That error message is also not great. `analysis_config` is not part of the REST API directly. I can see that it corresponds to one of `inference_log`, `time_series`, `generic`, or `rag_studio`. Can you share the template you're using to create the `inference_log` quality monitor? I will also ask the API team to return a more usable error message.
Hi @mgyucht, at the moment I've added a configurable sleep to the Terraform that allows time for the tables to be created before applying grants, although this is not ideal because we are guessing how long it will take.
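For anyone hitting the same issue, the delay workaround described above could be sketched with the `hashicorp/time` provider's `time_sleep` resource. This is only a sketch under assumptions: the `30m` duration is a guess at the upper bound, and the `_profile_metrics` table name and `var.read_group` variable are hypothetical placeholders.

```hcl
# Hypothetical sketch: hold grants back for a fixed delay after the
# monitor is created, since the provider returns before the tables exist.
resource "time_sleep" "wait_for_monitor_tables" {
  depends_on      = [databricks_quality_monitor.quality_monitor]
  create_duration = "30m" # a guess; the real wait varies between runs
}

resource "databricks_grants" "profile_metrics" {
  # Applied only after the fixed delay has elapsed.
  depends_on = [time_sleep.wait_for_monitor_tables]
  table      = "${var.output_catalog}.${var.schema_name}.${var.table_name}_profile_metrics"

  grant {
    principal  = var.read_group
    privileges = ["SELECT"]
  }
}
```

The obvious downside is the one the comment above names: a fixed sleep either wastes time or is still too short on a slow run.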
I've also implemented destroys for the two tables and the dashboard using `local-exec` API calls. This appears to work, but again is not ideal: destroying the monitor should really destroy its child artefacts.
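A destroy-time `local-exec` cleanup along the lines described above might look like the following sketch. The `var.workspace_host` variable and the `_profile_metrics` table name are assumptions, and the credential handling (a `DATABRICKS_TOKEN` environment variable) is illustrative only.

```hcl
# Hypothetical sketch: delete an orphaned metric table when this resource
# is destroyed. Destroy-time provisioners may only read self.triggers,
# so the host and table name are captured there.
resource "null_resource" "monitor_table_cleanup" {
  triggers = {
    host  = var.workspace_host
    table = "${var.output_catalog}.${var.schema_name}.${var.table_name}_profile_metrics"
  }

  provisioner "local-exec" {
    when    = destroy
    command = <<-EOT
      curl -s -X DELETE \
        -H "Authorization: Bearer $DATABRICKS_TOKEN" \
        "${self.triggers.host}/api/2.1/unity-catalog/tables/${self.triggers.table}"
    EOT
  }
}
```

As the comment says, this works but couples the module to the REST API; having the provider (or the API itself) cascade the delete would be cleaner.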
Here is the Terraform we use for the quality monitor, where we allow the type to be selected and populated accordingly:
```hcl
resource "databricks_quality_monitor" "quality_monitor" {
  provider           = databricks.workspace
  table_name         = "${var.catalog_name}.${var.schema_name}.${var.table_name}"
  assets_dir         = "/Shared/${var.domain}/${var.schema_name}/${var.assets_dir}"
  output_schema_name = "${var.output_catalog}.${var.schema_name}"
  warehouse_id       = data.databricks_sql_warehouse.sql_warehouse_type.id

  schedule {
    quartz_cron_expression = var.cron_expression
    timezone_id            = "UTC"
  }

  dynamic "time_series" {
    for_each = var.monitor_type == "time_series" ? [1] : []
    content {
      granularities = var.time_series_config.granularities
      timestamp_col = var.time_series_config.timestamp_col
    }
  }

  dynamic "inference_log" {
    for_each = var.monitor_type == "inference_log" ? [1] : []
    content {
      granularities  = local.inference_log_config.granularities
      timestamp_col  = local.inference_log_config.timestamp_col
      model_id_col   = local.inference_log_base_config.model_id_col
      prediction_col = local.inference_log_base_config.prediction_col
      problem_type   = local.inference_log_base_config.problem_type
      # Optional fields
      label_col            = lookup(local.inference_log_optional_config, "label_col", null)
      prediction_proba_col = lookup(local.inference_log_optional_config, "prediction_proba_col", null)
    }
  }

  dynamic "snapshot" {
    for_each = var.monitor_type == "snapshot" ? [1] : []
    content {}
  }
}

resource "databricks_permissions" "quality_monitor_dashboard_usage" {
  provider     = databricks.workspace
  count        = length(var.read_groups) == 0 ? 0 : 1
  dashboard_id = databricks_quality_monitor.quality_monitor.dashboard_id
  depends_on   = [databricks_quality_monitor.quality_monitor]

  dynamic "access_control" {
    for_each = toset(var.read_groups)
    content {
      group_name       = access_control.value
      permission_level = "CAN_RUN"
    }
  }
}
```
I am facing the same problem just calling `update()` on the Databricks Python SDK for a quality monitor. This happens when I try to update the `schedule` field.
@adamwrobel-ext-gd Can you try something like this (load the additional parameters from the `get()` output)?

```python
monitor = client.quality_monitors.get(table_name=table_name)
client.quality_monitors.update(
    table_name=table_name,
    output_schema_name=monitor.output_schema_name,
    snapshot=monitor.snapshot,
    time_series=monitor.time_series,
    data_classification_config=monitor.data_classification_config,
    schedule=None,  # set schedule to None to stop the table monitor
)
```
Configuration
Expected Behavior
The example above is from the documentation; it makes no difference whether we use our code or the example. We would expect `terraform apply` to create the two metric tables and the dashboard before it reports completion, and we would expect `terraform destroy` to remove all the resources, including the tables and dashboard.
We would also expect all the parameters to be documented for the latest provider (1.49.1 at this time).
Actual Behavior
`terraform apply` reports success for the monitor but does not wait for the dashboard or the two metric tables to be created. This means we cannot use `depends_on` to apply permissions to the two tables (since they are deployed through an SP token).
It takes anywhere between 8 and 20 minutes for the tables to be created after Terraform reports success on apply.
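Given that variable 8-to-20-minute gap, one interim option is to poll for the tables from a script instead of sleeping for a fixed time. The sketch below is an assumption, not provider behaviour: the wait logic takes an injected `exists_fn` callback, which with the Databricks Python SDK could be something like `lambda name: w.tables.exists(full_name=name).table_exists`.

```python
import time


def wait_for_tables(table_names, exists_fn, timeout_s=1800, poll_s=30):
    """Poll until every table in table_names exists, or raise TimeoutError.

    exists_fn(full_name) -> bool is injected so the wait logic stays
    independent of any particular SDK client.
    """
    deadline = time.monotonic() + timeout_s
    pending = set(table_names)
    while True:
        # Drop every table that now exists and stop once none remain.
        pending = {t for t in pending if not exists_fn(t)}
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"tables still missing: {sorted(pending)}")
        time.sleep(poll_s)


# Demonstration with a fake checker (no workspace needed): the drift
# table only "appears" after a couple of polls.
calls = {"n": 0}

def fake_exists(name):
    calls["n"] += 1
    return name == "cat.sch.t_profile_metrics" or calls["n"] > 2

wait_for_tables(
    ["cat.sch.t_profile_metrics", "cat.sch.t_drift_metrics"],
    fake_exists,
    timeout_s=5,
    poll_s=0,
)
```

This only papers over the gap, of course; the real fix requested in this issue is for the provider to do the waiting itself.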
`terraform destroy` removes only the monitor, not the metric tables or dashboard, so these resources are orphaned. We have to remove them manually and cannot automate properly.
When trying to create an `inference_log` quality monitor we get: 'Error: cannot create quality monitor: The `analysis_config` field must be provided.' `analysis_config` is not documented, so we don't know what it references.
Steps to Reproduce