databricks / terraform-provider-databricks

Databricks Terraform Provider
https://registry.terraform.io/providers/databricks/databricks/latest

[ISSUE] Issue with databricks_model_serving resource #3676

Closed: nsenno-dbr closed this issue 2 months ago

nsenno-dbr commented 3 months ago

Configuration

terraform {
    required_providers {
        databricks = {
            source = "databricks/databricks"
        }
    }
}

provider "databricks" {
    profile = "default"
}

resource "databricks_model_serving" "this" {
    name = "nsenno-terraform-test"
    config {
      served_entities {
        external_model {
          name = "gpt-35-turbo"
          provider = "openai"
          task = "llm/v1/chat"
          openai_config {
            openai_api_base = "https://dbdemos-open-ai.openai.azure.com/"
            openai_api_key = "{{secrets/dbdemos/azure-openai}}"
            openai_api_type = "azure"
            openai_api_version = "2023-05-15"
            openai_deployment_name = "dbdemo-gpt35"
          }
        }
      }
    }

}

Expected Behavior

The external model endpoint should be provisioned in the Databricks workspace and be queryable.

Actual Behavior

Received an error that provisioned throughput is not supported for endpoints with external models


Steps to Reproduce

1. Set up secrets and a deployment for the external model (Azure OpenAI in my example)
2. Run terraform apply

Terraform and provider versions

Latest Databricks Terraform provider

terraform version v1.8.5 on darwin_arm64

Is it a regression?

Yes, this is a regression: I was able to deploy and query a model using version 1.40.0 of the Terraform provider.

Important Factoids

No

Would you like to implement a fix?

No

arpitjasa-db commented 3 months ago

@nsenno-dbr with the 1.40.0 version of the provider, can you do multiple deploys or is that flow broken for you? Does the deploy complete or does the Terraform deploy keep running but the resource has already been created?

nsenno-dbr commented 3 months ago

@arpitjasa-db yes, Terraform hangs waiting for the deploy to finish while the resource itself is already queryable in the workspace. I attribute that to the API itself, because I hit the same issue using the Databricks SDK's create_and_wait: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/serving/serving_endpoints.html#databricks.sdk.service.serving.ServingEndpointsAPI.create_and_wait.

It might be worth checking in with that API team to see if there is an issue with the signal that is generated when the model is created.
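The hang described above can be sketched as a generic polling wait. This is a hypothetical simplification, not the SDK's actual create_and_wait implementation: if the API never reports a terminal state for the endpoint, the waiter loops until its timeout even though the endpoint is already queryable.

```python
import time

def wait_until_ready(get_state, timeout_s=20 * 60, poll_s=10):
    """Poll an endpoint-state callback until it reports a terminal value.

    Illustrative only: real SDK waiters poll the serving endpoint's
    config-update status. If the backend never transitions the state,
    this loop runs until the deadline and then raises.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state()  # e.g. the endpoint's config_update status
        if state == "NOT_UPDATING":
            return True
        time.sleep(poll_s)
    raise TimeoutError("endpoint never reported a terminal state")
```

Under this model, the observed behavior (a queryable endpoint plus a hanging deploy) points at the state signal rather than the provider.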

arpitjasa-db commented 3 months ago

@nsenno-dbr got it, working with them to fix this. Can you try using an earlier version of the SDK and see if that fixes the issue?

edwardfeng-db commented 3 months ago

@arpitjasa-db I think for this issue, the fix should be to not add MinProvisionedThroughput to forceSendFields on this line when an external model is given. Can you help make this change?
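The proposed condition can be sketched in Go roughly as follows. The type and field names here are illustrative stand-ins, not the provider's actual identifiers:

```go
package main

import "fmt"

// ExternalModel is a stand-in for the serving config's external-model block.
type ExternalModel struct {
	Name string
}

// ServedEntity mimics a struct that uses the SDK's ForceSendFields
// convention for serializing zero-valued fields.
type ServedEntity struct {
	ExternalModel            *ExternalModel
	MinProvisionedThroughput int
	ForceSendFields          []string
}

// markForceSend only force-sends MinProvisionedThroughput when no external
// model is configured, since the serving API rejects provisioned-throughput
// settings on external-model endpoints.
func markForceSend(e *ServedEntity) {
	if e.ExternalModel == nil {
		e.ForceSendFields = append(e.ForceSendFields, "MinProvisionedThroughput")
	}
}

func main() {
	ext := &ServedEntity{ExternalModel: &ExternalModel{Name: "gpt-35-turbo"}}
	markForceSend(ext)
	fmt.Println(len(ext.ForceSendFields)) // 0: nothing force-sent for external models

	custom := &ServedEntity{}
	markForceSend(custom)
	fmt.Println(custom.ForceSendFields) // [MinProvisionedThroughput]
}
```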

nsenno-dbr commented 3 months ago

@arpitjasa-db I'm using Databricks Asset Bundles so the Terraform version is bundled with the CLI version. It'll take some time for me to validate

arpitjasa-db commented 3 months ago

@edwardfeng-db I already added that in the PR and tested with/without it, but that doesn't seem to be related to the issue since MinProvisionedThroughput is not a field used by ExternalModels.

> because I had the same issue when using the Databricks SDK create_and_wait

@nsenno-dbr I was referring to the Databricks SDK, rather than Terraform/CLI; that way we know we need to fix this issue one level deeper.

arpitjasa-db commented 2 months ago

@nsenno-dbr can you try testing now?