databricks / mlops-stacks

This repo provides a customizable stack for starting new ML projects on Databricks that follow production best-practices out of the box.
https://docs.databricks.com/en/dev-tools/bundles/mlops-stacks.html
Apache License 2.0

Failed to read recipe configuration #75

Closed: HimanshuSharma-QB closed this issue 12 months ago

HimanshuSharma-QB commented 1 year ago

I'm getting a "recipe.yaml" file-not-found error when triggering the CI pipeline, even though this file exists in the repo. If we run the notebook manually in Databricks, it works without any error.

Error message: MlflowException: Failed to read recipe configuration. Please verify that the recipe.yaml configuration file and the YAML configuration file for the selected profile are syntactically correct and that the specified profile provides all required values for template substitutions defined in recipe.yaml.
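The error message names two possible causes: a missing or unreadable recipe.yaml or profile file, and a profile that does not supply every template variable recipe.yaml uses. The sketch below checks both for one profile. It is a hypothetical helper, not part of MLflow or mlops-stacks, and it assumes that recipe.yaml and a profiles/ directory sit in the same folder (the linked template keeps its profiles under training/profiles/), that placeholders are plain `{{NAME}}` variables, and that each profile defines those variables as unindented top-level keys. The regexes are a rough stand-in for real YAML and Jinja parsing.

```python
import re
from pathlib import Path

# Matches plain Jinja-style placeholders such as {{INGEST_CONFIG}} or {{ SPLIT_CONFIG }}.
# Assumption: recipe.yaml uses only simple variable placeholders (no filters or expressions).
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")

# Naive matcher for unindented "NAME:" lines; NOT a YAML parser.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*:", re.MULTILINE)


def check_recipe_profile(recipe_dir, profile):
    """Return a list of problems found for `profile`; an empty list means none were found."""
    recipe_dir = Path(recipe_dir)
    recipe_file = recipe_dir / "recipe.yaml"
    profile_file = recipe_dir / "profiles" / f"{profile}.yaml"

    # First failure mode: one of the two files is not where we expect it.
    problems = []
    for path in (recipe_file, profile_file):
        if not path.is_file():
            problems.append(f"missing file: {path}")
    if problems:
        return problems

    # Second failure mode: a placeholder in recipe.yaml has no value in the profile.
    needed = set(PLACEHOLDER.findall(recipe_file.read_text()))
    provided = set(TOP_LEVEL_KEY.findall(profile_file.read_text()))
    for name in sorted(needed - provided):
        problems.append(f"{profile_file.name} does not define {name}")
    return problems
```

For example, `check_recipe_profile("training", "databricks-staging")` run from the project root would report either a missing file or the variables the staging profile lacks. Passing the directory explicitly, rather than relying on the current working directory, also makes it easier to compare a CI run with a manual notebook run.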

mingyu89 commented 1 year ago

Hi @HimanshuSharma-QB, by "CI pipeline", are you referring to the "ML Code Tests for [project name]" GitHub Action? This action uses the "databricks-test.yaml" profile.
https://github.com/databricks/mlops-stack/blob/main/%7B%7Bcookiecutter.root_dir__update_if_you_intend_to_use_monorepo%7D%7D/%7B%7Bcookiecutter.project_name%7D%7D/training/profiles/databricks-test.yaml

Could you:

  1. first verify that the profile exists, and
  2. then run the notebook manually with the profile set directly to "databricks-test" on this line? https://github.com/databricks/mlops-stack/blob/main/%7B%7Bcookiecutter.root_dir__update_if_you_intend_to_use_monorepo%7D%7D/%7B%7Bcookiecutter.project_name%7D%7D/training/notebooks/Train.py#L63

HimanshuSharma-QB commented 1 year ago

2. databricks-test

I tried running it manually: it works fine if I set the profile to "databricks-test", but it fails if I set the profile to "databricks-staging". The env is set to staging in the CI pipeline because it runs in the staging environment.
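Since the same notebook works with "databricks-test" but fails with "databricks-staging", one thing worth ruling out is a difference between the two profile files. A minimal sketch, assuming both profiles define their template variables as unindented top-level keys: list the keys present in the working profile but absent from the failing one. `missing_keys` is a hypothetical helper, not part of MLflow or mlops-stacks, and its regex is a rough stand-in for a real YAML parser.

```python
import re

# Naive matcher for unindented "NAME:" lines; comment lines ("# ...") and
# indented nested keys are ignored. This is NOT a YAML parser.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*:", re.MULTILINE)


def missing_keys(reference_profile_text, candidate_profile_text):
    """Top-level keys defined in the reference profile but absent from the candidate, sorted."""
    reference = set(TOP_LEVEL_KEY.findall(reference_profile_text))
    candidate = set(TOP_LEVEL_KEY.findall(candidate_profile_text))
    return sorted(reference - candidate)
```

Assuming the layout from the linked template, a call such as `missing_keys(Path("training/profiles/databricks-test.yaml").read_text(), Path("training/profiles/databricks-staging.yaml").read_text())` (with `from pathlib import Path`) would list candidates for the missing template substitutions. An empty result does not rule out other causes, such as a value that is present but invalid for the staging workspace.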

arpitjasa-db commented 12 months ago

Hi @HimanshuSharma-QB, we've actually moved away from MLflow Recipes and changed the code significantly to make issues like this easier to catch. Would you mind trying the latest version and reopening this issue if the problem persists?