Nike-Inc / brickflow

Pythonic Programming Framework to orchestrate jobs in Databricks Workflow
https://engineering.nike.com/brickflow/
Apache License 2.0

[BUG] Pydantic validation error for ImportBlock - Input should be a valid string - Got int #74

Closed menathan closed 8 months ago

menathan commented 8 months ago

Describe the bug When a Databricks workflow already exists and brickflow projects deploy is run again to deploy the same brickflow project, the following error appears on the command line:

/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:128: UserWarning: Field "model_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:128: UserWarning: Field "model_version" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:128: UserWarning: Field "model_serving_endpoints" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
[2023-12-28 16:19:04,964] [INFO] [brickflow-framework-0.11.0] {configure.py:callback:125} - Setting env var: BRICKFLOW_ENV to local...
[2023-12-28 16:19:04,964] [INFO] [brickflow-framework-0.11.0] {configure.py:callback:125} - Setting env var: DATABRICKS_CONFIG_PROFILE to emea...
Project (***_single_project_demo, ***_multi_project_demo_delta, ***_multi_project_demo_nsp): ***_single_project_demo
[2023-12-28 16:19:14,603] [INFO] [brickflow-framework-0.11.0] {__init__.py:__setattr__:159} - Configuring attr: BRICKFLOW_AUTO_ADD_LIBRARIES with value: True
[2023-12-28 16:19:14,604] [INFO] [brickflow-framework-0.11.0] {projects.py:use_project:296} - Changed to directory: /Users/***/SourceCode/***/***-data-product-template/products/***
[2023-12-28 16:19:14,604] [INFO] [brickflow-framework-0.11.0] {__init__.py:__setattr__:159} - Configuring attr: BRICKFLOW_PROJECT_RUNTIME_VERSION with value: 0.11.0
[2023-12-28 16:19:14,604] [INFO] [brickflow-framework-0.11.0] {__init__.py:__setattr__:159} - Configuring attr: BRICKFLOW_ENABLE_PLUGINS with value: False
[2023-12-28 16:19:14,604] [INFO] [brickflow-framework-0.11.0] {__init__.py:__setattr__:159} - Configuring attr: BRICKFLOW_PROJECT_NAME with value: ***_single_project_demo
[2023-12-28 16:19:14,604] [INFO] [brickflow-framework-0.11.0] {__init__.py:__setattr__:159} - Configuring attr: BRICKFLOW_MONOREPO_PATH_TO_BUNDLE_ROOT with value: products/***
[2023-12-28 16:19:18,605] [INFO] [brickflow-framework-0.11.0] {bundles.py:download_and_unzip_databricks_cli:200} - File 'CHANGELOG.md' downloaded and saved in .databricks/bin/cli/0.210.2 directory.
[2023-12-28 16:19:18,605] [INFO] [brickflow-framework-0.11.0] {bundles.py:download_and_unzip_databricks_cli:200} - File 'LICENSE' downloaded and saved in .databricks/bin/cli/0.210.2 directory.
[2023-12-28 16:19:18,605] [INFO] [brickflow-framework-0.11.0] {bundles.py:download_and_unzip_databricks_cli:200} - File 'README.md' downloaded and saved in .databricks/bin/cli/0.210.2 directory.
[2023-12-28 16:19:18,650] [INFO] [brickflow-framework-0.11.0] {bundles.py:download_and_unzip_databricks_cli:200} - File 'databricks' downloaded and saved in .databricks/bin/cli/0.210.2 directory.
[2023-12-28 16:19:18,650] [INFO] [brickflow-framework-0.11.0] {bundles.py:download_and_unzip_databricks_cli:209} - File 'databricks' set as executable.
[2023-12-28 16:19:18,653] [INFO] [brickflow-framework-0.11.0] {commands.py:exec_command:25} - Executing command: /Users/***/SourceCode/***/***-data-product-template/products/***/.databricks/bin/cli/0.210.2/databricks --version
[2023-12-28 16:19:19,248] [INFO] [brickflow-framework-0.11.0] {configure.py:log_important_versions:144} - Using bundle version: Databricks CLI v0.210.2
[2023-12-28 16:19:19,248] [INFO] [brickflow-framework-0.11.0] {commands.py:exec_command:25} - Executing command: /Users/***/SourceCode/***/***-data-product-template/products/***/.venv/bin/python --version
[2023-12-28 16:19:19,259] [INFO] [brickflow-framework-0.11.0] {configure.py:log_python_version:152} - Using python version: Python 3.10.12
[2023-12-28 16:19:19,259] [INFO] [brickflow-framework-0.11.0] {bundles.py:bundle_synth:221} - Synthesizing bundle...
[2023-12-28 16:19:19,259] [INFO] [brickflow-framework-0.11.0] {commands.py:exec_command:25} - Executing command: /Users/***/SourceCode/***/***-data-product-template/products/***/.venv/bin/python workflows/entrypoint.py
/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:128: UserWarning: Field "model_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:128: UserWarning: Field "model_version" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/pydantic/_internal/_fields.py:128: UserWarning: Field "model_serving_endpoints" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.
  warnings.warn(
[2023-12-28 16:19:20,760] [INFO] [brickflow-framework-0.11.0] {__init__.py:__getattr__:147} - Getting attr: BRICKFLOW_PROJECT_NAME which has value: ***_single_project_demo
[2023-12-28 16:19:20,760] [INFO] [brickflow-framework-0.11.0] {__init__.py:__getattr__:147} - Getting attr: BRICKFLOW_AUTO_ADD_LIBRARIES which has value: true
[2023-12-28 16:19:20,760] [INFO] [brickflow-framework-0.11.0] {project.py:__post_init__:189} - Auto adding brickflow libraries...
[2023-12-28 16:19:20,760] [INFO] [brickflow-framework-0.11.0] {__init__.py:__getattr__:147} - Getting attr: BRICKFLOW_PROJECT_RUNTIME_VERSION which has value: 0.11.0
[2023-12-28 16:19:20,761] [INFO] [brickflow-framework-0.11.0] {__init__.py:__getattr__:147} - Getting attr: BRICKFLOW_ENABLE_PLUGINS which has value: false
[2023-12-28 16:19:20,822] [INFO] [brickflow-framework-0.11.0] {project.py:__exit__:289} - Deploying changes... to local
[2023-12-28 16:19:21,637] [INFO] [brickflow-framework-0.11.0] {databricks_bundle.py:belongs_to_current_project:171} - Checking if resource SupportedResolverTypes.JOB: ***_***_single_project_demo_wf belongs to current project: ***_single_project_demo; handle project validation mode is True, and the resource belongs to project: True
Traceback (most recent call last):
  File "/Users/***/SourceCode/***/***-data-product-template/products/***/workflows/entrypoint.py", line 29, in <module>
    main()
  File "/Users/***/SourceCode/***/***-data-product-template/products/***/workflows/entrypoint.py", line 10, in main
    with Project(
  File "/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/brickflow/engine/project.py", line 300, in __exit__
    codegen.synth()
  File "/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/brickflow/codegen/databricks_bundle.py", line 658, in synth
    bundle = self.proj_to_bundle()
  File "/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/brickflow/codegen/databricks_bundle.py", line 604, in proj_to_bundle
    DatabricksBundleResourceTransformer(resources, self).transform(self.mutators)
  File "/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/brickflow/codegen/databricks_bundle.py", line 343, in transform
    self.resource = mutator.mutate_resource(self.resource, self.ci)
  File "/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/brickflow/codegen/databricks_bundle.py", line 329, in mutate_resource
    for import_ in self._imports_iter(resource):
  File "/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/brickflow/codegen/databricks_bundle.py", line 317, in _imports_iter
    resolved_ref = self.import_resolver_chain.resolve(unresolved_ref)
  File "/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/brickflow/codegen/databricks_bundle.py", line 285, in resolve
    return resolver.resolve(ref)
  File "/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/brickflow/codegen/databricks_bundle.py", line 217, in resolve
    import_blocks = self._resolve(ref)
  File "/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/brickflow/codegen/databricks_bundle.py", line 243, in _resolve
    blocks.append(ImportBlock(to=ref.reference, id_=job.job_id))
  File "/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/lib/python3.10/site-packages/pydantic/main.py", line 164, in __init__
    __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)
pydantic_core._pydantic_core.ValidationError: 1 validation error for ImportBlock
id_
  Input should be a valid string [type=string_type, input_value=895366991787092, input_type=int]
    For further information visit https://errors.pydantic.dev/2.4/v/string_type
Error: Command '['/Users/***/SourceCode/***/***-data-product-template/products/***/.venv/bin/python', 'workflows/entrypoint.py']' returned non-zero exit status 1.
Updated 1 path from the index
Updated 1 path from the index

To Reproduce

  1. Deploy once using brickflow projects deploy
  2. Deploy a second time using brickflow projects deploy
  3. Observe the error in the local machine's terminal

Cloud Information N/A

Additional context A potential fix is to change "str" to "int" at https://github.com/Nike-Inc/brickflow/blob/85ee1cdd87f3edaa9147da72eac7ca4971d97ba0/brickflow/codegen/databricks_bundle.py#L147C5-L147C13
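
For illustration, here is a minimal sketch of the failure and two possible ways around it, assuming pydantic v2 (the error URL in the log points at 2.4). The classes `ImportBlockSketch` and `ImportBlockIntSketch` and the helper `make_block` are hypothetical stand-ins, not brickflow's actual code; only the field names `to` and `id_` and the job id value are taken from the traceback above. Pydantic v2 does not coerce an int into a str field by default, which is why passing the numeric job id is rejected.

```python
from pydantic import BaseModel, ValidationError


class ImportBlockSketch(BaseModel):
    """Hypothetical stand-in: field names come from the traceback, nothing else."""

    to: str
    id_: str  # pydantic v2 rejects an int here instead of converting it


class ImportBlockIntSketch(BaseModel):
    """Variant with the change suggested in the issue: declare the id as int."""

    to: str
    id_: int


def make_block(reference: str, job_id: int) -> ImportBlockSketch:
    # Alternative to changing the field type: convert at the call site.
    return ImportBlockSketch(to=reference, id_=str(job_id))


JOB_ID = 895366991787092  # input_value reported in the error message

# Reproduce the validation error by passing the int directly.
try:
    ImportBlockSketch(to="some.reference", id_=JOB_ID)
    reproduced = False
except ValidationError as exc:
    reproduced = exc.errors()[0]["type"] == "string_type"

# Both workarounds construct a block without raising.
as_str = make_block("some.reference", JOB_ID)
as_int = ImportBlockIntSketch(to="some.reference", id_=JOB_ID)
```

Which option fits better depends on how the id is consumed downstream: converting with `str(...)` keeps the existing field type, while switching the field to int matches what the job lookup returns.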