datacontract / datacontract-cli

CLI to manage your datacontract.yaml files
https://cli.datacontract.com

Databricks - Failed to run test - unexpected keyword argument 'sqlQuery' #186

Closed letnotimitateothers closed 5 months ago

letnotimitateothers commented 6 months ago

Hi, I am getting the following error when running Databricks test on my notebooks.

ERROR:soda.scan:[08:35:38] Query error: databricks.dev_observatory_buildings_studies.schema[dev_observatory_buildings_studies]: _override_spark_functions.<locals>._dlt_sql_fn2() got an unexpected keyword argument 'sqlQuery'
DESCRIBE dev_observatory_buildings_studies
ERROR:soda.scan: | _override_spark_functions.<locals>._dlt_sql_fn2() got an unexpected keyword argument 'sqlQuery'
ERROR:soda.scan:[08:35:38] Error occurred while executing scan.
ERROR:soda.scan: | object of type 'NoneType' has no len()
WARNING:root:Engine soda-core has errors. See the logs for details.
'warning'
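For readers hitting the same message: its wording is what Python itself emits when a function is called with a keyword it does not declare. The sketch below reproduces that shape with hypothetical stand-ins. The nested function name is copied from the log, but its body and its single `query` parameter are assumptions for illustration, not Databricks' actual code.

```python
def _override_spark_functions():
    # Hypothetical stand-in for a wrapper that replaces a `sql` function
    # and names its parameter differently from what callers expect.
    def _dlt_sql_fn2(query):
        return f"ran: {query}"

    return _dlt_sql_fn2


sql = _override_spark_functions()

try:
    # A caller that passes the statement as the keyword `sqlQuery` fails,
    # because the wrapper has no parameter with that name.
    sql(sqlQuery="DESCRIBE dev_observatory_buildings_studies")
except TypeError as exc:
    message = str(exc)
    print(message)

# Passing the statement positionally works with this stand-in.
result = sql("DESCRIBE dev_observatory_buildings_studies")
```

The second log entry, `object of type 'NoneType' has no len()`, reads like a follow-on failure once the query returned nothing, although the thread does not confirm that.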

Here is the version: dataContractSpecification: 0.9.3

Here is my servers config in the YAML file:

servers:
  production:
    type: databricks
    host: xxxxxxxx
    catalog: hive_metastore
    schema: xxxx_production_schema

Can you please help me?

Thanks a lot :)

jochenchrist commented 6 months ago

Which Databricks Runtime version are you using? If it is older than 13.3 LTS, try running the test on 13.3 LTS.
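One way to answer this from inside a notebook is sketched below. It assumes the cluster exposes its runtime version in the `DATABRICKS_RUNTIME_VERSION` environment variable, which is not mentioned in this thread; the helper names are made up for the example.

```python
import os


def parse_runtime(version: str) -> tuple[int, int]:
    """Turn a string such as '14.1' or '13.3.x-scala2.12' into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_at_least(version: str, minimum: tuple[int, int] = (13, 3)) -> bool:
    """Compare a runtime version string against a (major, minor) minimum."""
    return parse_runtime(version) >= minimum


# Assumption: on a Databricks cluster this variable holds the runtime version.
runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION")
if runtime is None:
    print("DATABRICKS_RUNTIME_VERSION is not set (probably not on Databricks)")
else:
    try:
        print(runtime, "is 13.3 or newer:", is_at_least(runtime))
    except ValueError:
        # The value did not start with a numeric major.minor pair.
        print("Could not parse runtime version:", runtime)
```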

letnotimitateothers commented 6 months ago

I'm using version 14.1 (includes Apache Spark 3.5.0, Scala 2.12). Thanks.

jochenchrist commented 6 months ago

Sorry, I don't have enough information to reproduce the issue.

This data contract works with 14.1:

```python
from datacontract.data_contract import DataContract

data_contract = DataContract(data_contract_str="""
dataContractSpecification: 0.9.3
id: databricks-issue-186
info:
  title: databricks-issue-186
  version: 0.0.1
  owner: my-domain-team
servers:
  production:
    type: databricks
    host: xxx
    catalog: hive_metastore
    schema: default
models:
  sample_data_1:
    type: table
    fields:
      field_one:
        type: varchar
        required: true
        unique: true
      field_two:
        type: bigint
        minimum: 10
      field_three:
        type: timestamp
""",
spark=spark)

run = data_contract.test()

run.result
```

Can you share your data contract YAML and the definition of your Hive table?

letnotimitateothers commented 6 months ago

Thanks a lot @jochenchrist for your answer! I've tried with your code (with a few modifications):

```python
from datacontract.data_contract import DataContract

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").appName("test").getOrCreate()

data_contract = DataContract(data_contract_str="""
dataContractSpecification: 0.9.3
id: databricks-issue-186
info:
  title: databricks-issue-186
  version: 0.0.1
  owner: my-domain-team
servers:
  production:
    type: databricks
    host: adb-**.azuredatabricks.net
    catalog: hive_metastore
    schema: ***_production_schema
models:
  production_analytics:
    type: table
    fields:
      study_uuid:
        description: "TODO"
        type: number
""", spark=spark)

run = data_contract.test()

run.result
```

And I still get the same error.

I think the problem might be related to the versions of the libraries (Soda, Databricks, etc.). I'm using Python 3.10 and databricks-sdk 0.1.6.
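When a version mismatch is suspected, it can help to post the versions actually installed in the running environment. The sketch below uses the standard library's `importlib.metadata`; the package list is an assumption based on the libraries involved in this issue and can be adjusted.

```python
from importlib import metadata

# Distribution names relevant to this issue; extend or trim as needed.
PACKAGES = [
    "datacontract-cli",
    "soda-core",
    "soda-core-spark",
    "pyspark",
    "databricks-sdk",
    "databricks-sql-connector",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```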

letnotimitateothers commented 6 months ago

Here are the logs of the installation (I hope that helps):

` Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages. Collecting datacontract-cli Downloading datacontract_cli-0.10.3-py3-none-any.whl (94 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 94.7/94.7 kB 3.1 MB/s eta 0:00:00 Collecting s3fs==2024.3.1 Using cached s3fs-2024.3.1-py3-none-any.whl (29 kB) Collecting python-multipart==0.0.9 Using cached python_multipart-0.0.9-py3-none-any.whl (22 kB) Collecting snowflake-connector-python[pandas]<3.8,>=3.6 Using cached snowflake_connector_python-3.7.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.6 MB) Collecting pydantic<2.8.0,>=2.5.3 Using cached pydantic-2.7.1-py3-none-any.whl (409 kB) Collecting requests~=2.31.0 Using cached requests-2.31.0-py3-none-any.whl (62 kB) Collecting opentelemetry-exporter-otlp-proto-grpc~=1.16.0 Using cached opentelemetry_exporter_otlp_proto_grpc-1.16.0-py3-none-any.whl (20 kB) Collecting soda-core-spark[databricks]<3.4.0,>=3.3.1 Using cached soda_core_spark-3.3.4-py3-none-any.whl (9.8 kB) Collecting fastparquet==2024.2.0 Using cached fastparquet-2024.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB) Collecting soda-core-duckdb<3.4.0,>=3.3.1 Using cached soda_core_duckdb-3.3.4-py3-none-any.whl (7.5 kB) Collecting typer[all]<0.13,>=0.9 Using cached typer-0.12.3-py3-none-any.whl (47 kB) Collecting soda-core-snowflake<3.4.0,>=3.3.1 Using cached soda_core_snowflake-3.3.4-py3-none-any.whl (7.9 kB) Collecting botocore<1.34.70,>=1.34.41 Using cached botocore-1.34.69-py3-none-any.whl (12.0 MB) Collecting boto3<1.34.70,>=1.34.41 Downloading boto3-1.34.69-py3-none-any.whl (139 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 139.3/139.3 kB 9.3 MB/s eta 0:00:00 Collecting avro==1.11.3 Using cached avro-1.11.3-py2.py3-none-any.whl Collecting soda-core-postgres<3.4.0,>=3.3.1 Using cached soda_core_postgres-3.3.4-py3-none-any.whl (9.4 kB) Collecting fastjsonschema~=2.19.1 Using cached fastjsonschema-2.19.1-py3-none-any.whl (23 
kB) Collecting pyyaml~=6.0.1 Using cached PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (705 kB) Collecting simple-ddl-parser==1.1.0 Downloading simple_ddl_parser-1.1.0-py3-none-any.whl (75 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.3/75.3 kB 8.7 MB/s eta 0:00:00 Collecting fastapi==0.110.1 Using cached fastapi-0.110.1-py3-none-any.whl (91 kB) Collecting rich~=13.7.0 Using cached rich-13.7.1-py3-none-any.whl (240 kB) Collecting python-dotenv~=1.0.0 Using cached python_dotenv-1.0.1-py3-none-any.whl (19 kB) Collecting duckdb==0.10.2 Downloading duckdb-0.10.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 43.0 MB/s eta 0:00:00 Collecting rdflib==7.0.0 Using cached rdflib-7.0.0-py3-none-any.whl (531 kB) Collecting soda-core-spark-df<3.4.0,>=3.3.1 Using cached soda_core_spark_df-3.3.4-py3-none-any.whl (7.3 kB) Collecting soda-core-bigquery<3.4.0,>=3.3.1 Using cached soda_core_bigquery-3.3.4-py3-none-any.whl (8.0 kB) Collecting opentelemetry-exporter-otlp-proto-http~=1.16.0 Using cached opentelemetry_exporter_otlp_proto_http-1.16.0-py3-none-any.whl (21 kB) Collecting deltalake~=0.17.0 Downloading deltalake-0.17.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.5/26.5 MB 47.3 MB/s eta 0:00:00 Collecting typing-extensions>=4.8.0 Using cached typing_extensions-4.11.0-py3-none-any.whl (34 kB) Collecting starlette<0.38.0,>=0.37.2 Using cached starlette-0.37.2-py3-none-any.whl (71 kB) Requirement already satisfied: packaging in /databricks/python3/lib/python3.10/site-packages (from fastparquet==2024.2.0->datacontract-cli) (22.0) Collecting fsspec Using cached fsspec-2024.3.1-py3-none-any.whl (171 kB) Requirement already satisfied: pandas>=1.5.0 in /databricks/python3/lib/python3.10/site-packages (from fastparquet==2024.2.0->datacontract-cli) (1.5.3) Collecting cramjam>=2.3 Using cached 
cramjam-2.8.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB) Requirement already satisfied: numpy>=1.20.3 in /databricks/python3/lib/python3.10/site-packages (from fastparquet==2024.2.0->datacontract-cli) (1.23.5) Requirement already satisfied: pyparsing<4,>=2.1.0 in /databricks/python3/lib/python3.10/site-packages (from rdflib==7.0.0->datacontract-cli) (3.0.9) Collecting isodate<0.7.0,>=0.6.0 Using cached isodate-0.6.1-py2.py3-none-any.whl (41 kB) Collecting aiobotocore<3.0.0,>=2.5.4 Using cached aiobotocore-2.12.3-py3-none-any.whl (76 kB) Collecting aiohttp!=4.0.0a0,!=4.0.0a1 Using cached aiohttp-3.9.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB) Collecting ply<4.0,>=3.11 Using cached ply-3.11-py2.py3-none-any.whl (49 kB) Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /databricks/python3/lib/python3.10/site-packages (from boto3<1.34.70,>=1.34.41->datacontract-cli) (0.10.0) Collecting s3transfer<0.11.0,>=0.10.0 Downloading s3transfer-0.10.1-py3-none-any.whl (82 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 82.2/82.2 kB 6.4 MB/s eta 0:00:00 Requirement already satisfied: urllib3!=2.2.0,<3,>=1.25.4 in /databricks/python3/lib/python3.10/site-packages (from botocore<1.34.70,>=1.34.41->datacontract-cli) (1.26.14) Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /databricks/python3/lib/python3.10/site-packages (from botocore<1.34.70,>=1.34.41->datacontract-cli) (2.8.2) Requirement already satisfied: pyarrow-hotfix in /databricks/python3/lib/python3.10/site-packages (from deltalake~=0.17.0->datacontract-cli) (0.5) Requirement already satisfied: pyarrow>=8 in /databricks/python3/lib/python3.10/site-packages (from deltalake~=0.17.0->datacontract-cli) (8.0.0) Collecting opentelemetry-sdk~=1.16.0 Using cached opentelemetry_sdk-1.16.0-py3-none-any.whl (94 kB) Requirement already satisfied: googleapis-common-protos~=1.52 in /databricks/python3/lib/python3.10/site-packages (from 
opentelemetry-exporter-otlp-proto-grpc~=1.16.0->datacontract-cli) (1.60.0) Collecting opentelemetry-proto==1.16.0 Using cached opentelemetry_proto-1.16.0-py3-none-any.whl (52 kB) Requirement already satisfied: grpcio<2.0.0,>=1.0.0 in /databricks/python3/lib/python3.10/site-packages (from opentelemetry-exporter-otlp-proto-grpc~=1.16.0->datacontract-cli) (1.48.2) Collecting opentelemetry-api~=1.15 Using cached opentelemetry_api-1.24.0-py3-none-any.whl (60 kB) Collecting backoff<3.0.0,>=1.10.0 Using cached backoff-2.2.1-py3-none-any.whl (15 kB) Requirement already satisfied: protobuf<5.0,>=3.19 in /databricks/python3/lib/python3.10/site-packages (from opentelemetry-proto==1.16.0->opentelemetry-exporter-otlp-proto-grpc~=1.16.0->datacontract-cli) (4.24.0) Collecting annotated-types>=0.4.0 Using cached annotated_types-0.6.0-py3-none-any.whl (12 kB) Collecting pydantic-core==2.18.2 Using cached pydantic_core-2.18.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB) Requirement already satisfied: charset-normalizer<4,>=2 in /databricks/python3/lib/python3.10/site-packages (from requests~=2.31.0->datacontract-cli) (2.0.4) Requirement already satisfied: certifi>=2017.4.17 in /databricks/python3/lib/python3.10/site-packages (from requests~=2.31.0->datacontract-cli) (2022.12.7) Requirement already satisfied: idna<4,>=2.5 in /databricks/python3/lib/python3.10/site-packages (from requests~=2.31.0->datacontract-cli) (3.4) Collecting markdown-it-py>=2.2.0 Using cached markdown_it_py-3.0.0-py3-none-any.whl (87 kB) Collecting pygments<3.0.0,>=2.13.0 Using cached pygments-2.18.0-py3-none-any.whl (1.2 MB) Collecting platformdirs<4.0.0,>=2.6.0 Using cached platformdirs-3.11.0-py3-none-any.whl (17 kB) Collecting sortedcontainers>=2.4.0 Using cached sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB) Requirement already satisfied: cffi<2.0.0,>=1.9 in /databricks/python3/lib/python3.10/site-packages (from snowflake-connector-python[pandas]<3.8,>=3.6->datacontract-cli) 
(1.15.1) Collecting tomlkit Using cached tomlkit-0.12.5-py3-none-any.whl (37 kB) Requirement already satisfied: pyjwt<3.0.0 in /usr/lib/python3/dist-packages (from snowflake-connector-python[pandas]<3.8,>=3.6->datacontract-cli) (2.3.0) Requirement already satisfied: pytz in /databricks/python3/lib/python3.10/site-packages (from snowflake-connector-python[pandas]<3.8,>=3.6->datacontract-cli) (2022.7) Requirement already satisfied: cryptography<43.0.0,>=3.1.0 in /databricks/python3/lib/python3.10/site-packages (from snowflake-connector-python[pandas]<3.8,>=3.6->datacontract-cli) (39.0.1) Requirement already satisfied: filelock<4,>=3.5 in /usr/local/lib/python3.10/dist-packages (from snowflake-connector-python[pandas]<3.8,>=3.6->datacontract-cli) (3.12.3) Collecting asn1crypto<2.0.0,>0.24.0 Using cached asn1crypto-1.5.1-py2.py3-none-any.whl (105 kB) Collecting pyOpenSSL<25.0.0,>=16.2.0 Using cached pyOpenSSL-24.1.0-py3-none-any.whl (56 kB) Collecting soda-core==3.3.4 Using cached soda_core-3.3.4-py3-none-any.whl (193 kB) Collecting google-cloud-bigquery<4.0,>=2.25.0 Using cached google_cloud_bigquery-3.22.0-py2.py3-none-any.whl (236 kB) Requirement already satisfied: click~=8.0 in /databricks/python3/lib/python3.10/site-packages (from soda-core==3.3.4->soda-core-bigquery<3.4.0,>=3.3.1->datacontract-cli) (8.0.4) Requirement already satisfied: Jinja2<4.0,>=2.11 in /databricks/python3/lib/python3.10/site-packages (from soda-core==3.3.4->soda-core-bigquery<3.4.0,>=3.3.1->datacontract-cli) (3.1.2) Collecting antlr4-python3-runtime~=4.11.1 Using cached antlr4_python3_runtime-4.11.1-py3-none-any.whl (144 kB) Collecting sqlparse~=0.4 Using cached sqlparse-0.5.0-py3-none-any.whl (43 kB) Collecting opentelemetry-api~=1.15 Using cached opentelemetry_api-1.22.0-py3-none-any.whl (57 kB) Collecting ruamel.yaml<0.18.0,>=0.17.0 Using cached ruamel.yaml-0.17.40-py3-none-any.whl (113 kB) Requirement already satisfied: markupsafe<=2.1.2,>=2.0.1 in 
/databricks/python3/lib/python3.10/site-packages (from soda-core==3.3.4->soda-core-bigquery<3.4.0,>=3.3.1->datacontract-cli) (2.1.1) Collecting inflect~=7.0 Using cached inflect-7.2.1-py3-none-any.whl (34 kB) Collecting psycopg2-binary<3.0,>=2.8.5 Using cached psycopg2_binary-2.9.9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB) Collecting pyspark Using cached pyspark-3.5.1-py2.py3-none-any.whl Collecting databricks-sql-connector Using cached databricks_sql_connector-3.1.2-py3-none-any.whl (427 kB) WARNING: typer 0.12.3 does not provide the extra 'all' Collecting shellingham>=1.3.0 Using cached shellingham-1.5.4-py2.py3-none-any.whl (9.8 kB) Collecting wrapt<2.0.0,>=1.10.10 Using cached wrapt-1.16.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (80 kB) Collecting aioitertools<1.0.0,>=0.5.1 Using cached aioitertools-0.11.0-py3-none-any.whl (23 kB) Collecting frozenlist>=1.1.1 Using cached frozenlist-1.4.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (239 kB) Collecting multidict<7.0,>=4.5 Using cached multidict-6.0.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (124 kB) Collecting aiosignal>=1.1.2 Using cached aiosignal-1.3.1-py3-none-any.whl (7.6 kB) Requirement already satisfied: attrs>=17.3.0 in /databricks/python3/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs==2024.3.1->datacontract-cli) (22.1.0) Collecting yarl<2.0,>=1.0 Using cached yarl-1.9.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (301 kB) Collecting async-timeout<5.0,>=4.0 Using cached async_timeout-4.0.3-py3-none-any.whl (5.7 kB) Requirement already satisfied: pycparser in /databricks/python3/lib/python3.10/site-packages (from cffi<2.0.0,>=1.9->snowflake-connector-python[pandas]<3.8,>=3.6->datacontract-cli) (2.21) Collecting google-resumable-media<3.0dev,>=0.6.0 Using cached google_resumable_media-2.7.0-py2.py3-none-any.whl (80 
kB) Collecting google-auth<3.0.0dev,>=2.14.1 Using cached google_auth-2.29.0-py2.py3-none-any.whl (189 kB) Collecting google-api-core[grpc]!=2.0.,!=2.1.,!=2.10.,!=2.2.,!=2.3.,!=2.4.,!=2.5.,!=2.6.,!=2.7.,!=2.8.,!=2.9.,<3.0.0dev,>=1.34.1 Using cached google_api_core-2.19.0-py3-none-any.whl (139 kB) Collecting google-cloud-core<3.0.0dev,>=1.6.0 Using cached google_cloud_core-2.4.1-py2.py3-none-any.whl (29 kB) Requirement already satisfied: six>=1.5.2 in /usr/lib/python3/dist-packages (from grpcio<2.0.0,>=1.0.0->opentelemetry-exporter-otlp-proto-grpc~=1.16.0->datacontract-cli) (1.16.0) Collecting mdurl~=0.1 Using cached mdurl-0.1.2-py3-none-any.whl (10.0 kB) Collecting deprecated>=1.2.6 Using cached Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB) Collecting importlib-metadata<7.0,>=6.0 Using cached importlib_metadata-6.11.0-py3-none-any.whl (23 kB) Collecting opentelemetry-api~=1.15 Using cached opentelemetry_api-1.16.0-py3-none-any.whl (57 kB) Collecting opentelemetry-semantic-conventions==0.37b0 Using cached opentelemetry_semantic_conventions-0.37b0-py3-none-any.whl (26 kB) Requirement already satisfied: setuptools>=16.0 in /databricks/python3/lib/python3.10/site-packages (from opentelemetry-sdk~=1.16.0->opentelemetry-exporter-otlp-proto-grpc~=1.16.0->datacontract-cli) (65.6.3) Collecting cryptography<43.0.0,>=3.1.0 Using cached cryptography-42.0.7-cp39-abi3-manylinux_2_28_x86_64.whl (3.8 MB) Requirement already satisfied: anyio<5,>=3.4.0 in /databricks/python3/lib/python3.10/site-packages (from starlette<0.38.0,>=0.37.2->fastapi==0.110.1->datacontract-cli) (3.5.0) Collecting openpyxl<4.0.0,>=3.0.10 Using cached openpyxl-3.1.2-py2.py3-none-any.whl (249 kB) Collecting thrift<0.17.0,>=0.16.0 Using cached thrift-0.16.0-cp310-cp310-linux_x86_64.whl Collecting lz4<5.0.0,>=4.0.2 Using cached lz4-4.3.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB) Requirement already satisfied: oauthlib<4.0.0,>=3.1.0 in /usr/lib/python3/dist-packages (from 
databricks-sql-connector->soda-core-spark[databricks]<3.4.0,>=3.3.1->datacontract-cli) (3.2.0) Collecting pyarrow>=8 Using cached pyarrow-14.0.2-cp310-cp310-manylinux_2_28_x86_64.whl (38.0 MB) Collecting py4j==0.10.9.7 Using cached py4j-0.10.9.7-py2.py3-none-any.whl (200 kB) Requirement already satisfied: sniffio>=1.1 in /databricks/python3/lib/python3.10/site-packages (from anyio<5,>=3.4.0->starlette<0.38.0,>=0.37.2->fastapi==0.110.1->datacontract-cli) (1.2.0) Collecting proto-plus<2.0.0dev,>=1.22.3 Using cached proto_plus-1.23.0-py3-none-any.whl (48 kB) Requirement already satisfied: grpcio-status<2.0.dev0,>=1.33.2 in /databricks/python3/lib/python3.10/site-packages (from google-api-core[grpc]!=2.0.,!=2.1.,!=2.10.,!=2.2.,!=2.3.,!=2.4.,!=2.5.,!=2.6.,!=2.7.,!=2.8.,!=2.9.,<3.0.0dev,>=1.34.1->google-cloud-bigquery<4.0,>=2.25.0->soda-core-bigquery<3.4.0,>=3.3.1->datacontract-cli) (1.48.1) Collecting rsa<5,>=3.1.4 Using cached rsa-4.9-py3-none-any.whl (34 kB) Collecting pyasn1-modules>=0.2.1 Using cached pyasn1_modules-0.4.0-py3-none-any.whl (181 kB) Collecting cachetools<6.0,>=2.0.0 Using cached cachetools-5.3.3-py3-none-any.whl (9.3 kB) Collecting google-crc32c<2.0dev,>=1.0 Using cached google_crc32c-1.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (32 kB) Requirement already satisfied: zipp>=0.5 in /usr/lib/python3/dist-packages (from importlib-metadata<7.0,>=6.0->opentelemetry-api~=1.15->opentelemetry-exporter-otlp-proto-grpc~=1.16.0->datacontract-cli) (1.0.0) Requirement already satisfied: more-itertools in /usr/lib/python3/dist-packages (from inflect~=7.0->soda-core==3.3.4->soda-core-bigquery<3.4.0,>=3.3.1->datacontract-cli) (8.10.0) Collecting typeguard>=4.0.1 Using cached typeguard-4.2.1-py3-none-any.whl (34 kB) Collecting et-xmlfile Using cached et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB) Collecting ruamel.yaml.clib>=0.2.7 Using cached ruamel.yaml.clib-0.2.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (526 
kB) Collecting pyasn1<0.7.0,>=0.4.6 Using cached pyasn1-0.6.0-py2.py3-none-any.whl (85 kB) Installing collected packages: sortedcontainers, py4j, ply, fastjsonschema, asn1crypto, antlr4-python3-runtime, wrapt, typing-extensions, tomlkit, thrift, sqlparse, simple-ddl-parser, shellingham, ruamel.yaml.clib, requests, pyyaml, python-multipart, python-dotenv, pyspark, pygments, pyasn1, pyarrow, psycopg2-binary, proto-plus, platformdirs, opentelemetry-semantic-conventions, opentelemetry-proto, multidict, mdurl, lz4, isodate, google-crc32c, fsspec, frozenlist, et-xmlfile, duckdb, cramjam, cachetools, backoff, avro, async-timeout, annotated-types, aioitertools, yarl, typeguard, starlette, ruamel.yaml, rsa, rdflib, pydantic-core, pyasn1-modules, openpyxl, markdown-it-py, google-resumable-media, deprecated, deltalake, cryptography, botocore, aiosignal, s3transfer, rich, pyOpenSSL, pydantic, opentelemetry-api, inflect, google-auth, fastparquet, databricks-sql-connector, aiohttp, typer, snowflake-connector-python, opentelemetry-sdk, google-api-core, fastapi, boto3, aiobotocore, s3fs, opentelemetry-exporter-otlp-proto-http, opentelemetry-exporter-otlp-proto-grpc, google-cloud-core, soda-core, google-cloud-bigquery, soda-core-spark, soda-core-snowflake, soda-core-postgres, soda-core-duckdb, soda-core-bigquery, soda-core-spark-df, datacontract-cli Attempting uninstall: fastjsonschema Found existing installation: fastjsonschema 2.18.0 Not uninstalling fastjsonschema at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-fe72d4f3-d382-4ccf-9db9-d563f8e00b68 Can't uninstall 'fastjsonschema'. No files were found to uninstall. 
Attempting uninstall: typing-extensions Found existing installation: typing_extensions 4.4.0 Not uninstalling typing-extensions at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-fe72d4f3-d382-4ccf-9db9-d563f8e00b68 Can't uninstall 'typing_extensions'. No files were found to uninstall. Attempting uninstall: requests Found existing installation: requests 2.28.1 Not uninstalling requests at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-fe72d4f3-d382-4ccf-9db9-d563f8e00b68 Can't uninstall 'requests'. No files were found to uninstall. Attempting uninstall: pygments Found existing installation: Pygments 2.11.2 Not uninstalling pygments at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-fe72d4f3-d382-4ccf-9db9-d563f8e00b68 Can't uninstall 'Pygments'. No files were found to uninstall. Attempting uninstall: pyarrow Found existing installation: pyarrow 8.0.0 Not uninstalling pyarrow at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-fe72d4f3-d382-4ccf-9db9-d563f8e00b68 Can't uninstall 'pyarrow'. No files were found to uninstall. Attempting uninstall: platformdirs Found existing installation: platformdirs 2.5.2 Not uninstalling platformdirs at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-fe72d4f3-d382-4ccf-9db9-d563f8e00b68 Can't uninstall 'platformdirs'. No files were found to uninstall. Attempting uninstall: cryptography Found existing installation: cryptography 39.0.1 Not uninstalling cryptography at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-fe72d4f3-d382-4ccf-9db9-d563f8e00b68 Can't uninstall 'cryptography'. No files were found to uninstall. 
Attempting uninstall: botocore Found existing installation: botocore 1.27.96 Not uninstalling botocore at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-fe72d4f3-d382-4ccf-9db9-d563f8e00b68 Can't uninstall 'botocore'. No files were found to uninstall. Attempting uninstall: s3transfer Found existing installation: s3transfer 0.6.2 Not uninstalling s3transfer at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-fe72d4f3-d382-4ccf-9db9-d563f8e00b68 Can't uninstall 's3transfer'. No files were found to uninstall. Attempting uninstall: pydantic Found existing installation: pydantic 1.10.6 Not uninstalling pydantic at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-fe72d4f3-d382-4ccf-9db9-d563f8e00b68 Can't uninstall 'pydantic'. No files were found to uninstall. Attempting uninstall: boto3 Found existing installation: boto3 1.24.28 Not uninstalling boto3 at /databricks/python3/lib/python3.10/site-packages, outside environment /local_disk0/.ephemeral_nfs/envs/pythonEnv-fe72d4f3-d382-4ccf-9db9-d563f8e00b68 Can't uninstall 'boto3'. No files were found to uninstall. ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. databricks-sdk 0.1.6 requires requests<2.29.0,>=2.28.1, but you have requests 2.31.0 which is incompatible. virtualenv 20.16.7 requires platformdirs<3,>=2.4, but you have platformdirs 3.11.0 which is incompatible. 
Successfully installed aiobotocore-2.12.3 aiohttp-3.9.5 aioitertools-0.11.0 aiosignal-1.3.1 annotated-types-0.6.0 antlr4-python3-runtime-4.11.1 asn1crypto-1.5.1 async-timeout-4.0.3 avro-1.11.3 backoff-2.2.1 boto3-1.34.69 botocore-1.34.69 cachetools-5.3.3 cramjam-2.8.3 cryptography-42.0.7 databricks-sql-connector-3.1.2 datacontract-cli-0.10.3 deltalake-0.17.4 deprecated-1.2.14 duckdb-0.10.2 et-xmlfile-1.1.0 fastapi-0.110.1 fastjsonschema-2.19.1 fastparquet-2024.2.0 frozenlist-1.4.1 fsspec-2024.3.1 google-api-core-2.19.0 google-auth-2.29.0 google-cloud-bigquery-3.22.0 google-cloud-core-2.4.1 google-crc32c-1.5.0 google-resumable-media-2.7.0 inflect-7.2.1 isodate-0.6.1 lz4-4.3.3 markdown-it-py-3.0.0 mdurl-0.1.2 multidict-6.0.5 openpyxl-3.1.2 opentelemetry-api-1.16.0 opentelemetry-exporter-otlp-proto-grpc-1.16.0 opentelemetry-exporter-otlp-proto-http-1.16.0 opentelemetry-proto-1.16.0 opentelemetry-sdk-1.16.0 opentelemetry-semantic-conventions-0.37b0 platformdirs-3.11.0 ply-3.11 proto-plus-1.23.0 psycopg2-binary-2.9.9 py4j-0.10.9.7 pyOpenSSL-24.1.0 pyarrow-14.0.2 pyasn1-0.6.0 pyasn1-modules-0.4.0 pydantic-2.7.1 pydantic-core-2.18.2 pygments-2.18.0 pyspark-3.5.1 python-dotenv-1.0.1 python-multipart-0.0.9 pyyaml-6.0.1 rdflib-7.0.0 requests-2.31.0 rich-13.7.1 rsa-4.9 ruamel.yaml-0.17.40 ruamel.yaml.clib-0.2.8 s3fs-2024.3.1 s3transfer-0.10.1 shellingham-1.5.4 simple-ddl-parser-1.1.0 snowflake-connector-python-3.7.1 soda-core-3.3.4 soda-core-bigquery-3.3.4 soda-core-duckdb-3.3.4 soda-core-postgres-3.3.4 soda-core-snowflake-3.3.4 soda-core-spark-3.3.4 soda-core-spark-df-3.3.4 sortedcontainers-2.4.0 sqlparse-0.5.0 starlette-0.37.2 thrift-0.16.0 tomlkit-0.12.5 typeguard-4.2.1 typer-0.12.3 typing-extensions-4.11.0 wrapt-1.16.0 yarl-1.9.4 Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.

`

jochenchrist commented 6 months ago

I'm still having trouble reproducing this.

Can you share the table structure in Hive:

spark.sql("DESCRIBE TABLE hive_metastore.***_production_schema.production_analytics").show()

letnotimitateothers commented 6 months ago

Sure!

Here it is 👍🏾

[Screenshot of the DESCRIBE TABLE output: Screen Shot 2024-05-14 at 18 53 38]

jochenchrist commented 6 months ago

Do you have a col_name study_uuid in it (= the field name in the data contract)? What's its type?
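The same check can be done in code by comparing the contract's field names with the `col_name` values that `DESCRIBE TABLE` returns. This is a minimal sketch: `missing_columns` is a hypothetical helper, and the sample rows are made up because the real table definition is not shown as text in this thread.

```python
def missing_columns(contract_fields, describe_rows):
    """Return contract field names that have no matching table column.

    describe_rows is an iterable of (col_name, data_type) pairs, for example
    built from the rows of spark.sql("DESCRIBE TABLE ...").collect().
    Names are compared case-insensitively.
    """
    columns = {name.lower() for name, _ in describe_rows}
    return [field for field in contract_fields if field.lower() not in columns]


# Made-up rows for illustration only.
rows = [("study_uuid", "string"), ("created_at", "timestamp")]

print(missing_columns(["study_uuid"], rows))
print(missing_columns(["study_uuid", "field_two"], rows))
```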

letnotimitateothers commented 5 months ago

Hello Jochen :) Yes, I have a col_name study_uuid in my table.

I have done further tests and now it is working 💯, but only if I install databricks-sql-connector as a requirement. I think it is worth adding that to the documentation.
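A small guard like the following could be run before `data_contract.test()` to catch the missing package early. It is a sketch, assuming the connector is importable as `databricks.sql`; `module_available` is a hypothetical helper name.

```python
import importlib.util


def module_available(name: str) -> bool:
    """True if the named module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (here `databricks`) is itself absent.
        return False


if module_available("databricks.sql"):
    print("databricks-sql-connector found")
else:
    # In a Databricks notebook cell: %pip install databricks-sql-connector
    # followed by dbutils.library.restartPython(), as the pip output suggests.
    print("databricks-sql-connector missing; install it before running test()")
```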

Thanks again for your help.

I'm eager to use your library in my work.

Am.

simonharrer commented 5 months ago

@letnotimitateothers thanks for letting us know.

Is it really a documentation issue, or should we add that library as a dependency of datacontract-cli? To me, it feels like we should add it to the CLI tool.