databricks / databricks-sdk-py

Databricks SDK for Python (Beta)
https://databricks-sdk-py.readthedocs.io/
Apache License 2.0

[ISSUE] type_precision and type_scale in columns info are not correct #523

Open meretri opened 5 months ago

meretri commented 5 months ago

Description

When using the tables API to retrieve schema information, the fields type_precision and type_scale are not populated correctly.

Reproduction

First, create a sample table:

%sql  
CREATE OR REPLACE TABLE dbt_testing.mri.data_type_example (  
  id DECIMAL(10,0),   
  id2 DECIMAL(5,2)  
);

Then use the API to get the schema information:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
full_name = "dbt_testing.mri.data_type_example"
table = w.tables.get(full_name)

for field in table.columns:
    data_type = field.type_text
    precision = field.type_precision
    scale = field.type_scale
    print(f"type: {data_type}, precision: {precision}, scale: {scale}")

Which results in the following output:

type: decimal(10,0), precision: 0, scale: 0
type: decimal(5,2), precision: 0, scale: 0

Expected behavior

I expect the following output:

type: decimal(10,0), precision: 10, scale: 0
type: decimal(5,2), precision: 5, scale: 2

Currently I just extract the information from "type_text", but I would appreciate being able to use the corresponding fields directly.
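For reference, a minimal sketch of that workaround (the helper name and regex are my own, not part of the SDK; it assumes the table object from the snippet above):

import re

def precision_scale_from_type_text(type_text: str):
    # Matches e.g. "decimal(10,0)" -> (10, 0); returns (None, None) for non-decimal types.
    match = re.fullmatch(r"decimal\((\d+),\s*(\d+)\)", type_text.strip(), re.IGNORECASE)
    if match is None:
        return None, None
    return int(match.group(1)), int(match.group(2))

for field in table.columns:
    precision, scale = precision_scale_from_type_text(field.type_text)
    print(f"type: {field.type_text}, precision: {precision}, scale: {scale}")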

mgyucht commented 5 months ago

Thank you for reporting this. This is likely a problem in the underlying API definition which is currently maintained by hand. If you could please include debug logs as described in the issue template, that would help us quickly triage and correct this issue.
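For anyone reproducing this, one common way to capture such logs is to enable DEBUG-level logging via Python's standard logging module before making the SDK call (a sketch; the issue template describes the exact steps):

import logging

# Emit DEBUG records, including the databricks.sdk request/response logs.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s:%(name)s:%(message)s")

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
table = w.tables.get("dbt_testing.mri.data_type_example")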

meretri commented 5 months ago

There you go:

DEBUG:databricks.sdk:/root/.databrickscfg does not exist
DEBUG:databricks.sdk:Attempting to configure auth: pat
DEBUG:databricks.sdk:Attempting to configure auth: basic
DEBUG:databricks.sdk:Attempting to configure auth: metadata-service
DEBUG:databricks.sdk:Attempting to configure auth: oauth-m2m
DEBUG:databricks.sdk:Attempting to configure auth: azure-client-secret
DEBUG:databricks.sdk:Attempting to configure auth: github-oidc-azure
DEBUG:databricks.sdk:Attempting to configure auth: azure-cli
DEBUG:databricks.sdk:Attempting to configure auth: external-browser
DEBUG:databricks.sdk:Attempting to configure auth: databricks-cli
DEBUG:databricks.sdk:Attempting to configure auth: runtime
DEBUG:databricks.sdk:runtime SDK credential provider available
DEBUG:databricks.sdk:[init_runtime_native_auth] runtime native auth configured
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): switzerlandnorth.azuredatabricks.net:443
DEBUG:urllib3.connectionpool:https://switzerlandnorth.azuredatabricks.net:443 "GET /api/2.1/unity-catalog/tables/dbt_testing.mri.data_type_example HTTP/1.1" 200 None
DEBUG:databricks.sdk:GET /api/2.1/unity-catalog/tables/dbt_testing.mri.data_type_example
< 200 OK
< {
<   "browse_only": false,
<   "catalog_name": "dbt_testing",
<   "columns": [
<     {
<       "name": "id",
<       "nullable": true,
<       "position": 0,
<       "type_json": "{\"name\":\"id\",\"type\":\"decimal(10,0)\",\"nullable\":true,\"metadata\":{}}",
<       "type_name": "DECIMAL",
<       "type_precision": 0,
<       "type_scale": 0,
<       "type_text": "decimal(10,0)"
<     },
<     "... (1 additional elements)"
<   ],
<   "created_at": 1706713401140,
<   "created_by": "xxx",
<   "data_access_configuration_id": "00000000-0000-0000-0000-000000000000",
<   "data_source_format": "DELTA",
<   "delta_runtime_properties_kvpairs": {},
<   "full_name": "dbt_testing.mri.data_type_example",
<   "generation": 1,
<   "metastore_id": "df1a5b8f-0154-48f5-abfd-c39dad04affa",
<   "name": "data_type_example",
<   "owner": "xxx",
<   "properties": {
<     "delta.lastCommitTimestamp": "1706713586000",
<     "delta.lastUpdateVersion": "1",
<     "delta.minReaderVersion": "1",
<     "delta.minWriterVersion": "2"
<   },
<   "schema_name": "mri",
<   "securable_kind": "TABLE_DELTA",
<   "securable_type": "TABLE",
<   "storage_location": "abfss://mnt@xxxx.dfs.core.windows.net/meta/df1a5b8f-0154-48f5-abfd-c39dad04affa/tables/bd7223... (30 more bytes)",
<   "table_id": "bd722321-25b2-4e3d-8c79-5dae4d9f552c",
<   "table_type": "MANAGED",
<   "updated_at": 1706713588609,
<   "updated_by": "xxx"
< }
DEBUG:py4j.clientserver:Command to send: i java.util.HashMap e
DEBUG:py4j.clientserver:Answer received: !yao387
DEBUG:py4j.clientserver:Command to send: c o387 put spackageNameRaw slogging e
DEBUG:py4j.clientserver:Answer received: !yn
DEBUG:py4j.clientserver:Command to send: c o387 put spackageNameAllowlisted s0a149844a7c9f4cff948bf39833fcc5429ce2325e43d6b3873a22a76e956e470 e
DEBUG:py4j.clientserver:Answer received: !yn
DEBUG:py4j.clientserver:Command to send: c o387 put spackageHash s0a149844a7c9f4cff948bf39833fcc5429ce2325e43d6b3873a22a76e956e470 e
DEBUG:py4j.clientserver:Answer received: !yn
DEBUG:py4j.clientserver:Command to send: c o387 put spackageVersion sunknown e
DEBUG:py4j.clientserver:Answer received: !yn
DEBUG:py4j.clientserver:Command to send: c o377 logUsage spythonPackageImported ro387 n e
DEBUG:py4j.clientserver:Answer received: !yv
DEBUG:py4j.clientserver:Command to send: i java.util.HashMap e
DEBUG:py4j.clientserver:Answer received: !yao388
DEBUG:py4j.clientserver:Command to send: c o388 put spackageNameRaw sdatabricks e
DEBUG:py4j.clientserver:Answer received: !yn
DEBUG:py4j.clientserver:Command to send: c o388 put spackageNameAllowlisted sdatabricks e
DEBUG:py4j.clientserver:Answer received: !yn
DEBUG:py4j.clientserver:Command to send: c o388 put spackageHash s2dc010636b9dce91b7328ecb3d287ef6245c96281f1ebf37071210a6ef829000 e
DEBUG:py4j.clientserver:Answer received: !yn
DEBUG:py4j.clientserver:Command to send: c o388 put spackageVersion sunknown e
DEBUG:py4j.clientserver:Answer received: !yn
DEBUG:py4j.clientserver:Command to send: c o377 logUsage spythonPackageImported ro388 n e
DEBUG:py4j.clientserver:Answer received: !yv

mgyucht commented 5 months ago

Based on the debug logs, it seems that type_precision and type_scale are always set to 0 in the API response (you can see from the first column that precision is 0 when it should be 10). I will file a ticket with the team operating this API to investigate and fix the underlying issue.
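Note that the logged response also carries the declared type inside each column's type_json payload, so the discrepancy can be cross-checked from the SDK (a sketch, not an official API pattern; it assumes the table object from the reproduction above):

import json

for field in table.columns:
    # type_json embeds the schema entry, e.g. {"name": "id", "type": "decimal(10,0)", ...}
    declared_type = json.loads(field.type_json)["type"]
    print(f"{field.name}: type_json says {declared_type}, "
          f"API says precision={field.type_precision}, scale={field.type_scale}")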

edwardfeng-db commented 5 months ago

Thanks for raising this. A ticket has been filed internally; we will keep you updated, @meretri.