getstrm / pace

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you programmatically create and apply a data policy to a processing platform such as Databricks, Snowflake or BigQuery (or even plain ol' Postgres!), with definitions imported from Collibra, DataHub, ODD and the like.
https://pace.getstrm.com
Apache License 2.0

[PACE-57] Better hierarchical entity support for OpendataDiscovery #63

Closed. bvdeenen closed this issue 11 months ago.

bvdeenen commented 11 months ago

The current code essentially uses a list-all-tables query and creates one PACE database entry for each table. This makes no sense and does not scale. I've looked at the ODD RPC results, and database-level information is available.

```
# essentially shows every dataset in ODD as a database
pace list databases --catalog odd -o json | jq -r '.databases[]|[.id,.display_name,.type]|@csv'
"111","CATALOG_RETURNS","Snowflake Sample Data"
"110","PARTSUPP","Snowflake Sample Data"
"109","CUSTOMER","Snowflake Sample Data"
"108","STORE_RETURNS","Snowflake Sample Data"
"107","WEB_SITE","Snowflake Sample Data"
"106","WEB_RETURNS","Snowflake Sample Data"
"105","PROMOTION","Snowflake Sample Data"
"104","CUSTOMER","Snowflake Sample Data"
"103","REGION","Snowflake Sample Data"
"102","CATALOG_RETURNS","Snowflake Sample Data"
"101","CATALOG_SALES","Snowflake Sample Data"
"100","HOURLY_16_TOTAL","Snowflake Sample Data"
"99","CALL_CENTER","Snowflake Sample Data"
"98","WEB_SITE","Snowflake Sample Data"
"97","REGION","Snowflake Sample Data"
"96","CUSTOMER_DEMOGRAPHICS","Snowflake Sample Data"

...
```
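The flat listing also makes the problem easy to spot: the same table name shows up as several unrelated "databases" (CUSTOMER under ids 109 and 104, for example). A minimal Python sketch that parses the jq `@csv` rows and reports the repeated names; it uses only the rows visible above, since the listing itself is truncated.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the flat `pace list databases` output above
# (jq @csv columns: id, display_name, type). The original listing is truncated.
FLAT_LISTING = """\
"111","CATALOG_RETURNS","Snowflake Sample Data"
"110","PARTSUPP","Snowflake Sample Data"
"109","CUSTOMER","Snowflake Sample Data"
"108","STORE_RETURNS","Snowflake Sample Data"
"107","WEB_SITE","Snowflake Sample Data"
"106","WEB_RETURNS","Snowflake Sample Data"
"105","PROMOTION","Snowflake Sample Data"
"104","CUSTOMER","Snowflake Sample Data"
"103","REGION","Snowflake Sample Data"
"102","CATALOG_RETURNS","Snowflake Sample Data"
"101","CATALOG_SALES","Snowflake Sample Data"
"100","HOURLY_16_TOTAL","Snowflake Sample Data"
"99","CALL_CENTER","Snowflake Sample Data"
"98","WEB_SITE","Snowflake Sample Data"
"97","REGION","Snowflake Sample Data"
"96","CUSTOMER_DEMOGRAPHICS","Snowflake Sample Data"
"""


def ids_by_name(csv_text: str) -> dict[str, list[str]]:
    """Map each display name to the list of 'database' ids it was given."""
    groups: dict[str, list[str]] = defaultdict(list)
    for db_id, display_name, _db_type in csv.reader(io.StringIO(csv_text)):
        groups[display_name].append(db_id)
    return dict(groups)


def duplicated_names(csv_text: str) -> dict[str, list[str]]:
    """Keep only names that show up as more than one separate 'database'."""
    return {name: ids for name, ids in ids_by_name(csv_text).items() if len(ids) > 1}


if __name__ == "__main__":
    for name, ids in sorted(duplicated_names(FLAT_LISTING).items()):
        print(f"{name}: ids {', '.join(ids)}")
```

On the visible rows it reports CATALOG_RETURNS (111, 102), CUSTOMER (109, 104), REGION (103, 97) and WEB_SITE (107, 98): sixteen "databases" for only twelve distinct names.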

```
# shows one schema identical to the table name
pace list schemas --catalog odd --database 4
schemas:
- database:
    catalog:
      id: odd
      type: ODD
    display_name: sales_denorm
    id: "4"
    type: BookShop Data Lake
  id: "4"
  name: BookShop Data Lake
```

```
# and shows only one table
pace list tables --catalog odd --database 4 --schema 4
tables:
- id: "4"
  name: BookShop Data Lake
  schema:
    database:
      catalog:
        id: odd
        type: ODD
      display_name: sales_denorm
      id: "4"
      type: BookShop Data Lake
    id: "4"
    name: BookShop Data Lake
```
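A minimal Python sketch of the grouping the database-level information allows: one PACE database per ODD data source, with its datasets as tables, instead of one database per dataset. The `FlatDataset` record and its field names are hypothetical, not the actual ODD API model. The single placeholder schema with id `schema`, named after its database, is an assumption that mirrors the WIP output later in this thread; the sample rows are taken from the listings in this thread.

```python
from dataclasses import dataclass, field


# Hypothetical flat record: one dataset plus the database-level (data source)
# info it belongs to. Field names are illustrative, not the ODD API model.
@dataclass(frozen=True)
class FlatDataset:
    dataset_id: str
    dataset_name: str
    source_id: str
    source_name: str
    source_type: str


@dataclass
class Table:
    id: str
    name: str


@dataclass
class Schema:
    id: str
    name: str
    tables: list[Table] = field(default_factory=list)


@dataclass
class Database:
    id: str
    display_name: str
    type: str
    schemas: list[Schema] = field(default_factory=list)


def build_hierarchy(datasets: list[FlatDataset]) -> list[Database]:
    """Group datasets under one database per data source (insertion order kept).

    Assumption: each database gets a single placeholder schema with id
    "schema", named after the database itself.
    """
    databases: dict[str, Database] = {}
    for ds in datasets:
        db = databases.get(ds.source_id)
        if db is None:
            db = Database(ds.source_id, ds.source_name, ds.source_type,
                          [Schema("schema", ds.source_name)])
            databases[ds.source_id] = db
        db.schemas[0].tables.append(Table(ds.dataset_id, ds.dataset_name))
    return list(databases.values())


if __name__ == "__main__":
    sample = [
        FlatDataset("15", "dim_publishers", "3", "BookShop Transactional", "Transactional"),
        FlatDataset("14", "fct_sales", "3", "BookShop Transactional", "Transactional"),
        FlatDataset("4", "sales_denorm", "2", "BookShop Data Lake", "Data Lake"),
    ]
    for db in build_hierarchy(sample):
        print(db.id, db.display_name, [t.name for t in db.schemas[0].tables])
```

The number of PACE databases now scales with the number of data sources rather than with the number of datasets.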

From SyncLinear.com | PACE-57

bvdeenen commented 11 months ago

WIP. Much improved.

```
pace list databases --catalog odd -o json | jq -r '.databases[]|[.id,.display_name,.type]|@csv'
"2","BookShop Data Lake","Data Lake"
"3","BookShop Transactional","Transactional"
"5","User Transactions","Messaging"
"7","KDS Clickstream","Messaging"
"8","Snowflake Sample Data","Samples"

pace list schemas --catalog odd --database 3 -o json | jq -r '.schemas[]|[.id,.name]|@csv'
"schema","BookShop Transactional"

pace list tables --catalog odd --database 3 --schema schema -o json | jq -r '.tables[]|[.id,.name,.schema.name]|@csv'
"15","dim_publishers","BookShop Transactional"
"14","fct_sales","BookShop Transactional"
"13","fct_inventory","BookShop Transactional"
"12","dim_currency","BookShop Transactional"
"11","dim_books","BookShop Transactional"
"10","dim_promo","BookShop Transactional"
"9","customer_tier_sbx","BookShop Transactional"
"8","dim_countries","BookShop Transactional"
"7","dim_cards","BookShop Transactional"
"6","dim_customer","BookShop Transactional"
"5","dim_payment","BookShop Transactional"
```

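With the hierarchy in place, enumerating every table means walking the three levels and passing each parent id down, as the three commands above do (note that the schema id is the literal `schema`). A minimal Python sketch of that walk; the three listing callables are hypothetical stand-ins for the CLI calls, not a PACE API, and the stub data is a subset of the output above.

```python
from typing import Callable, Iterator

# Hypothetical stand-ins mirroring the three CLI calls above:
#   pace list databases --catalog odd
#   pace list schemas   --catalog odd --database <db>
#   pace list tables    --catalog odd --database <db> --schema <schema>
ListDatabases = Callable[[], list[dict]]
ListSchemas = Callable[[str], list[dict]]
ListTables = Callable[[str, str], list[dict]]


def walk(list_databases: ListDatabases, list_schemas: ListSchemas,
         list_tables: ListTables) -> Iterator[tuple[str, str, str]]:
    """Yield (database display_name, schema name, table name) for every table."""
    for db in list_databases():
        for schema in list_schemas(db["id"]):
            # The schema id (here the literal "schema") is passed down unchanged.
            for table in list_tables(db["id"], schema["id"]):
                yield db["display_name"], schema["name"], table["name"]


if __name__ == "__main__":
    # Stubs returning a subset of the WIP output above (database 3 only).
    stub_dbs = [{"id": "3", "display_name": "BookShop Transactional", "type": "Transactional"}]
    stub_schemas = {"3": [{"id": "schema", "name": "BookShop Transactional"}]}
    stub_tables = {("3", "schema"): [{"id": "15", "name": "dim_publishers"},
                                     {"id": "14", "name": "fct_sales"}]}
    for row in walk(lambda: stub_dbs,
                    lambda d: stub_schemas.get(d, []),
                    lambda d, s: stub_tables.get((d, s), [])):
        print(row)
```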
Only the get data-policy command is broken at the moment.