datacontract / datacontract-cli

CLI to manage your datacontract.yaml files
https://cli.datacontract.com

Added parquet support for all supported types #393

Closed ArtyomyuS closed 2 months ago

ArtyomyuS commented 3 months ago
jochenchrist commented 3 months ago

Can you have a look at the failing tests?

ArtyomyuS commented 2 months ago

Hey @jochenchrist, I see that my changes introduced a regression. I just want to be sure this is the expected behaviour, so could you please help me out here?

Failing test test_export_dbml.py:test_cli_with_server:

Expected Result:
 /*

Generated at Aug 29 2024 by datacontract-cli version 0.10.11
for datacontract Orders Latest (urn:datacontract:checkout:orders-latest) version 1.0.0 
Using s3 Types for the field types

*/

Project "Orders Latest" {
    Note: '''Successful customer orders in the webshop. 
All orders since 2020-01-01. 
Orders with their line items are in their current state (no history included).
'''
}

Table "orders" { 
Note: "One record per order. Includes cancelled and deleted orders."
    "order_id" "VARCHAR" [pk,unique,not null,Note: "An internal ID that identifies an order in the online shop."]
"order_timestamp" "TIMESTAMP WITH TIME ZONE" [not null,Note: "The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful."]
"order_total" "STRUCT" [not null,Note: "Total amount the smallest monetary unit (e.g., cents)."]
"customer_id" "VARCHAR" [null,Note: "Unique identifier for the customer."]
"customer_email_address" "VARCHAR" [not null,Note: "The email address, as entered by the customer. The email address was not verified."]
"processed_timestamp" "TIMESTAMP WITH TIME ZONE" [not null,Note: "The timestamp when the record was processed by the data platform."]
}

Table "line_items" { 
Note: "A single article that is part of an order."
    "lines_item_id" "VARCHAR" [pk,unique,not null,Note: "Primary key of the lines_item_id table"]
"order_id" "VARCHAR" [null,Note: "An internal ID that identifies an order in the online shop."]
"sku" "VARCHAR" [null,Note: "The purchased article number"]
}
Ref: line_items.order_id > orders.order_id

Actual result:

Actual Result:
 /*

Generated at Aug 29 2024 by datacontract-cli version 0.10.11
for datacontract Orders Latest (urn:datacontract:checkout:orders-latest) version 1.0.0
Using s3 Types for the field types

*/

Project "Orders Latest" {
    Note: '''Successful customer orders in the webshop.
All orders since 2020-01-01.
Orders with their line items are in their current state (no history included).
'''
}

Table "orders" {
Note: "One record per order. Includes cancelled and deleted orders."
    "order_id" "VARCHAR" [pk,unique,not null,Note: "An internal ID that identifies an order in the online shop."]
"order_timestamp" "DATETIME" [not null,Note: "The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful."]
"order_total" "STRUCT(amount STRUCT(sum None, currency VARCHAR), due_date DATE, discount DOUBLE)" [not null,Note: "Total amount the smallest monetary unit (e.g., cents)."]
"customer_id" "VARCHAR" [null,Note: "Unique identifier for the customer."]
"customer_email_address" "VARCHAR" [not null,Note: "The email address, as entered by the customer. The email address was not verified."]
"processed_timestamp" "DATETIME" [not null,Note: "The timestamp when the record was processed by the data platform."]
}

Table "line_items" {
Note: "A single article that is part of an order."
    "lines_item_id" "VARCHAR" [pk,unique,not null,Note: "Primary key of the lines_item_id table"]
"order_id" "VARCHAR" [null,Note: "An internal ID that identifies an order in the online shop."]
"sku" "VARCHAR" [null,Note: "The purchased article number"]
}
Ref: line_items.order_id > orders.order_id

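As we can see, the differences are in the timestamp fields (order_timestamp and processed_timestamp: TIMESTAMP WITH TIME ZONE vs. DATETIME) and in order_total (STRUCT vs. the fully expanded STRUCT(...)).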
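
For anyone who wants to reproduce the comparison locally, here is a minimal sketch using Python's standard `difflib`. The two strings are shortened excerpts of the outputs above (field attributes omitted), and `changed_lines` is a hypothetical helper, not part of datacontract-cli or its test suite.

```python
import difflib

# Shortened excerpts of the expected and actual DBML output from this thread.
expected = '''"order_id" "VARCHAR"
"order_timestamp" "TIMESTAMP WITH TIME ZONE"
"order_total" "STRUCT"'''

actual = '''"order_id" "VARCHAR"
"order_timestamp" "DATETIME"
"order_total" "STRUCT(amount STRUCT(sum None, currency VARCHAR), due_date DATE, discount DOUBLE)"'''


def changed_lines(expected_text: str, actual_text: str) -> list:
    """Return only the removed ('- ') and added ('+ ') lines of a line-by-line diff."""
    diff = difflib.ndiff(expected_text.splitlines(), actual_text.splitlines())
    # ndiff also emits '  ' (unchanged) and '? ' (hint) lines; skip those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    for line in changed_lines(expected, actual):
        print(line)
```

Only the `order_timestamp` and `order_total` lines show up as changed, while `order_id` is reported as identical.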

I've checked the order_timestamp definition, and the actual result is in fact the correct one, as we can see:

      order_timestamp:
        description: The business timestamp in UTC when the order was successfully registered in the source system and the payment was successful.
        type: timestamp
        required: true
        example: "2024-09-09T08:30:00Z"

Based on this condition: https://github.com/datacontract/datacontract-cli/pull/393/files#diff-52d7b034200569c8e44e8ec8c129d7866f341964fa9463f9e04f40d81aed0d8aR187, it is OK to have DATETIME here.

For the order_total structure, I checked the datacontract.yaml:

        description: Total amount the smallest monetary unit (e.g., cents).
        type: record
        required: true
        fields:
          amount:
            description: The amount to pay
            required: true
            type: record
            fields:
              sum:
                description: the sum to pay
                required: true
                type: number
              currency:
                description: the currency the amount is in
                required: true
                type: string
                example: EUR

To me the actual result is also what we should expect, as the nested record defines two fields, each with its own type.
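
To illustrate why a nested record expands into `STRUCT(...)`, here is a rough sketch of a recursive type rendering. It is not the actual converter from this PR: `SCALAR_TYPES` and `to_sql_type` are hypothetical names, and the mapping is an assumption chosen to match the output above. `number` is deliberately left out of the mapping, so it renders as `None`, like the `sum None` in the actual result. Whether the real converter produces `None` for the same reason is not confirmed here.

```python
from typing import Any, Dict

# Illustrative mapping only (an assumption); the real converter may differ.
SCALAR_TYPES = {
    "string": "VARCHAR",
    "timestamp": "DATETIME",
    "date": "DATE",
    "double": "DOUBLE",
}


def to_sql_type(field: Dict[str, Any]) -> str:
    """Render a data contract field as a SQL-like type string."""
    field_type = field.get("type")
    if field_type == "record":
        # Recurse into nested fields, keeping their declaration order.
        nested = ", ".join(
            f"{name} {to_sql_type(sub)}"
            for name, sub in field.get("fields", {}).items()
        )
        return f"STRUCT({nested})"
    # Unmapped scalar types fall through to None and are rendered as "None".
    return str(SCALAR_TYPES.get(field_type))


# The part of order_total quoted above, reduced to types only.
order_total = {
    "type": "record",
    "fields": {
        "amount": {
            "type": "record",
            "fields": {
                "sum": {"type": "number"},
                "currency": {"type": "string"},
            },
        },
    },
}

if __name__ == "__main__":
    print(to_sql_type(order_total))
    print(to_sql_type({"type": "timestamp"}))
```

With this sketch, the excerpt renders as `STRUCT(amount STRUCT(sum None, currency VARCHAR))` and a plain `timestamp` field as `DATETIME`.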

@jochenchrist Could you please confirm that I just have to update the test's expected output to match the actual one? If not, please advise.

ArtyomyuS commented 2 months ago

Basically, could you confirm that this updated test is OK: https://github.com/datacontract/datacontract-cli/pull/393/commits/bf1e590b1dc66674254a51c64ff25472f83fed0c

jochenchrist commented 2 months ago

Happy to finally merge the PR. Thanks @ArtyomyuS for your contribution!