datacontract / datacontract-cli

CLI to manage your datacontract.yaml files
https://cli.datacontract.com

Testing complex data types #263

Open john-bunge-ed opened 2 weeks ago

john-bunge-ed commented 2 weeks ago

Hi!

Based on your documentation, I assume the following is not possible. Nevertheless, I would like to double-check.

Is it possible to test the details of data contracts with complex/nested types?

Consider the following example. I have a field products of type array (Python: list) and a field shops_added of type struct (Python: dict).

My goal is to test whether:
(1) field products is of type array,
(2) field shops_added is of type struct,
(3) field products contains only values of type string,
(4) further restrictions on values in field products (e.g. length, regex patterns) hold,
(5) field shops_added contains only keys of type string (e.g. "abc"),
(6) further restrictions on keys in field shops_added (e.g. length, regex patterns) hold,
(7) field shops_added contains only values of type timestamp (e.g. 2024-06-01),
(8) further restrictions on values in field shops_added hold.

However, I do not want to split the complex types out into multiple models; I want to test everything with a single model.

{
   "id":"01",
   "name":"hans",
   "age":41,
   "products":[
      "sku_01",
      "sku_04"
   ],
   "shops_added":{
      "abc":"2024-06-01",
      "xyz":"2024-06-09"
   }
}
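To make requirements (1) to (8) concrete, here is a minimal plain-Python sketch that checks one record like the example above. It does not use datacontract-cli or Soda; it only spells out what each check means. The SKU pattern, the three-letter key pattern and the use of ISO dates are hypothetical stand-ins, since the issue does not state the real rules.

```python
import re
from datetime import date

# Hypothetical constraints: the issue gives no concrete patterns or lengths.
SKU_PATTERN = re.compile(r"sku_\d{2}")
SHOP_KEY_PATTERN = re.compile(r"[a-z]{3}")


def violations(record: dict) -> list[str]:
    """Return human-readable violations of requirements (1)-(8)."""
    errors = []

    products = record.get("products")
    if not isinstance(products, list):                      # (1) array
        errors.append("products is not an array")
    else:
        for i, sku in enumerate(products):                  # any array length
            if not isinstance(sku, str):                    # (3) string elements
                errors.append(f"products[{i}] is not a string")
            elif not SKU_PATTERN.fullmatch(sku):            # (4) element format
                errors.append(f"products[{i}] has invalid format: {sku!r}")

    shops = record.get("shops_added")
    if not isinstance(shops, dict):                         # (2) struct/map
        errors.append("shops_added is not a struct/map")
    else:
        for key, value in shops.items():
            # (5) is always true for JSON input, where object keys are strings.
            if not isinstance(key, str):
                errors.append(f"shops_added key {key!r} is not a string")
            elif not SHOP_KEY_PATTERN.fullmatch(key):       # (6) key format
                errors.append(f"shops_added key {key!r} has invalid format")
            try:                                            # (7), (8) ISO date value
                date.fromisoformat(value)
            except (TypeError, ValueError):
                errors.append(f"shops_added[{key!r}] is not an ISO date")
    return errors
```

Calling violations() on the example record should return an empty list; a record such as {"products": ["sku_01", 7]} would report products[1] as not a string.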

P.S. Apologies for spamming you with quite a number of questions recently

jochenchrist commented 2 weeks ago

What is your server type?

There is a JSON-Schema Check Engine implemented for JSON files on S3.
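For JSON files, the nested rules map fairly directly onto standard JSON Schema keywords. The sketch below is a hand-written schema for the example record, expressed as a Python dict. It is an assumption for illustration, not the schema datacontract-cli derives from a contract, and the patterns and lengths are hypothetical. JSON has no timestamp type, so dates are modelled as strings with a regex; the "format": "date" keyword also exists, but in draft 2020-12 it is only an annotation unless the validator enables format assertion.

```python
import json

# Hypothetical schema for the example record; patterns/lengths are illustrative.
SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["products", "shops_added"],
    "properties": {
        "products": {
            "type": "array",                 # (1) array of any length
            "items": {                       # applied to every element
                "type": "string",            # (3)
                "minLength": 1,              # (4)
                "maxLength": 20,
                "pattern": "^sku_[0-9]{2}$",
            },
        },
        "shops_added": {
            "type": "object",                # (2)
            # (5)/(6): JSON keys are always strings; constrain their shape.
            "propertyNames": {"pattern": "^[a-z]{3}$"},
            # (7)/(8): with no "properties" listed, this applies to every value.
            "additionalProperties": {
                "type": "string",
                "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$",
            },
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(SCHEMA, indent=2))
```

With the third-party jsonschema package, jsonschema.validate(record, SCHEMA) would check a single record against it.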

john-bunge-ed commented 2 weeks ago

My server is dataframe / temporary view in Databricks :)

jochenchrist commented 2 weeks ago

In general, dataframes should be supported by the Soda Core engine that is used internally, as shown in this test: https://github.com/sodadata/soda-core/blob/af649b977fc2489eb841cf16ab4f0d9fc3da2165/soda/spark_df/tests/test_spark_df.py#L6

Might be worth a try.
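One way to try this on Databricks is to bypass datacontract-cli and run SodaCL "failed rows" checks with Spark SQL directly against a temp view. The sketch below assumes a temp view named customers (hypothetical), products stored as array<string>, and shops_added stored as map<string,string>. PySpark infers a Python dict as a MapType by default, so a dict with arbitrary keys is usually a map rather than a struct. The soda-core calls in run_scan (Scan, add_spark_session, add_sodacl_yaml_str, execute) come from its programmatic scan API and should be checked against the installed soda-core-spark-df version.

```python
VIEW = "customers"  # hypothetical temp view name

# (check name, Spark SQL condition that marks a row as failing)
FAIL_CONDITIONS: list[tuple[str, str]] = [
    ("products is array<string>", "typeof(products) <> 'array<string>'"),
    ("shops_added is map<string,string>",
     "typeof(shops_added) <> 'map<string,string>'"),
    # A NULL map makes forall() return NULL; coalesce() turns that into a failure.
    ("shops_added keys are 3 lowercase letters",
     "NOT coalesce(forall(map_keys(shops_added), k -> k RLIKE '^[a-z]{3}$'), false)"),
    ("shops_added values look like yyyy-MM-dd",
     "NOT coalesce(forall(map_values(shops_added), "
     "v -> v IS NOT NULL AND v RLIKE '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'), false)"),
]


def build_sodacl(view: str = VIEW) -> str:
    """Render one SodaCL 'failed rows' check per fail condition."""
    lines = [f"checks for {view}:"]
    for name, condition in FAIL_CONDITIONS:
        lines += [
            "  - failed rows:",
            f"      name: {name}",
            "      fail query: |",
            f"        SELECT * FROM {view}",
            f"        WHERE {condition}",
        ]
    return "\n".join(lines) + "\n"


def run_scan(spark, df) -> int:
    """Register df as a temp view and run the checks with soda-core.

    Imported lazily so build_sodacl() works without soda-core installed.
    """
    from soda.scan import Scan  # requires soda-core-spark-df

    df.createOrReplaceTempView(VIEW)
    scan = Scan()
    scan.set_data_source_name("spark_df")
    scan.add_spark_session(spark, data_source_name="spark_df")
    scan.add_sodacl_yaml_str(build_sodacl())
    return scan.execute()  # non-zero signals warnings, failures or errors
```

Because typeof() describes the column type rather than a value, a type mismatch makes every row fail that check, which is acceptable for a failed-rows check.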

john-bunge-ed commented 2 weeks ago

Thanks :)

john-bunge-ed commented 1 week ago

Hi @jochenchrist

Follow-up question: Is it possible to test the contents of an array whose length can vary?

Going back to my example from the opening post, what I specifically want to test is that:

(1) field products is of type array, (3) field products contains only values of type string, (4) further restrictions on values in field products (e.g. length, regex patterns) hold true
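For a variable-length array in Spark, the higher-order function forall(array, x -> predicate) (Spark 3.0+) evaluates the predicate on every element, so the array's length does not matter and an empty array passes. Below is a minimal sketch that builds such a Spark SQL expression covering (1), (3) and (4). The column name, length bounds and regex are hypothetical defaults. The resulting expression could be used as a WHERE condition in a SodaCL failed-rows query or with PySpark directly, as in invalid_rows.

```python
def array_elements_predicate(
    column: str = "products",
    min_len: int = 1,
    max_len: int = 20,
    pattern: str = "^sku_[0-9]{2}$",
) -> str:
    """Spark SQL boolean expression: true only if the column is array<string>
    and every element is non-null, within the length bounds and matches pattern."""
    if "'" in pattern:
        # The pattern is embedded in a single-quoted SQL literal. Also prefer
        # [0-9] over \d: Spark SQL string literals treat backslash as an escape.
        raise ValueError("pattern must not contain single quotes")
    element_check = (
        f"x IS NOT NULL AND (length(x) BETWEEN {min_len} AND {max_len}) "
        f"AND x RLIKE '{pattern}'"
    )
    # (1)+(3): type check; (4): per-element check; NULL array counts as a failure.
    return (
        f"typeof({column}) = 'array<string>' AND "
        f"coalesce(forall({column}, x -> {element_check}), false)"
    )


def invalid_rows(df, **kwargs):
    """Return the rows of a PySpark DataFrame that violate the predicate."""
    from pyspark.sql import functions as F  # requires pyspark (e.g. Databricks)

    return df.filter(~F.expr(array_elements_predicate(**kwargs)))
```

On Databricks, invalid_rows(df).count() == 0 would then indicate that every products array, whatever its length, satisfies the element rules.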