datacontract / datacontract-cli

CLI to manage your datacontract.yaml files
https://cli.datacontract.com

Machine testing of freshness (serviceLevels) #261

Open john-bunge-ed opened 2 weeks ago

john-bunge-ed commented 2 weeks ago

Question:

Is the freshness parameter in the serviceLevels object supposed to be machine testable?

I applied test() to my DataContract object (see below); however, no automated test results for freshness could be retrieved from the .checks or .logs attributes.

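from datacontract.data_contract import DataContract

# `spark` is an existing SparkSession (e.g. provided by the notebook environment)
datacontract_obj = DataContract(
    data_contract_file='datacontract.yaml',
    spark=spark)
datacontract_run = datacontract_obj.test()

# Inspect the results of the test run
datacontract_run.checks

datacontract_run.logs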

freshness was specified in the datacontract.yaml as follows. The column timestamp_col is of timestamp type.

servicelevels:
  freshness:
    description: The maximum age of the youngest entry is 1 hour
    threshold: 1h
    timestampField: table_name.timestamp_col
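
For readers unsure what this service level asserts, here is a minimal sketch in plain Python of the condition described above ("the maximum age of the youngest entry is 1 hour"). It is not part of datacontract-cli: `parse_threshold` and `is_fresh` are hypothetical helper names, and the threshold format (an integer followed by `d`, `h`, or `m`) is an assumption based on the `1h` value in the contract.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes for thresholds such as "1h" (hypothetical, for illustration).
_UNITS = {"d": "days", "h": "hours", "m": "minutes"}


def parse_threshold(threshold: str) -> timedelta:
    """Convert a threshold such as '1h' or '30m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([dhm])", threshold.strip())
    if match is None:
        raise ValueError(f"Unsupported threshold: {threshold!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_fresh(timestamps, threshold: str, now: datetime = None) -> bool:
    """Return True if the youngest timestamp is no older than the threshold."""
    timestamps = list(timestamps)
    if not timestamps:
        # No rows at all: treat the data as not fresh.
        return False
    now = now or datetime.now(timezone.utc)
    youngest = max(timestamps)
    return now - youngest <= parse_threshold(threshold)
```

In practice the timestamps would come from the column named in timestampField (here timestamp_col), for example via a `max()` aggregate on the table rather than a Python list.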

jochenchrist commented 2 weeks ago

Hi @john-bunge-ed, this is not implemented yet, but it would totally make sense. The current workaround would be to add the freshness test as an additional quality test.

john-bunge-ed commented 2 weeks ago

@jochenchrist Where in the codebase would you suggest adding such an additional quality test?

jochenchrist commented 2 weeks ago

Next to this: https://github.com/datacontract/datacontract-cli/blob/main/datacontract/export/sodacl_converter.py#L15
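
As a rough illustration of what such an addition might look like, the sketch below turns the freshness service level from the contract into a SodaCL-style freshness check. It is not the project's actual code: `freshness_to_sodacl` is a hypothetical name, the input is assumed to be the parsed `servicelevels` mapping as a plain dict, and it assumes that `timestampField` has the form `table.column` and that the threshold can be passed to SodaCL's `freshness(column) < threshold` check unchanged.

```python
def freshness_to_sodacl(servicelevels: dict) -> dict:
    """Build a SodaCL-style check mapping from a freshness service level.

    Hypothetical helper; returns an empty dict if no usable freshness
    definition is present.
    """
    freshness = (servicelevels or {}).get("freshness") or {}
    threshold = freshness.get("threshold")
    timestamp_field = freshness.get("timestampField")

    # Assumption: timestampField is written as "<table>.<column>".
    if not threshold or not timestamp_field or "." not in timestamp_field:
        return {}

    table, column = timestamp_field.rsplit(".", 1)
    return {f"checks for {table}": [f"freshness({column}) < {threshold}"]}


# Example with the service level from this issue
servicelevels = {
    "freshness": {
        "description": "The maximum age of the youngest entry is 1 hour",
        "threshold": "1h",
        "timestampField": "table_name.timestamp_col",
    }
}
checks = freshness_to_sodacl(servicelevels)
```

The same check line could also be written by hand into the contract's quality section, which is the workaround mentioned above.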

john-bunge-ed commented 2 weeks ago

thanks @jochenchrist