kyyberi opened this issue 3 months ago
Implemented a first version of this in the DataOps component. The origin element attempts to describe the sources of the data:
dataOps:
  data:
    schemaLocationURL: http://192.168.10.1/schemas/2016/petshopML-2.3/schema/petstore.xsd
    origin:
      - source: human # sensor, human, analytics
        sourceId:
        type: raw # raw, cleansed
        description:
        checksum: # ?
      - source: sensor # sensor, human, analytics
        sourceId:
        type: cleansed # raw, cleansed
        description:
        checksum: # ?
  lineage:
    dataLineageTool: Collibra
    dataLineageOutput: http://192.168.10.1/lineage.json
  infrastructure:
    platform: Azure
    region: West US 2 (Washington)
    storageTechnology: Azure SQL
    storageType: sql
    containerTool: helm
  build:
    format: yaml
    hashType: SHA-2
    checksum: 7b7444ab8f5832e9ae8f54834782af995d0a83b4a1d77a75833eda7e19b4c921
    signatureType: JWK
    scriptURL: http://192.168.10.1/rundatapipeline.yml
    deploymentDocumentationURL: http://192.168.10.1/datapipeline
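The build block pairs a hashType with a checksum, and each origin entry has a checksum slot too. The Python below is a minimal sketch of how a consumer might check such fields. It assumes SHA-2 here means SHA-256, which fits the 64-hex-digit checksum above: the code recomputes a digest and compares it with the declared value.

A checksum only shows integrity, not who produced the data, so the sketch also adds an authenticity check. For that it swaps in HMAC-SHA256 with a hypothetical shared key as a simple stand-in. It does not implement the JWK-based signing that signatureType suggests. The function names and the sample script content are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest (assumes hashType SHA-2 means SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def verify_checksum(data: bytes, expected_hex: str) -> bool:
    """Integrity check: does the content hash to the declared checksum?"""
    # compare_digest compares in constant time, avoiding timing side channels.
    return hmac.compare_digest(sha256_hex(data), expected_hex.lower())


def sign(data: bytes, key: bytes) -> str:
    """Hypothetical authenticity tag: HMAC-SHA256 over the content with a shared key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_signature(data: bytes, key: bytes, tag_hex: str) -> bool:
    """Authenticity check: was the tag produced by a holder of the same key?"""
    return hmac.compare_digest(sign(data, key), tag_hex)


if __name__ == "__main__":
    script = b"steps:\n  - run: ingest\n"  # hypothetical pipeline script content
    digest = sha256_hex(script)
    print(verify_checksum(script, digest))         # True
    print(verify_checksum(script + b"x", digest))  # False

    key = b"hypothetical-shared-key"
    tag = sign(script, key)
    print(verify_signature(script, key, tag))          # True
    print(verify_signature(script, b"other-key", tag))  # False
```

In practice the expected value would presumably come from the checksum field, and the content from the file behind scriptURL or the origin source.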
Could we add an "as code" part here as well, i.e. a method to verify the source system and the authenticity of the data directly?
Something similar to what the dataQuality component has:
dataQuality:
  - dimension: accuracy
    objective: 98
    unit: percentage
    monitoring:
      type: SodaCL
      spec:
        - require_unique(member_id)
        - require_range(age_band, 18, 100)
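The spec entries above are easier to picture with a concrete evaluation. This minimal plain-Python sketch shows what require_unique(member_id) and require_range(age_band, 18, 100) could mean when run against a list of rows. It is not the SodaCL engine and does not use SodaCL syntax. The function names simply mirror the spec entries. Inclusive range bounds, numeric age_band values, and the sample rows are assumptions.

```python
import re
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Check entries copied from the dataQuality spec above.
SPEC = ["require_unique(member_id)", "require_range(age_band, 18, 100)"]


def require_unique(rows: List[Row], column: str) -> bool:
    """Passes when no two rows share a value in `column`."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def require_range(rows: List[Row], column: str, low: str, high: str) -> bool:
    """Passes when every value lies in [low, high] (inclusive bounds are an assumption)."""
    lo, hi = float(low), float(high)
    return all(lo <= row[column] <= hi for row in rows)


# Hypothetical registry mapping spec function names to Python checks.
CHECKS: Dict[str, Callable[..., bool]] = {
    "require_unique": require_unique,
    "require_range": require_range,
}

# Matches entries of the form name(arg1, arg2, ...).
CALL = re.compile(r"^\s*(\w+)\((.*)\)\s*$")


def run_spec(rows: List[Row], spec: List[str]) -> Dict[str, bool]:
    """Evaluate each spec entry against rows and map entry -> pass/fail."""
    results: Dict[str, bool] = {}
    for entry in spec:
        match = CALL.match(entry)
        if match is None:
            raise ValueError(f"unparseable check: {entry!r}")
        name, raw_args = match.groups()
        args = [a.strip() for a in raw_args.split(",")]
        results[entry] = CHECKS[name](rows, *args)
    return results


if __name__ == "__main__":
    rows = [
        {"member_id": 1, "age_band": 25},
        {"member_id": 2, "age_band": 67},
        {"member_id": 2, "age_band": 17},  # duplicate id and out-of-range age
    ]
    print(run_spec(rows, SPEC))
    # {'require_unique(member_id)': False, 'require_range(age_band, 18, 100)': False}
    print(run_spec(rows[:2], SPEC))
    # {'require_unique(member_id)': True, 'require_range(age_band, 18, 100)': True}
```

Keeping the checks as plain strings in the spec and mapping them to functions at run time is one way to keep the YAML declarative while the evaluation logic lives in code.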
To be moved to the next version.
It is data genesis!
Which problem is this feature request solving?
Describe the solution you'd like
Any known practical use cases to apply?
Yes, we do. It will provide business value, and the request comes from practitioners. Details are not revealed yet, to protect the frontrunners' business.
Can you submit a pull request?
No.
---- Leave intact! Approval of Contributor Agreement -----
By submitting issue you approve the Contributor Agreement, https://governance.opendataproducts.org/v1/contributions/contributor-agreement