Open mikeburke24 opened 1 month ago
@mikeburke24 Looks like due to https://github.com/datahub-project/datahub/pull/10928, you probably want to be on server 0.14.1 and a CLI version that is 0.14.1.x.
That should solve both the is_container config issue and the error during emission.
@hsheth2 Hi, I've upgraded to GMS tag [1f02c84] and CLI 0.14.1.3, and we are still getting this error:
ERROR :: /assets/0/destinationUrn :: field is required but not found and has no default value
It does work for Oracle though. Do you have any ideas why it doesn't work for S3? Would you have any example syntax that might work?
Interesting, it looks like we need to investigate the pattern_add_dataset_dataproduct transformer a bit more closely to determine why it is not providing this field.
@jjoyce0510 thanks John! If you've ever got this to work or have any other example syntax please send it my way. I'm not sure what field it is looking for that it can't find. Here's an example I've tried on a local build
type: pattern_add_dataset_dataproduct
config:
  dataset_to_data_product_urns_pattern:
    rules:
      '.*': 'urn:li:dataProduct:xxxxxxxx'
Can you post your full S3 recipe (redacted)? It seems like we have some bug where we emit an invalid MCP but I'm having trouble narrowing it down.
@asikowitz sure
source:
  type: s3
  config:
    path_specs:
      - include: 's3://<mybucket>/<myfile.csv>'
transformers:
  - type: pattern_add_dataset_dataproduct
    config:
      is_container: true
      dataset_to_data_product_urns_pattern:
        rules:
          'urn:li:dataset:(urn:li:dataPlatform:s3,<mybucket>/<myfile.csv>,prod)': 'urn:li:dataProduct:<urn>'
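A side note on the literal-URN rule key in the recipe above, offered as a minimal sketch rather than a diagnosis. It assumes the transformer matches rule keys against dataset URNs as Python regular expressions via `re.match`, which the `'.*'` rule earlier in the thread suggests but which is not confirmed here. Under that assumption, the unescaped parentheses in a literal URN are read as a regex group instead of literal characters, so the key would not match its own URN. The URN below is a made-up placeholder, and `rule_matches` is a hypothetical helper, not part of DataHub.

```python
import re

# Placeholder URN for illustration only; bucket and file names are invented.
urn = "urn:li:dataset:(urn:li:dataPlatform:s3,my-bucket/my-file.csv,prod)"

# Rule key written as the literal URN: "(" and ")" become a regex group.
literal_rule = urn
# Same key with regex metacharacters escaped so they match literally.
escaped_rule = re.escape(urn)


def rule_matches(rule: str, candidate: str) -> bool:
    """Hypothetical helper: True if `rule`, used as a regex, matches from the start."""
    return re.match(rule, candidate) is not None


print(rule_matches(".*", urn))          # catch-all rule
print(rule_matches(literal_rule, urn))  # unescaped literal URN
print(rule_matches(escaped_rule, urn))  # escaped literal URN
```

If the assumption holds, escaping the parentheses in the rule key (or using a looser pattern) would be needed for the rule to apply. This is separate from the destinationUrn error and would only affect whether a rule matches at all.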
Describe the bug
We are trying to automatically assign data products to datasets and their container during ingestion from S3. I have included the format of our transformer below:
To Reproduce
However, the ingestion fails with the following message:
Failed to configure transformers: 1 validation error for PatternDatasetDataProductConfig
is_container
  extra fields not permitted (type=value_error.extra)
If we remove the is_container portion, the ingestion still fails with the message below:
ERROR :: /assets/0/destinationUrn :: field is required but not found and has no default value
Expected behavior
The documentation that you linked states that is_container is supported:
Additional context
This transformer format works fine for Oracle (if is_container is removed) but doesn't work for S3.