datahub-project / datahub

The Metadata Platform for your Data Stack
https://datahubproject.io
Apache License 2.0

pattern_add_dataset_dataproduct works for Oracle ingestion but not S3 #11656

Open · mikeburke24 opened this issue 6 days ago

mikeburke24 commented 6 days ago

Describe the bug

We are trying to automatically assign data products to datasets and their containers during ingestion from S3. The format of our transformer is included below:

To Reproduce

```yaml
transformers:
  - type: pattern_add_dataset_dataproduct
    config:
      is_container: true
      dataset_to_data_product_urns_pattern:
        rules:
          '.*': 'urn:li:dataProduct:<DATA_PRODUCT_URN>'
```
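The `rules` map in this config pairs a regex with a data product URN. The following is a minimal sketch of how such a config might resolve. It assumes two things: rules are tried in order with Python's `re.match`, and `is_container: true` attaches the same data product to each dataset's parent container. The helpers `resolve_data_product` and `assets_for_data_product`, the `container_of` lookup, and the sample URNs are hypothetical illustrations, not DataHub's implementation.

```python
import re
from typing import Dict, List, Optional


def resolve_data_product(dataset_urn: str, rules: Dict[str, str]) -> Optional[str]:
    """Return the data product URN of the first rule whose regex matches (hypothetical helper)."""
    for pattern, data_product_urn in rules.items():
        # re.match anchors at the start of the string, so '.*' matches every URN.
        if re.match(pattern, dataset_urn):
            return data_product_urn
    return None


def assets_for_data_product(
    dataset_urns: List[str],
    rules: Dict[str, str],
    is_container: bool,
    container_of: Dict[str, str],
) -> Dict[str, List[str]]:
    """Group asset URNs by data product; optionally add each dataset's container (assumed behavior)."""
    assets: Dict[str, List[str]] = {}
    for urn in dataset_urns:
        product = resolve_data_product(urn, rules)
        if product is None:
            continue  # no rule matched, so this dataset gets no data product
        bucket = assets.setdefault(product, [])
        bucket.append(urn)
        container = container_of.get(urn)
        # Add each container only once, even if several datasets share it.
        if is_container and container is not None and container not in bucket:
            bucket.append(container)
    return assets


if __name__ == "__main__":
    rules = {".*": "urn:li:dataProduct:example"}
    # Illustrative URNs only; real container URNs look different.
    s3_dataset = "urn:li:dataset:(urn:li:dataPlatform:s3,my-bucket/sales/orders,PROD)"
    container_of = {s3_dataset: "urn:li:container:example-bucket-folder"}
    print(assets_for_data_product([s3_dataset], rules, True, container_of))
```

With the catch-all `'.*'` rule, every ingested dataset resolves to the same data product. Under the assumed semantics, `is_container` only widens the set of assets to include their containers.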

However, the ingestion fails with the following message:

```
Failed to configure transformers: 1 validation error for PatternDatasetDataProductConfig
is_container
  extra fields not permitted (type=value_error.extra)
```

If we remove the is_container portion, the ingestion still fails with the message below:

```
ERROR :: /assets/0/destinationUrn :: field is required but not found and has no default value
```
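The first error is the kind a config model that forbids unknown keys raises. The installed CLI's `PatternDatasetDataProductConfig` apparently did not yet declare `is_container`, so the key was rejected rather than ignored. Below is a stdlib-only sketch of that check, assuming a model that rejects extra keys. `check_config`, `ConfigValidationError`, and the two field sets are hypothetical stand-ins for DataHub's pydantic model, and the message only imitates pydantic v1's wording.

```python
from typing import Any, Dict, FrozenSet


class ConfigValidationError(ValueError):
    """Raised when a config dict contains keys the model does not declare."""


def check_config(model_name: str, raw: Dict[str, Any], fields: FrozenSet[str]) -> Dict[str, Any]:
    # Collect every key the model does not know about, similar to pydantic's extra="forbid".
    extra = sorted(key for key in raw if key not in fields)
    if extra:
        plural = "s" if len(extra) > 1 else ""
        lines = [f"{len(extra)} validation error{plural} for {model_name}"]
        for key in extra:
            lines.append(key)
            lines.append("  extra fields not permitted (type=value_error.extra)")
        raise ConfigValidationError("\n".join(lines))
    return raw


# Hypothetical field sets for an older and a newer CLI release.
OLD_FIELDS = frozenset({"dataset_to_data_product_urns_pattern"})
NEW_FIELDS = OLD_FIELDS | {"is_container"}

user_config = {
    "is_container": True,
    "dataset_to_data_product_urns_pattern": {"rules": {".*": "urn:li:dataProduct:example"}},
}

if __name__ == "__main__":
    try:
        check_config("PatternDatasetDataProductConfig", user_config, OLD_FIELDS)
    except ConfigValidationError as err:
        print(err)  # reproduces the shape of the error in the report
    check_config("PatternDatasetDataProductConfig", user_config, NEW_FIELDS)
    print("accepted once is_container is a declared field")
```

The same YAML is rejected or accepted depending only on which fields the installed model declares. That is why the fix is a version upgrade rather than a config change.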

Expected behavior

The documentation that you linked states that is_container is supported.

Additional context

This transformer format works fine for Oracle (if is_container is removed) but does not work for S3.

hsheth2 commented 1 day ago

@mikeburke24 It looks like, because of https://github.com/datahub-project/datahub/pull/10928, you probably want to be on server version 0.14.1 and a CLI version in the 0.14.1.x series.

That should solve both the is_container config issue and the error during emission.
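To confirm the CLI side of that advice, the sketch below compares a CLI version string against 0.14.1. It assumes the CLI is installed from PyPI as `acryl-datahub` and that a plain numeric comparison is enough. `parse_release` and `cli_is_at_least` are hypothetical helpers; pre-release suffixes such as `rc1` are simply dropped, so the `packaging` library is a better fit when full PEP 440 ordering matters. The server version has to be checked separately.

```python
from importlib import metadata
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn '0.14.1.3' into (0, 14, 1, 3); non-numeric suffixes such as 'rc1' are dropped."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first fully non-numeric component
        parts.append(int(digits))
    return tuple(parts)


def cli_is_at_least(version: str, minimum: Tuple[int, ...] = (0, 14, 1)) -> bool:
    # Tuple comparison is lexicographic, so (0, 14, 1, 3) >= (0, 14, 1) is True.
    return parse_release(version) >= minimum


if __name__ == "__main__":
    try:
        installed = metadata.version("acryl-datahub")
        print(installed, "OK" if cli_is_at_least(installed) else "upgrade recommended")
    except metadata.PackageNotFoundError:
        print("acryl-datahub is not installed in this environment")
```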