datahub-project / datahub

The Metadata Platform for your Data Stack
https://datahubproject.io

Failed to ingest IBM Db2 metadata using SQLAlchemy source #8835

Closed: deepak-garg closed this issue 10 months ago

deepak-garg commented 12 months ago

Describe the Bug

While trying to ingest metadata for IBM Db2 using the SQLAlchemy source with the ibm-db-sa dialect, the following error occurred:

Traceback (most recent call last):
  File "/data/vdc/conda/condapub/svc_am_cicd/envs/dh-actions/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 373, in run
    for record_envelope in self.transform(record_envelopes):
  File "/data/vdc/conda/condapub/svc_am_cicd/envs/dh-actions/lib/python3.10/site-packages/datahub/ingestion/extractor/mce_extractor.py", line 77, in get_records
    raise ValueError(
ValueError: source produced an invalid metadata work unit: MetadataChangeEventClass(

The source produces an invalid metadata work unit, which results in datasets without SchemaMetadata.

CLI Version: 0.10.2
DataHub Version: v0.10.2
SQLAlchemy Version: 1.4.41

To Reproduce

Recipe used:

source:
  type: sqlalchemy
  config:
    connect_uri: "ibm_db_sa://user:password@host.name.com:50000/database"
    platform: "db2"
    include_tables: true
    include_views: false
sink:
  type: "datahub-rest"
  config:
    server: ${DATAHUB_GMS_HOST}
    token: ${DATAHUB_GMS_TOKEN}
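As a side note, here is a minimal sketch (not part of the recipe; it assumes sqlalchemy and ibm-db-sa are installed and real credentials are substituted for the placeholders) to verify that the connect_uri and dialect work outside of DataHub:

```python
# Standalone connectivity check, illustrative only.
# Assumes: pip install sqlalchemy ibm-db-sa, with valid credentials in place of
# the placeholders taken from the recipe above.
from sqlalchemy import create_engine, inspect

engine = create_engine("ibm_db_sa://user:password@host.name.com:50000/database")
inspector = inspect(engine)

# Listing a few schemas confirms the dialect resolves and the connection works
# before running the DataHub ingestion recipe.
print(inspector.get_schema_names()[:5])
```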

Expected behavior

Complete ingestion of the Db2 metadata.

Screenshot

As the screenshot shows, no SchemaMetadata has been pushed.

[Screenshot 2023-09-06 at 7 55 41 PM]

Attaching Logs: exec-urn_li_dataHubExecutionRequest_sqlalchemy-2023_09_01-14_44_01.log

Reason for the behaviour

The DatasetPropertiesClass field "description" receives invalid data: instead of a string or None, it is set to a one-element tuple:

DatasetPropertiesClass({'customProperties': {}, 'externalUrl': None, 'name': 'vvaf_drf_im_loc', 'qualifiedName': None, 'description': (None,),
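For illustration, a minimal sketch of the kind of defensive unwrapping that would avoid emitting the invalid aspect (this is not the actual DataHub patch, and the helper name normalize_description is hypothetical):

```python
# Minimal sketch, not the actual DataHub fix: the ibm_db_sa dialect can return
# a table comment wrapped in a one-element tuple such as (None,) or ('text',),
# which makes DatasetPropertiesClass.description invalid. Unwrapping it before
# building the aspect avoids the invalid metadata work unit.
from typing import Any, Optional


def normalize_description(raw: Any) -> Optional[str]:
    """Coerce a dialect-specific table comment into a plain string or None."""
    if isinstance(raw, tuple):  # e.g. (None,) or ('a comment',) from ibm_db_sa
        raw = raw[0] if raw else None
    return raw if isinstance(raw, str) else None


if __name__ == "__main__":
    print(normalize_description((None,)))         # None
    print(normalize_description(("a comment",)))  # a comment
    print(normalize_description("plain"))         # plain
```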

I have fixed the issue and will raise a PR.

deepak-garg commented 11 months ago

@HHQuraishi The source produced an invalid metadata work unit during ingestion.

github-actions[bot] commented 10 months ago

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io