datahub-project / datahub

The Metadata Platform for your Data Stack
https://datahubproject.io
Apache License 2.0
9.73k stars, 2.87k forks

how to make metadata ingestion from clickhouse #2381

Closed: bobliugq closed this issue 2 years ago

bobliugq commented 3 years ago

I want to use ClickHouse as a source and ingest its metadata into DataHub. Can someone help?

hsheth2 commented 3 years ago

Hi @bobliugq - once #2389 is merged in, you should be able to easily define a custom SQL-based source.

For clickhouse, you can combine https://github.com/xzkostyan/clickhouse-sqlalchemy with a fairly simple config to make it work. Let me know if you run into any issues.

bobliugq commented 3 years ago

Thanks for your help! But I could not find the sqlalchemy plugin after updating DataHub (version 0.1.1).

Command I used: pip install 'acryl-datahub[sqlalchemy]'

Warning Information: WARNING: acryl-datahub 0.1.1 does not provide the extra 'sqlalchemy'

hsheth2 commented 3 years ago

Just uploaded the new version to PyPI - https://pypi.org/project/acryl-datahub/0.1.2/

Can you try re-running pip install --upgrade 'acryl-datahub[sqlalchemy]'? It should pick up version 0.1.2.

bobliugq commented 3 years ago

Unfortunately, I got the same warning after updating DataHub to version 0.1.2.

Warning Information: WARNING: acryl-datahub 0.1.2 does not provide the extra 'sqlalchemy'

hsheth2 commented 3 years ago

Ah I missed a line - sorry about that! I made a quick PR (#2409) to fix that.

In the meantime, you can manually install it by running pip install sqlalchemy as well, and then datahub check plugins should show sqlalchemy as enabled.
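
As a quick sanity check alongside datahub check plugins, something like the sketch below can confirm that sqlalchemy is importable from the same Python environment that runs the datahub CLI. It is only an illustration using the standard library; the helper name is_importable is made up here and is not part of DataHub.

```python
from importlib.util import find_spec


def is_importable(module_name):
    """Return True if a top-level module can be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return find_spec(module_name) is not None


# "sqlalchemy" is the import name of the package installed by `pip install sqlalchemy`.
for name in ("sqlalchemy",):
    status = "found" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

If the module shows as missing, pip most likely installed it into a different environment than the one the datahub command runs in.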

shirshanka commented 3 years ago

@bobliugq : just checking if you were able to get this working.

sandeep-devarapalli commented 2 years ago

Hi Team, any update regarding this? It would be great to use ClickHouse with Datahub

wangqinghuan commented 2 years ago

We just implemented it, @sandeep-devarapalli:
https://github.com/open-botech/datahub/blob/hb-datahub/metadata-ingestion/src/datahub/ingestion/source/sql/clickhouse.py

sandeep-devarapalli commented 2 years ago

@wangqinghuan This is amazing, thanks a lot, and can you please let me know if this is available in DataHub too?

grumbler commented 2 years ago

I've got it working with dependencies:

clickhouse-driver                  0.2.2
clickhouse-sqlalchemy              0.1.7

and then with generic sqlalchemy recipe:

source:
  type: sqlalchemy
  config:
    platform: ClickHouse
    # Coordinates
    connect_uri: "clickhouse+native://username:password@clickhouse-host.example.com:9000/"
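
One thing to watch with connect_uri: SQLAlchemy-style URLs need special characters in the username or password (such as @ or /) to be percent-encoded. The sketch below shows one way to assemble the URI with the standard library. The helper name build_connect_uri and the sample password are hypothetical; the scheme, host, and port 9000 are taken from the recipe above.

```python
from urllib.parse import quote_plus, urlsplit


def build_connect_uri(username, password, host, port=9000,
                      scheme="clickhouse+native"):
    """Assemble a SQLAlchemy-style URI, percent-encoding the credentials."""
    # quote_plus escapes characters like '@' and '/' that would otherwise
    # be misread as URL delimiters.
    user = quote_plus(username)
    secret = quote_plus(password)
    return f"{scheme}://{user}:{secret}@{host}:{port}/"


# Hypothetical credentials, for illustration only.
uri = build_connect_uri("username", "p@ss/word", "clickhouse-host.example.com")
print(uri)

# Parse the result back to confirm that host and port survived the encoding.
parts = urlsplit(uri)
print(parts.scheme, parts.hostname, parts.port)
```

The resulting string can be pasted in as the connect_uri value in the recipe.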

ramakrishnabolisetty007 commented 2 years ago

@wangqinghuan I tried applying your changes from this commit on current master. When I run ./gradlew metadata-ingestion:build, I get the following error:

> Task :metadata-ingestion:lint FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':metadata-ingestion:lint'.
> Process 'command 'bash'' finished with non-zero exit value 1

Is datahub integration with clickhouse working with this branch? If yes, can you suggest what I'm missing?

TIA!

wangqinghuan commented 2 years ago

You can skip the lint task by excluding it: ./gradlew :metadata-ingestion:build -x lint

anshbansal commented 2 years ago

A new ClickHouse source is now available. Closing this.