datahub-project / datahub

The Metadata Platform for your Data Stack
https://datahubproject.io
Apache License 2.0
9.84k stars 2.9k forks

A great project #134

Closed lonely7345 closed 4 years ago

lonely7345 commented 8 years ago

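Sorry for writing directly in Chinese; there is a lot to say and English would be slower for me. Several of the authors are Chinese, though, so you should all be able to read it.

I am very glad to see this project. We currently lack metadata management. There was a similar project when I was at JD.com, but since joining my new company I have been looking for an open-source project of this kind and had not found one.

One part is discovering and synchronizing Hive metadata so that a metadata knowledge base can be built, including change history, field descriptions, Q&A, and global search. The other is linking to tables through the data warehouse's scheduling system in order to construct the metadata lineage. This is exactly what we need.

Our biggest difficulty right now is that analysts do not know which table or which field to use, and that after data is modified it is unclear which other related tables are affected.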

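We use Cloudera's full stack: CDH, Oozie, Hue, Hive, and Sqoop. We had also been looking into building table dependencies from Oozie's input paths. For a Sqoop action, establish the mapping to the relational database, then monitor the relational database's tables and raise an alert on any change. For a Hive action, establish the relationships between data warehouse tables. For our own data-push datachange action, establish the mapping between the warehouse table and the target system. If these relationships were clearly visible, it would really help the whole system. Looking at WhereHows, I found that many of the ideas are the same.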
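The per-action mapping described above could be sketched roughly as follows. This is only an illustration: the action dictionaries and their field names (`type`, `source_table`, `source_tables`, `target_table`, `target_system`) are hypothetical and are not Oozie's or WhereHows' actual schema. The sketch turns each action into upstream/downstream edges and then walks the graph to answer "which tables are affected if this one changes".

```python
from collections import defaultdict, deque


def edges_from_action(action):
    """Return (upstream, downstream) pairs for one workflow action.

    The dictionary layout is a made-up example format, not a real Oozie schema.
    """
    kind = action["type"]
    if kind == "sqoop":
        # Relational database table -> warehouse table
        return [("rdbms:" + action["source_table"], "hive:" + action["target_table"])]
    if kind == "hive":
        # Warehouse table(s) -> warehouse table
        return [("hive:" + src, "hive:" + action["target_table"])
                for src in action["source_tables"]]
    if kind == "datachange":
        # Warehouse table -> external target system
        return [("hive:" + action["source_table"], "target:" + action["target_system"])]
    return []


def build_graph(actions):
    """Collect all edges into an adjacency map: upstream -> set of downstream nodes."""
    graph = defaultdict(set)
    for action in actions:
        for upstream, downstream_node in edges_from_action(action):
            graph[upstream].add(downstream_node)
    return graph


def downstream(graph, start):
    """Breadth-first search for everything reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


actions = [
    {"type": "sqoop", "source_table": "orders_db.orders", "target_table": "ods.orders"},
    {"type": "hive", "source_tables": ["ods.orders"], "target_table": "dw.order_daily"},
    {"type": "datachange", "source_table": "dw.order_daily", "target_system": "report_db"},
]
graph = build_graph(actions)
affected = downstream(graph, "rdbms:orders_db.orders")
```

With the three sample actions, a change to the relational `orders_db.orders` table reaches the two warehouse tables and the target system, which is the set an alert would need to cover.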

Thanks again to the authors. I would very much like to take part!

ericsun2 commented 8 years ago

👍

joe-szmn commented 8 years ago

The lineage in the provided VM image does not work properly. How should I get it working?

ericsun2 commented 8 years ago

The lineage in the VM only scans job executions from the past 30 days. To see the lineage demo, please go to Azkaban or HUE and launch some Pig, M/R, or Hive jobs.

The lineage will then show up.

ranqiqiang commented 7 years ago

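We are also looking for a tool that parses data lineage graphs. Our Hive SQL is fairly complex: there are N SQL statements in a single Hive file, plus escaping for various special characters, jars, and so on. How well is this supported, and is there a corresponding demo? Really looking forward to this project.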

diaowenyang commented 6 years ago

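@ranqiqiang Hello, I am also learning WhereHows at the moment. Did you solve the problem of extracting lineage data? My current WhereHows setup can extract metadata from Hive, Oracle, and els, but I cannot get the lineage information extracted, which is a real headache.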

diaowenyang commented 6 years ago

@ericsun2 How can I get Oracle lineage? Does "please go to Azkaban or HUE to launch some Pig, M/R or Hive jobs" mean that WhereHows cannot support Oracle lineage? I am currently using v1.0.0.

c-f-cooper commented 6 years ago

Starting 969dfb582e3c_wherehowsdocker_wherehows-mysql_1 ... error

ERROR: for 969dfb582e3c_wherehowsdocker_wherehows-mysql_1 Cannot start service wherehows-mysql: driver failed programming external connectivity on endpoint 969dfb582e3c_wherehowsdocker_wherehows-mysql_1 (d53a5b90e22403094cbf7f13a27f62c784a43f486901ea242348bf7eba6cb7ec): Error starting userland proxy: Bind for 0.0.0.0:3306 failed: port is already allocated

ERROR: for wherehows-mysql Cannot start service wherehows-mysql: driver failed programming external connectivity on endpoint 969dfb582e3c_wherehowsdocker_wherehows-mysql_1 (d53a5b90e22403094cbf7f13a27f62c784a43f486901ea242348bf7eba6cb7ec): Error starting userland proxy: Bind for 0.0.0.0:3306 failed: port is already allocated

ERROR: Encountered errors while bringing up the project.
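The "Bind for 0.0.0.0:3306 failed: port is already allocated" message means something on the host already holds port 3306, most likely a local MySQL server or another container. A minimal way to confirm this before starting the containers, assuming Python is available on the host (the helper name `port_is_free` is made up for this sketch):

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can bind to (host, port) right now.

    A failed bind usually means the port is taken, although other OS errors
    (for example missing permission for ports below 1024) are also reported
    as False here.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    print("3306 free:", port_is_free(3306))
```

If the port turns out to be taken, the usual options are to stop the process holding it or to map the container's MySQL to a different host port.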

PeterXiaTian commented 6 years ago

Is anyone using the lineage that comes with Hadoop? With some modification it can be used as well.

ranqiqiang commented 6 years ago

@diaowenyang Extracting the metadata is enough; we establish the lineage by linking tables through jobs.

keremsahin1 commented 4 years ago

Dear issue owner,

Thanks for your interest in WhereHows. We have recently announced DataHub, which is the rebranding of WhereHows. LinkedIn improved the architecture of WhereHows, rebranded it as DataHub, and replaced its metadata infrastructure accordingly. DataHub is a more advanced and improved metadata management product compared to WhereHows.

Unfortunately, we have to stop supporting WhereHows to better focus on DataHub and offer more help to DataHub users. Therefore, we will drop all issues related to WhereHows and will not accept any contributions for it. Active development of DataHub has already started on the datahub branch and will continue to live there until it is finally merged to master and the project is renamed to DataHub.

Please check the datahub branch to get familiar with DataHub.

Best, DataHub team