Closed dasenCoding closed 1 year ago
Hi, what problems are you running into while building?
In Data Sample, the data are actually raw log data, and they can be imported into a ClickHouse database.
We also provide a metrics interface. You can follow the tutorial in our README to fetch these data.
I'm not sure what you mean by "the log data provided by opendigger already processed by the DataWarehouse". Do you mean that your data can't be imported into the database?
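Thanks for your answer, friend!
(Friend, I give up. My English isn't very good and I can't express what I mean, so please allow me to explain my confusion in Chinese, hahaha 😥)
First, I have already obtained the log data.
Second, by "the log data provided by opendigger already processed by the DataWarehouse" I meant to ask: is the data provided by opendigger the raw log data, or is it derived data produced by a data warehouse?
You have already answered that with "these data are actually raw log data". If it is raw data, I probably need to sort out the business processes more carefully and determine the subject areas, dimensions, and so on. Right now it feels like almost all of the fields needed by the tables in each warehouse layer are already contained in the log table, so I can't carry my idea through and there seems to be no need to build anything. (This is probably because I'm not thoroughly familiar with the developer and GitHub business processes, and not entirely clear about my own requirements for the warehouse. I need to learn more.)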
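Hi, so you want to split the data into separate tables to build a data warehouse (mainly the dw and ad layers), and you'd like to make a first attempt with the sample data, is that right? Personally I think that is quite valuable.
If so, GHTorrent once had a relational model that you could use as a reference (although its website is currently inaccessible and GHTorrent has stopped service).
Our data is consistent with the events endpoint of the GitHub RESTful API, so you may also want to look into GitHub's RESTful and GraphQL APIs.
In most cases, though, the sample data serves as a dataset for users to explore the data and develop metrics. Our metric implementations are also usually built directly on this wide table of raw data.
If you have further questions, feel free to reach out~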
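As a rough illustration of the table-splitting idea discussed above, here is a minimal Python sketch that breaks a wide event table into two dimension tables and one fact table, then computes a simple count from them. It is not OpenDigger's implementation: the field names (`id`, `type`, `created_at`, `actor_id`, `actor_login`, `repo_id`, `repo_name`), the sample rows, and both helper functions are assumptions made up for this example, so check them against the actual log schema before relying on them.

```python
from collections import Counter

# Hypothetical raw events: one wide row per event, with actor and repo
# attributes repeated on every row. Field names and values are assumptions.
raw_events = [
    {"id": 1, "type": "WatchEvent", "created_at": "2020-01-01T00:00:00Z",
     "actor_id": 10, "actor_login": "alice",
     "repo_id": 100, "repo_name": "org/a"},
    {"id": 2, "type": "IssuesEvent", "created_at": "2020-01-01T00:05:00Z",
     "actor_id": 11, "actor_login": "bob",
     "repo_id": 100, "repo_name": "org/a"},
    {"id": 3, "type": "WatchEvent", "created_at": "2020-01-02T08:00:00Z",
     "actor_id": 10, "actor_login": "alice",
     "repo_id": 101, "repo_name": "org/b"},
]


def split_wide_table(events):
    """Split wide event rows into actor/repo dimensions and an event fact table."""
    dim_actor = {}   # actor_id -> actor attributes
    dim_repo = {}    # repo_id -> repo attributes
    fact_event = []  # one narrow row per event, holding only keys and measures
    for e in events:
        dim_actor[e["actor_id"]] = {
            "actor_id": e["actor_id"],
            "actor_login": e["actor_login"],
        }
        dim_repo[e["repo_id"]] = {
            "repo_id": e["repo_id"],
            "repo_name": e["repo_name"],
        }
        fact_event.append({
            "event_id": e["id"],
            "type": e["type"],
            "event_date": e["created_at"][:10],  # date part of the ISO timestamp
            "actor_id": e["actor_id"],
            "repo_id": e["repo_id"],
        })
    return dim_actor, dim_repo, fact_event


def events_per_repo_type(fact_event, dim_repo):
    """Count events per (repo_name, event type) by joining facts to dim_repo."""
    counts = Counter(
        (dim_repo[f["repo_id"]]["repo_name"], f["type"]) for f in fact_event
    )
    return dict(counts)


dim_actor, dim_repo, fact_event = split_wide_table(raw_events)
summary = events_per_repo_type(fact_event, dim_repo)
```

The same count could be computed straight from `raw_events`, which matches the point that metrics are usually built directly on the wide table. Splitting mostly pays off when the dimensions carry many attributes or are shared across several fact tables.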
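Great, thank you very much for the pointer ("GHTorrent once had a relational model") and for the encouragement ("Personally I think that is quite valuable").
I'm now looking up material and sorting out the business relationships. If I have more questions later, let's discuss them then! 😆😆😆
Best wishes, and thanks :)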
Description
Hi, opendigger friends! I've run into a problem and need some suggestions.
I want to practice building a DataWarehouse with the sample data (Github's global log in 2020). But while doing research, I found it difficult to build one by following the DataWarehouse construction process with the provided data. So I would like to ask: is the log data provided by opendigger already processed by a DataWarehouse, or is it the original data? If it is the original data, it may be that I have not been able to sort out the business logic of GitHub; if it is the former, is it not meaningful to use this data?
Thanks!😆