X-lab2017 / open-digger

Open source analysis tools
https://open-digger.cn
Apache License 2.0
286 stars 85 forks

[Feature] Add topics columns into ClickHouse schema #1407

Closed frank-zsy closed 10 months ago

frank-zsy commented 11 months ago

Description

We overlooked the fact that, since around 2021.11, the repo data in pull request related event logs has included the topics of the repo. We should add a column for it to the database so we can analyze topic related data without accessing the GitHub API anymore. This would be a huge help for analyzing global data and technology trends.

frank-zsy commented 11 months ago

Since the topics field is a string array, we should use the Nested(name LowCardinality(String)) type for the column; LowCardinality should speed up lookups.
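As a rough illustration of how the topics could be pulled out of an event during import, here is a minimal Python sketch. It is not the project's importer: the function names `extract_topics` and `to_row` are hypothetical, the path `payload.pull_request.base.repo.topics` is an assumption about where the topics sit in a pull request event, and the `topics.name` key assumes the Nested column is named `topics` and is exposed by ClickHouse as a flattened array sub-column.

```python
from typing import Any


def extract_topics(event: dict[str, Any]) -> list[str]:
    """Return the repo topics found in a pull request related event.

    Assumption: topics live at payload.pull_request.base.repo.topics.
    Events without that path (e.g. logs before 2021.11) yield [].
    """
    node: Any = event
    for key in ("payload", "pull_request", "base", "repo"):
        if not isinstance(node, dict):
            return []
        node = node.get(key)
    if not isinstance(node, dict):
        return []
    topics = node.get("topics")
    if not isinstance(topics, list):
        return []
    # Keep only string entries so the array matches a String-typed column.
    return [t for t in topics if isinstance(t, str)]


def to_row(event: dict[str, Any]) -> dict[str, Any]:
    """Build a partial row; "topics.name" is a hypothetical column name
    for the flattened Nested(name LowCardinality(String)) column."""
    return {"topics.name": extract_topics(event)}
```

Returning an empty list for events without topics keeps older logs importable under the new schema without special casing.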

/self-assign

frank-zsy commented 11 months ago

We will re-import the logs after the database schema change, so the online environment will be unavailable for the next 2 days.