X-lab2017 / open-digger

Open source analysis tools
https://open-digger.cn
Apache License 2.0

[Refactor] SQL optimize for activity #861

Closed: frank-zsy closed this issue 2 years ago.

frank-zsy commented 2 years ago

Right now, index.activity uses ClickHouse plus a script to implement a fairly complex and not very efficient calculation of activity for repos and developers. With some of the powerful functions provided by the ClickHouse engine, we can do this more elegantly and efficiently.

We could use groupArrayInsertAt and LIMIT BY (key) to make the general SQL simpler. Be aware that the server version is 20.8.7.15, which does not support a dynamic array length, so we need to calculate the array length in the script first.
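A minimal sketch of this idea in TypeScript (the language of the open-digger scripts). It assumes a hypothetical events table with repo_name, actor_login and created_at columns, and uses count() as a placeholder for the real activity formula; the helper names monthCount and buildActivityQuery are also hypothetical. The script computes the array length up front and splices it into the query as a constant, groupArrayInsertAt places each month's value at its offset, and LIMIT n BY keeps the top developers of each repo.

```typescript
// Sketch only: table and column names (events, repo_name, actor_login,
// created_at) are hypothetical, not taken from open-digger's schema.
interface ActivityQueryOptions {
  table: string;
  startYear: number;
  startMonth: number; // 1-12, inclusive
  endYear: number;
  endMonth: number; // 1-12, inclusive
  topN: number; // developers kept per repo via LIMIT n BY
}

// Number of months in [start, end]. Computed in the script because the size
// parameter of groupArrayInsertAt must be a constant on the 20.8 server.
function monthCount(sy: number, sm: number, ey: number, em: number): number {
  const n = (ey - sy) * 12 + (em - sm) + 1;
  if (n <= 0) throw new Error("end month must not precede start month");
  return n;
}

function buildActivityQuery(o: ActivityQueryOptions): string {
  const len = monthCount(o.startYear, o.startMonth, o.endYear, o.endMonth);
  const from = o.startYear * 100 + o.startMonth; // YYYYMM
  const to = o.endYear * 100 + o.endMonth;
  return `
SELECT
  repo_name,
  actor_login,
  -- Fixed-length monthly series; months with no events keep the default 0.
  groupArrayInsertAt(0, ${len})(activity, month_index) AS activity_by_month,
  arraySum(activity_by_month) AS total
FROM
(
  SELECT
    repo_name,
    actor_login,
    toUInt32((toYear(created_at) - ${o.startYear}) * 12 + toMonth(created_at) - ${o.startMonth}) AS month_index,
    count() AS activity -- placeholder for the real weighted activity formula
  FROM ${o.table}
  WHERE toYYYYMM(created_at) BETWEEN ${from} AND ${to}
  GROUP BY repo_name, actor_login, month_index
)
GROUP BY repo_name, actor_login
ORDER BY repo_name, total DESC
-- Keep only the top N developers of each repo.
LIMIT ${o.topN} BY repo_name`;
}
```

For example, buildActivityQuery({ table: "events", startYear: 2021, startMonth: 11, endYear: 2022, endMonth: 2, topN: 10 }) emits groupArrayInsertAt(0, 4), so each (repo, developer) row carries a 4-element series from November to February. The inner GROUP BY guarantees at most one value per array position, which matters because groupArrayInsertAt does not define which value wins when a position is written more than once in a multi-threaded query.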

frank-zsy commented 2 years ago

/self-assign

tyn1998 commented 2 years ago

Hello, @frank-zsy. Is the index.activity you mentioned the same as the "Activity" metric that Hypercrx presents?

frank-zsy commented 2 years ago

Yes, it is. Right now, the value is calculated here: https://github.com/X-lab2017/open-digger/blob/master/src/metrics/activity_openrank.ts#L71. To support different needs, the function has to support a configurable repo/org range, a configurable time range, and various grouping options (for example, grouping by month/quarter/year and by repo/org/label). As a result, the current script is quite ugly and needs to be replaced with a more elegant and efficient SQL version.
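One hedged way to express these options as data and turn them into a single query is sketched below in TypeScript. The config shape, the events table, the repo_name/org_login columns and the helper names are all hypothetical; label grouping is left out because it depends on how labels are stored. Periods are calendar-aligned: each event's month is converted to an absolute month number and integer-divided by 1, 3 or 12, so the number of array slots can be computed in the script, as the 20.8 server requires.

```typescript
// Hypothetical config shape; names are illustrative, not open-digger's API.
type TimeUnit = "month" | "quarter" | "year";
type GroupKey = "repo" | "org";

interface ActivityConfig {
  groupBy: GroupKey;
  unit: TimeUnit;
  startYear: number;
  startMonth: number; // 1-12, inclusive
  endYear: number;
  endMonth: number; // 1-12, inclusive
  repoNames?: string[]; // optional repo range filter
}

// Hypothetical column names for each grouping key.
const GROUP_COLUMN: Record<GroupKey, string> = { repo: "repo_name", org: "org_login" };
const MONTHS_PER_UNIT: Record<TimeUnit, number> = { month: 1, quarter: 3, year: 12 };

// Calendar-aligned period number (months, quarters or years since year 0).
function periodOf(year: number, month: number, unit: TimeUnit): number {
  return Math.floor((year * 12 + month - 1) / MONTHS_PER_UNIT[unit]);
}

function buildGroupedActivityQuery(c: ActivityConfig): { sql: string; periods: number } {
  const first = periodOf(c.startYear, c.startMonth, c.unit);
  const periods = periodOf(c.endYear, c.endMonth, c.unit) - first + 1;
  if (periods <= 0) throw new Error("end must not precede start");
  const k = MONTHS_PER_UNIT[c.unit];
  const key = GROUP_COLUMN[c.groupBy];
  const conditions = [
    `toYYYYMM(created_at) BETWEEN ${c.startYear * 100 + c.startMonth} AND ${c.endYear * 100 + c.endMonth}`,
  ];
  if (c.repoNames && c.repoNames.length > 0) {
    // Naive escaping for illustration; prefer server-side query parameters.
    const quoted = c.repoNames.map((r) => `'${r.replace(/\\/g, "\\\\").replace(/'/g, "\\'")}'`);
    conditions.push(`repo_name IN (${quoted.join(", ")})`);
  }
  const sql = `
SELECT ${key}, groupArrayInsertAt(0, ${periods})(activity, idx) AS activity_series
FROM
(
  SELECT
    ${key},
    toUInt32(intDiv(toYear(created_at) * 12 + toMonth(created_at) - 1, ${k}) - ${first}) AS idx,
    count() AS activity -- placeholder for the real activity formula
  FROM events
  WHERE ${conditions.join(" AND ")}
  GROUP BY ${key}, idx
)
GROUP BY ${key}`;
  return { sql, periods };
}
```

With unit "quarter" and a range from November 2021 to May 2022, the builder allocates three slots (Q4 2021, Q1 2022, Q2 2022). Partial periods at either end only count the months inside the range, because the WHERE clause filters by month.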

tyn1998 commented 2 years ago

Really looking forward to the elegant version~

frank-zsy commented 2 years ago

@tyn1998 The new version is online. We can now get data for any configuration with a single SQL query, without any script modification. Feel free to check it out.

tyn1998 commented 2 years ago

Cool~