Closed. PureNatural closed this 1 year ago
A similar question.
The above method gets the results for the application_software domain.
If I want to get the openrank of each domain under application_domain, how should I choose the parameters here? @frank-zsy
Thanks for the report. This is actually a bug in the group-by function when grouping by labels.
The reason is that in this line of SQL I used two arrayJoin calls to get the id and name of the label data, since a repo may carry multiple labels at the same time. I thought this SQL would return corresponding id and name columns, but it doesn't.
As the image shows:
The rows in the red rectangles are the rows expected to be returned, but more rows come back because two arrayJoin calls produce the Cartesian product of the two arrays.
After that, grouping by the id column gives an arbitrary name for each id, and the openrank result is multiplied by n, where n is the length of the label array.
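To make the row multiplication concrete, here is a small Python model of the behaviour described above. It is not the open-digger SQL; the repo names, label ids, and openrank values are made up for illustration, and `double_array_join` only imitates what two independent arrayJoin calls do to a row.

```python
from collections import defaultdict
from itertools import product

# Hypothetical rows: parallel label id / name arrays plus one openrank value.
repos = [
    {"repo": "repo_a", "label_ids": [":a", ":b"], "label_names": ["A", "B"], "openrank": 10.0},
    {"repo": "repo_b", "label_ids": [":a"], "label_names": ["A"], "openrank": 5.0},
]


def double_array_join(rows):
    """Imitate two independent arrayJoin calls: every id meets every name."""
    for row in rows:
        for label_id, label_name in product(row["label_ids"], row["label_names"]):
            yield label_id, label_name, row["openrank"]


def sum_by_id(expanded):
    """Group by label id only, summing openrank."""
    totals = defaultdict(float)
    for label_id, _label_name, openrank in expanded:
        totals[label_id] += openrank
    return dict(totals)


expanded = list(double_array_join(repos))
totals = sum_by_id(expanded)
print(len(expanded))  # repo_a alone expands to 2 * 2 = 4 rows
print(totals)         # repo_a's openrank is counted twice per label id
```

With correctly paired labels the sums would be 15.0 for ":a" and 10.0 for ":b"; in this model repo_a, which has n = 2 labels, contributes twice to each id.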
I still cannot think of a proper way to fix this right now.
I just found a way to generate corresponding id and name columns by using tuples with arrayJoin. I will fix this soon.
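The tuple idea can be sketched in Python as pairing each id with its own name before expanding, so that each label yields exactly one row. This is only a model under assumed data (the repos, ids, and values are hypothetical), not the SQL that ended up in the fix.

```python
from collections import defaultdict

# Hypothetical rows: parallel label id / name arrays plus one openrank value.
repos = [
    {"repo": "repo_a", "label_ids": [":a", ":b"], "label_names": ["A", "B"], "openrank": 10.0},
    {"repo": "repo_b", "label_ids": [":a"], "label_names": ["A"], "openrank": 5.0},
]


def paired_array_join(rows):
    """Expand one row per (id, name) tuple instead of one row per combination."""
    for row in rows:
        for label_id, label_name in zip(row["label_ids"], row["label_names"]):
            yield label_id, label_name, row["openrank"]


def sum_by_label(expanded):
    """Group by the (id, name) pair, summing openrank."""
    totals = defaultdict(float)
    for label_id, label_name, openrank in expanded:
        totals[(label_id, label_name)] += openrank
    return dict(totals)


paired = list(paired_array_join(repos))
paired_totals = sum_by_label(paired)
print(paired_totals)
```

Because ids and names stay together, the name for each id is no longer arbitrary and each repo's openrank is counted once per label.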
/self-assign
Tuple can be used to generate corresponding id and name columns, but with another aggregation function in the SQL, ClickHouse throws an error about the items column.
I opened an issue in the ClickHouse repo and am waiting for a response from the community: https://github.com/ClickHouse/ClickHouse/issues/49583
@PureNatural I will fix the bug in #1288, since a ClickHouse maintainer replied to my issue and gave a solution.
The rows returned for other labels are still correct, because some repos also have other Tech-1 level labels like cloud_native or big_data, so you can filter the result to get the data you want, like this:
Does this fit your requirement?
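As a rough illustration of that kind of filtering, the sketch below keeps only the grouped rows whose label belongs to a chosen set. The row shape (`name`, `openrank`), the subdomain names, and the values are all assumptions made for this example, not the actual open-digger result format.

```python
# Hypothetical grouped result: one row per Tech-1 label found on the matched repos.
grouped_rows = [
    {"name": "relational", "openrank": 120.0},
    {"name": "key_value", "openrank": 80.0},
    {"name": "cloud_native", "openrank": 300.0},
    {"name": "big_data", "openrank": 150.0},
]

# Hypothetical set of subdomain labels that belong to the database domain.
database_subdomains = {"relational", "key_value"}


def keep_labels(rows, wanted_names):
    """Return only the rows whose label name is in wanted_names."""
    return [row for row in rows if row["name"] in wanted_names]


filtered = keep_labels(grouped_rows, database_subdomains)
for row in sorted(filtered, key=lambda r: r["openrank"], reverse=True):
    print(row["name"], row["openrank"])
```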
As for the other question, if you want to compare the data within application_domain, you can use application_domain as the label, group by Domain-0, and filter out the Others row, like this:
Thanks for your work!
I think I can finish the blue paper after https://github.com/X-lab2017/open-digger/pull/1288 is merged.
My code will also be much simpler!
Description
I can use the following method to get the top 10 databases by openrank.
But when I add the groupBy parameter, the openrank from other domains, such as big_data and cloud_native, is also counted. If I want to get the openrank of each subdomain under the database domain, how should I choose the parameters here?
Thanks for your reply!