Closed: xgdyp closed this issue 1 year ago.
Description: OpenDigger currently lacks any exploration of graph metrics. This task is mainly to explore GitHub collaboration from the perspective of social networks.
Expected outcomes:
A case study similar to the existing ones in the notebook
Skills: Python, PyTorch
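As a minimal sketch of what such a case might cover, the snippet below builds a developer collaboration graph in memory and computes a simple graph metric (degree centrality). It assumes the rows have already been fetched from ClickHouse as `(actor, repo)` pairs. The sample rows, the field layout and the helper names are hypothetical, and a real notebook would more likely use a graph library such as networkx.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample rows, standing in for the result of a ClickHouse query
# that returns (actor_login, repo_name) pairs. No query is executed here.
rows = [
    ("alice", "org/repo-a"),
    ("bob", "org/repo-a"),
    ("carol", "org/repo-a"),
    ("alice", "org/repo-b"),
    ("dave", "org/repo-b"),
    ("bob", "org/repo-a"),  # repeated activity is counted once per repo
]

def build_collaboration_graph(rows):
    """Link two developers if they were active in the same repo.

    Edge weight = number of repos the two developers share. In this sketch,
    developers who never share a repo with anyone are not added as nodes.
    """
    devs_by_repo = defaultdict(set)
    for actor, repo in rows:
        devs_by_repo[repo].add(actor)
    graph = defaultdict(lambda: defaultdict(int))
    for devs in devs_by_repo.values():
        for u, v in combinations(sorted(devs), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph

def degree_centrality(graph):
    """Number of neighbours normalised by (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

graph = build_collaboration_graph(rows)
centrality = degree_centrality(graph)
print(centrality["alice"])  # 1.0: alice is linked to bob, carol and dave
```

Restricting `rows` to a chosen set of repos before building the graph keeps the in-memory graph small, which matters if the analysis is limited to particular communities.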
Description: Complex and long SQL is difficult to maintain, which hinders the development of the project. One way to solve this problem is to generate SQL with a SQL builder, which makes SQL easier to read, especially when it contains subqueries. So we want to explore whether there is a mature framework that can help OpenDigger.
In Python, we can use PyPika, a Python package that supports some ClickHouse SQL syntax. What we need to do is investigate in detail whether it can cover our SQL and whether it will really improve OpenDigger.
If so, we can refactor the Python kernel first as an OSPP task. I think I can follow up on this project.
Expected outcomes: metrics generated by a Python SQL builder
Skills: Python, TypeScript, JavaScript, SQL
references: https://github.com/didi/gendry https://github.com/sqlkata/querybuilder https://github.com/ibis-project/ibis https://github.com/doug-martin/goqu
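To illustrate why a builder could help with subqueries, here is a deliberately tiny, hypothetical builder in plain Python. It is not PyPika's API or that of any referenced library; it only shows the general idea of composing a query object, reusing it as a subquery, and rendering the SQL at the end. The table and column names are illustrative.

```python
class Select:
    """A minimal, hypothetical SELECT builder (not PyPika).

    Methods mutate the builder and return it so calls can be chained.
    """

    def __init__(self, source):
        # `source` is a table name (str) or another Select used as a subquery.
        self._source = source
        self._columns = ["*"]
        self._where = []
        self._group_by = []

    def select(self, *columns):
        self._columns = list(columns)
        return self

    def where(self, condition):
        self._where.append(condition)
        return self

    def group_by(self, *columns):
        self._group_by.extend(columns)
        return self

    def to_sql(self):
        # A nested Select is rendered recursively inside parentheses.
        if isinstance(self._source, Select):
            source = f"({self._source.to_sql()})"
        else:
            source = self._source
        sql = f"SELECT {', '.join(self._columns)} FROM {source}"
        if self._where:
            sql += " WHERE " + " AND ".join(self._where)
        if self._group_by:
            sql += " GROUP BY " + ", ".join(self._group_by)
        return sql

# Usage: the inner query is built once and reused as a subquery.
active = (
    Select("events")
    .select("repo_name", "actor_login")
    .where("type = 'PullRequestEvent'")
)
per_repo = (
    Select(active)
    .select("repo_name", "count(DISTINCT actor_login) AS contributors")
    .group_by("repo_name")
)
sql = per_repo.to_sql()
print(sql)
```

The point of the exercise is that each level of nesting is a named Python object rather than a block of string concatenation; whether PyPika covers the ClickHouse syntax OpenDigger actually uses is exactly what the investigation would need to establish.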
During the development of indicators, we need to verify whether the SQL we write returns the correct results. Our previous approach was to select a repo and compare the data manually. Moreover, data is sometimes missing, and we are often unsure whether our implementation is wrong or the data is abnormal. This approach is inefficient and error-prone.
So we need a DQA mechanism to perform data checks and metric checks automatically.
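A minimal sketch of what automated checks could look like, assuming metrics can be evaluated in Python over a small, hand-verified fixture. One check flags days with no recorded events (a hint of missing data), and the other asserts a metric against a value counted by hand. The record layout, the `count_stars` metric and the fixture are hypothetical, not OpenDigger code.

```python
from datetime import date, timedelta

# Hypothetical fixture: (event_type, actor, repo, day)
FIXTURE = [
    ("WatchEvent", "alice", "org/repo-a", date(2023, 4, 1)),
    ("WatchEvent", "bob", "org/repo-a", date(2023, 4, 1)),
    ("IssuesEvent", "bob", "org/repo-a", date(2023, 4, 2)),
    ("WatchEvent", "carol", "org/repo-a", date(2023, 4, 4)),
]

def missing_days(records, start, end):
    """Data check: return the days in [start, end] with no events at all."""
    seen = {day for *_, day in records}
    days = (start + timedelta(days=d) for d in range((end - start).days + 1))
    return [day for day in days if day not in seen]

def count_stars(records, repo):
    """Hypothetical metric: number of distinct users who starred `repo`."""
    return len({actor for etype, actor, r, _ in records
                if etype == "WatchEvent" and r == repo})

def run_checks(records):
    problems = []
    gaps = missing_days(records, date(2023, 4, 1), date(2023, 4, 4))
    if gaps:
        problems.append(f"no events on: {gaps}")
    # Metric check: the expected value (3) was counted by hand on the fixture.
    if count_stars(records, "org/repo-a") != 3:
        problems.append("count_stars mismatch")
    return problems

print(run_checks(FIXTURE))  # reports the gap on 2023-04-03 only
```

Keeping the data check and the metric check separate is what lets a failure be attributed to missing data rather than to a wrong implementation, or vice versa.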
Furthermore, I think we can add a dataset and use it to explain metrics. For example, if a metric computed on this dataset returns 10, we can see exactly which records produce that 10 and display them.
references: https://help.aliyun.com/document_detail/116897.html
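The metric-explanation idea above could be sketched by having each metric return the records that produced its value alongside the value itself. The `Event` layout, the `active_contributors` metric and the dataset below are hypothetical and only illustrate the pattern.

```python
from typing import NamedTuple

class Event(NamedTuple):
    type: str
    actor: str
    repo: str

class Explained(NamedTuple):
    value: int
    records: list  # every record that contributed to `value`

def active_contributors(events, repo):
    """Hypothetical metric: distinct actors with a PR or issue event in `repo`.

    Returns the value together with all matching records, so the result
    can be traced back to concrete rows in the dataset.
    """
    matched = [e for e in events
               if e.repo == repo and e.type in ("PullRequestEvent", "IssuesEvent")]
    return Explained(len({e.actor for e in matched}), matched)

dataset = [
    Event("PullRequestEvent", "alice", "org/repo-a"),
    Event("IssuesEvent", "bob", "org/repo-a"),
    Event("WatchEvent", "carol", "org/repo-a"),
    Event("IssuesEvent", "alice", "org/repo-a"),
    Event("PullRequestEvent", "dave", "org/repo-b"),
]

result = active_contributors(dataset, "org/repo-a")
print(result.value)  # 2 (alice and bob)
for record in result.records:
    print(record)
```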
Hi @frank-zsy, do you have any more ideas? We need to select two of them because we only have 2 spots.
@xgdyp I think all 3 tasks are great OSPP tasks, but there are some tricky points about them.
So I think the last 2 may work.
OK. For the first task, what I hope for is a case study that shows how to build a graph from the raw logs in ClickHouse and use that graph for analysis. I don't think we need to use our own graph database (the way the graph network is built may also differ); the purpose is to provide users with a graph analysis tutorial.
So we don't really need a graph database, and we only need to load the data from ClickHouse, build the graph in memory, and compute the metrics? This is doable, but it may not support large-scale metric data export due to performance issues.
That's a problem, but I think it should be within an acceptable range. The task can focus on the analysis of some communities (like the PaddlePaddle Hackathon) rather than all of them.
OK, if you insist on adding this task to OSPP, I am fine with it. But in the future, if we have a public graph database for the global collaboration network, we may need to refactor the code of the network metric implementation.
Description
Hi all, we passed the OSPP2023 project review, and OpenDigger has been accepted as a community of OSPP2023.
Now we need to determine our idea list. If you have any ideas or you want to be a project mentor, please leave a comment.
Please note that we need to post our ideas before 25 April.
some references: