X-lab2017 / github-analysis-report-2019

GitHub 2019 Digital Report
MIT License

Proposal: some details to make the analysis more precise #6

Open king-gao opened 4 years ago

king-gao commented 4 years ago

The report is very interesting and a wonderful sample for us. It is super helpful! I have some suggestions that anybody is welcome to discuss.

  1. The analysis divides activity into five actions: "issue comment", "open issue", "open pull request", "pull request review comment", and "pull request merged", weighted from 1 to 5, so that a higher weight means a larger contribution. Do you have a model to prove that the weights are linear?

  2. Have you considered that different classes of projects (AI, Cloud, OS, DB, etc.) give different results? AI projects look active because the technology has been popular in recent years. I suggest comparing projects within the same class.

  3. Many issue response comments are posted automatically by robots. For example, a first issue triggers a robot's automatic reply, which neither answers the issue directly nor adds a useful comment. So maybe you can filter out the robots.

I hope this issue helps you optimize your model and analysis results.

frank-zsy commented 4 years ago
  1. We do not have a model to calculate the weight factors. Different communities may value these factors differently, which is why our script accepts them as custom params so that you can calculate your own rank list. 1 to 5 is quite a simple approach, but it reflects a consensus among several open source TLs at Alibaba. We also appreciate any suggestions about it.

  2. Yes, different kinds of projects should not be ranked together. But classifying all the projects on GitHub is a very difficult task that has not been solved in industry or research. I actually tried a machine learning clustering approach on all projects in 2018 and shared the results at Open Summit 2019 @ Shanghai; you can refer to the video uploaded by CNCF.

  3. I think robots will be very important in future collaborative work and may have more impact on digital operations than human beings. In our lab, we are even trying to compile comprehensive statistics on robot accounts to find out which robots or automated functions are most used in open source. Besides, it is very hard to identify all the robots. There are two ways to implement a GitHub robot: one is to use an account's personal token to act as an ordinary account, in which case you cannot tell whether it is a robot; the other is to build a GitHub App, which will have a [bot] suffix in its name. So identifying all the robots may not be practical.

Thanks for the discussion.
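
To illustrate the custom weight params mentioned in point 1, here is a minimal sketch. It is not the project's actual script: the event names, the `activity_score` and `rank_projects` helpers, and the mapping of the weights 1 to 5 onto the five actions (taken from the order in which they are listed in this thread) are all assumptions made for illustration.

```python
from collections import Counter

# Assumed default weights: the five actions in the order listed in this
# thread, weighted 1 to 5. The real script may name or order them differently.
DEFAULT_WEIGHTS = {
    "issue_comment": 1,
    "open_issue": 2,
    "open_pull_request": 3,
    "pull_request_review_comment": 4,
    "pull_request_merged": 5,
}


def activity_score(events, weights=None):
    """Return the weighted sum of event counts for one project.

    `events` is an iterable of event-type strings. Event types that have
    no entry in `weights` contribute nothing to the score.
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    counts = Counter(events)
    return sum(weights.get(kind, 0) * n for kind, n in counts.items())


def rank_projects(events_by_project, weights=None):
    """Return (project, score) pairs sorted from highest to lowest score."""
    scores = {
        name: activity_score(events, weights)
        for name, events in events_by_project.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Passing a different `weights` dictionary to `rank_projects` yields a community-specific rank list without changing any other code.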

JerryKuan commented 4 years ago

@king-gao thanks for your comment. We are very happy that the report is helpful to you. I want to address the first question. In fact, the weights we used are empirical, and we have not proven that they are linear. But we intend to improve the method by determining the weights with the entropy weight method, which derives them from the distribution of the data and is therefore more objective than an empirical choice.
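
For readers unfamiliar with it, the following is a minimal sketch of the textbook entropy weight method, not the lab's implementation. It assumes a table with one row per project and one column per action type, holding non-negative counts; the function name `entropy_weights` and the handling of degenerate columns are illustrative choices.

```python
import math


def entropy_weights(rows):
    """Compute entropy-based weights for the columns of a count table.

    `rows` is a list of equal-length lists of non-negative numbers
    (one row per project, one column per indicator). Columns whose values
    vary more across projects receive larger weights.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two rows to compute entropy")
    m = len(rows[0])
    k = 1.0 / math.log(n)  # normalizes entropy into the range [0, 1]

    divergences = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        if total == 0:
            # Assumption: an all-zero column carries no information.
            divergences.append(0.0)
            continue
        # Terms with x == 0 are skipped, treating 0 * log(0) as 0.
        entropy = -k * sum(
            (x / total) * math.log(x / total) for x in column if x > 0
        )
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))

    total_divergence = sum(divergences)
    if total_divergence == 0:
        # Every column is uniform, so fall back to equal weights.
        return [1.0 / m] * m
    return [d / total_divergence for d in divergences]
```

A column in which every project has the same count has maximal entropy and gets a weight close to zero, while a column that separates projects strongly gets most of the weight.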

king-gao commented 4 years ago

That's great. I hope we can discuss the entropy weight method, and I would like to join you. Thanks, Jerry.

king-gao commented 4 years ago

Thank you for your answer, Frank! About the third question, I strongly agree with your first sentence. But I think we had better filter out the robots when we measure a project. I agree that it is hard to identify all the robots, but it is a technical problem that should be solvable :) This may become a key achievement of X-lab, maybe even a highlight :) Thanks again, Frank. So far this is terrific!
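
A partial filter along these lines could be sketched as follows. It relies only on the `[bot]` suffix that Frank describes for GitHub Apps, plus an optional hand-maintained list of known robot logins; robots that run under an ordinary account with a personal token are not detected unless they are on that list. The event format (a dict with an `actor` key) and the function names are hypothetical.

```python
def is_probable_bot(login, known_bots=frozenset()):
    """Return True for GitHub App logins ("...[bot]") or listed accounts."""
    return login.endswith("[bot]") or login in known_bots


def filter_human_events(events, known_bots=frozenset()):
    """Drop events whose actor looks like a robot.

    `events` is a list of dicts with an "actor" login (assumed format).
    Token-based robots missing from `known_bots` will pass through.
    """
    return [
        event
        for event in events
        if not is_probable_bot(event["actor"], known_bots)
    ]
```

Because the known-robot list has to be maintained by hand, the result is best read as a lower bound on robot activity rather than a complete filter.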