X-lab2017 / open-perf

Benchmark suite for large-scale socio-technical datasets in open collaboration
MIT License
10 stars 19 forks

[OSS101] Task 8: Classification research: the role of developers in the GitHub open source community #64

Open huangfan0 opened 3 months ago

huangfan0 commented 3 months ago

The aim of this task is to classify the roles of developers in the GitHub open source community. Based on a developer's behavior and influence in a project, the role can be roughly divided into four categories: observer, contributor, maintainer, and leader. You need to construct a dataset and build a classification model that assigns each developer a role. You need to describe how the dataset was constructed and why, and the classification algorithm must be compared with other algorithms. Ideally, you should also analyze in depth the behavior patterns revealed by the classification results, so that we can better understand collaboration mechanisms and the open source ecosystem.

The relevant code and dataset for this task need to be provided in the repository.
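As a rough illustration of the dataset-construction step, the sketch below turns raw activity records into one feature vector per developer. The event type names come from the GitHub Events API; the choice of which events to track, the `build_features` helper, and the flattened `(login, event_type)` input format are assumptions made for this example, not part of the task statement.

```python
from collections import Counter, defaultdict

# Event types taken from the GitHub Events API. Which ones matter for role
# classification is an assumption of this sketch.
TRACKED_EVENTS = (
    "WatchEvent",
    "IssuesEvent",
    "IssueCommentEvent",
    "PullRequestEvent",
    "PullRequestReviewEvent",
    "PushEvent",
)


def build_features(events):
    """Count tracked event types per developer.

    `events` is an iterable of (login, event_type) pairs, a hypothetical
    flattened form of whatever the collection step returns.
    """
    counts = defaultdict(Counter)
    for login, event_type in events:
        if event_type in TRACKED_EVENTS:
            counts[login][event_type] += 1
    # One fixed-order feature vector per developer.
    return {
        login: [counter[name] for name in TRACKED_EVENTS]
        for login, counter in counts.items()
    }


sample_events = [
    ("alice", "WatchEvent"),
    ("bob", "IssuesEvent"),
    ("bob", "PullRequestEvent"),
    ("carol", "PushEvent"),
    ("carol", "PushEvent"),
    ("carol", "PullRequestReviewEvent"),
    ("carol", "ForkEvent"),  # not tracked, so it is ignored
]
features = build_features(sample_events)
```

Each vector follows the order of `TRACKED_EVENTS`, so it can be passed directly to whichever classifiers are being compared.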

vitaminzl commented 1 month ago

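Is this task meant to train a supervised or an unsupervised learning model? Do we need to collect the dataset ourselves? If it is supervised, do we have to assign the observer, contributor, maintainer, and leader labels ourselves? GitHub does not seem to provide ready-made labels. If it is unsupervised, should we use an unsupervised method to split developers into 4 groups and then map those groups to the observer, contributor, maintainer, and leader labels through data analysis?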

huangfan0 commented 1 month ago

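Either a supervised or an unsupervised model is fine. You collect the dataset yourself, for example through the REST API or GraphQL. The supervised approach requires labeling, or learning from rules that you define. For the unsupervised approach, it is best to provide some evaluation metrics that show how good the final result is.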
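To make the two options concrete, here is a minimal sketch using only the Python standard library. `rule_label` shows what rule-based labeling could look like; its thresholds and the choice of inputs are purely illustrative assumptions, not values from this task. `silhouette` computes the mean silhouette coefficient, one common metric for judging a clustering when no labels exist; the issue does not prescribe any particular metric.

```python
import math


def rule_label(issues, pull_requests, reviews, pushes):
    """Assign a role from activity counts using hand-written rules.

    All thresholds are illustrative assumptions, not values from the task.
    """
    if pushes >= 50 and reviews >= 20:
        return "leader"
    if pushes > 0 or reviews > 0:
        return "maintainer"
    if issues > 0 or pull_requests > 0:
        return "contributor"
    return "observer"


def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance.

    Ranges from -1 to 1; higher means tighter, better-separated clusters.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so it drops out of the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette(points, [0, 0, 1, 1])  # groups nearby points together
bad = silhouette(points, [0, 1, 0, 1])   # mixes the two groups
```

On this toy data `good` comes out higher than `bad`, which is the kind of evidence an unsupervised submission could report alongside its mapping from clusters to roles.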