DeveloperLiberationFront / AffectAnalysisToolEvaluation

SEmotion_18 paper on evaluating the reliability of sentiment and politeness analysis tools

How are the comments selected, and what type of projects do they belong to? #3

Closed nasifimtiazohi closed 6 years ago

nasifimtiazohi commented 6 years ago

R3

It is unclear if the selected comments came from a random selection of all projects available in GHTorrent or from specific projects.

R2

Give some indication of which projects the 589 comments were taken from. Were they actual software projects? Some more information about them would help. Could you add a sentence or two?

R1

7 - In the threats to validity, the authors state that "Finally, while we randomly picked 589 comments, they might not be representative of the whole GitHub community." The GHTorrent dataset hosts tens of millions of developers' comments; 589 sampled comments are not representative at all.

nasifimtiazohi commented 6 years ago

I have rechecked the comments; they were randomly picked from across all projects.

I also manually investigated 20 comments and the projects they belong to.
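
For illustration only, here is a minimal sketch of how such a uniform random sample could be drawn from a GHTorrent export and traced back to its projects for a manual spot-check. The file names, column names, and random seed are assumptions, not the actual pipeline used for the paper.

```python
# Hypothetical sketch: draw a uniform random sample of comments from a
# GHTorrent export and attach project metadata for manual inspection.
# File names and column names are illustrative assumptions.
import pandas as pd

comments = pd.read_csv("ghtorrent_issue_comments.csv")   # one row per comment
projects = pd.read_csv("ghtorrent_projects.csv")         # one row per project

# Uniform random sample across the whole comment table, i.e. not
# restricted to any pre-selected set of projects.
sample = comments.sample(n=589, random_state=42)

# Join project info so each sampled comment can be traced back to the
# repository it came from (useful for a manual spot-check of 20 comments).
sample = sample.merge(projects, left_on="project_id", right_on="id",
                      how="left", suffixes=("", "_project"))

# Spot-check a small subset by hand.
print(sample[["body", "name", "language"]].head(20))
```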

nasifimtiazohi commented 6 years ago

fixed in https://github.com/DeveloperLiberationFront/AffectAnalysisToolEvaluation/commit/bb92a21a834cb8f08dbac88fa61067bf5402b8ea

nasifimtiazohi commented 6 years ago

The comments fairly represent projects with large communities.

Why 589 comments are not enough for testing is unclear from R1's comment.
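
As a side note not raised in the thread: under a simple-random-sampling model, the worst-case margin of error for a proportion depends on the sample size rather than the population size, so roughly 589 comments already bound the error at about ±4% at 95% confidence. A small sketch of that arithmetic (the 50% worst-case proportion and the 95% level are assumptions, not figures from the paper):

```python
# Hypothetical sketch: worst-case margin of error for a proportion
# estimated from a simple random sample of n comments. The population
# size barely matters once it is much larger than n.
import math

n = 589          # sampled comments
z = 1.96         # 95% confidence
p = 0.5          # worst-case proportion (maximizes the margin)

margin = z * math.sqrt(p * (1 - p) / n)
print(f"worst-case margin of error: +/- {margin:.1%}")   # ~ +/- 4.0%
```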

Apart from the coding effort, there is also a point of saturation. However, I don't know how to show saturation scientifically.
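
One common way to illustrate saturation (a suggestion only, not something established in this thread) is to track how many new codes or labels appear as more comments are analyzed; when the curve flattens, additional comments contribute little new information. A hedged sketch, assuming each coded comment carries a set of labels:

```python
# Hypothetical sketch: cumulative count of distinct labels (codes) seen
# as comments are coded one by one. A flattening curve suggests saturation.
# `coded_comments` is an assumed structure: a list of label sets, one per comment.
coded_comments = [
    {"positive"}, {"negative", "impolite"}, {"neutral"},
    {"positive", "polite"}, {"neutral"}, {"negative"},
]

seen = set()
curve = []
for labels in coded_comments:
    seen |= labels
    curve.append(len(seen))

# e.g. [1, 3, 4, 5, 5, 5] -- no new labels after the fourth comment.
print(curve)
```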