Closed OrkoHunter closed 6 years ago
I've been looking into this, and I'm not sure exactly what is going on here. I think it has something to do with GHTorrent, as opposed to us.
Looking at the database, for twitter/finagle (id 1372) and user cacoco (id 7875), running the following query
SELECT *
FROM commits
WHERE project_id = 1372
AND committer_id = 7875
should give us all of cacoco's commits to twitter/finagle. According to the GitHub API, there should be 76 of them. Instead, I get the following 2 results:
When I try to look up both of these commits in the project_commits table (which the /contributors endpoint is using), I get data back for the commit with id 500482921, but nothing for the commit with id 500482928.
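To list every commit in this state rather than checking ids one at a time, a LEFT JOIN from commits to project_commits can help. The following is only a minimal sketch: it runs against a toy in-memory SQLite database instead of the real GHTorrent instance, the two tables are reduced to the columns used here, and the function names are made up for illustration.

```python
import sqlite3


def build_toy_db():
    """Toy stand-in for the two tables discussed above (not the real database)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE commits (
            id INTEGER PRIMARY KEY,
            project_id INTEGER,
            committer_id INTEGER
        );
        CREATE TABLE project_commits (project_id INTEGER, commit_id INTEGER);
        -- Both commits exist in commits, but only one has a project_commits row.
        INSERT INTO commits VALUES (500482921, 1372, 7875), (500482928, 1372, 7875);
        INSERT INTO project_commits VALUES (1372, 500482921);
        """
    )
    return conn


def missing_from_project_commits(conn, project_id, committer_id):
    """Return ids of the committer's commits that have no project_commits row."""
    rows = conn.execute(
        """
        SELECT c.id
        FROM commits AS c
        LEFT JOIN project_commits AS pc
               ON pc.commit_id = c.id
              AND pc.project_id = :project_id
        WHERE c.project_id = :project_id
          AND c.committer_id = :committer_id
          AND pc.commit_id IS NULL
        ORDER BY c.id
        """,
        {"project_id": project_id, "committer_id": committer_id},
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(missing_from_project_commits(build_toy_db(), 1372, 7875))
```

On the toy data this reports only the commit that has no project_commits row. The same SELECT should carry over to the real database if the column names match.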
On top of this table not having all the data it should, it seems GHTorrent is only aware of 2 of cacoco's 76 commits. @sgoggins any ideas?
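One thing that may be worth ruling out, as an assumption rather than a confirmed cause: the query above filters on committer_id, and a commit's committer can differ from its author when someone else applies the patch. The sketch below counts both ways on a toy in-memory SQLite table. The rows, the commit ids 1 to 3, and user id 999 are invented for illustration, and it assumes the commits table also carries an author_id column.

```python
import sqlite3


def build_toy_commits():
    """Invented rows: three commits authored by user 7875, one of them also committed by 7875."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE commits (
            id INTEGER PRIMARY KEY,
            project_id INTEGER,
            author_id INTEGER,
            committer_id INTEGER
        );
        INSERT INTO commits VALUES
            (1, 1372, 7875, 7875),
            (2, 1372, 7875, 999),  -- hypothetical other committer
            (3, 1372, 7875, 999);
        """
    )
    return conn


def commit_counts(conn, project_id, user_id):
    """Count a user's commits in a project by author and by committer."""
    as_author = conn.execute(
        "SELECT COUNT(*) FROM commits WHERE project_id = ? AND author_id = ?",
        (project_id, user_id),
    ).fetchone()[0]
    as_committer = conn.execute(
        "SELECT COUNT(*) FROM commits WHERE project_id = ? AND committer_id = ?",
        (project_id, user_id),
    ).fetchone()[0]
    return {"as_author": as_author, "as_committer": as_committer}


if __name__ == "__main__":
    print(commit_counts(build_toy_commits(), 1372, 7875))
```

If the two counts diverge on the real data in the same way, the gap would come from which column the query filters on, not from missing rows.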
Note that I just restarted the Twitter instance from today's dev branch.
I'm still seeing this issue where cacoco only has one commit. Could the GHTorrent database need updating?
@ccarterlandis : Try a hard refresh ... I am looking at this URL:
http://twitter.augurlabs.io/api/unstable/twitter/finagle/contributors
and I get a ton of data:
[{"name":"mosesn","user":132262,"commits":10.0,"issues":25.0,"commit_comments":7.0,"issue_comments":939.0,"pull_requests":0.0,"pull_request_comments":0.0,"total":981.0},{"name":"ICRILBRT","user":1256242,"commits":683.0,"issues":0.0,"commit_comments":0.0,"issue_comments":0.0,"pull_requests":0.0,"pull_request_comments":0.0,"total":683.0},{"name":"mariusaeriksen","user":64320,"commits":254.0,"issues":6.0,"commit_comments":0.0,"issue_comments":192.0,"pull_requests":0.0,"pull_request_comments":0.0,"total":452.0},{"name":"MOLLDYVS","user":11213753,"commits":283.0,"issues":0.0,"commit_comments":0.0,"issue_comments":0.0,"pull_requests":0.0,"pull_request_comments":0.0,"total":283.0},{"name":"QHPUWUNQ","user":14263062,"commits":216.0,"issues":0.0,"commit_comments":0.0,"issue..... (truncated for readability)
@sgoggins that's the data I'm getting as well; however, the issue persists. As you can see by this screencap, cacoco still appears to only have one commit. Based on the GitHub API, I don't think that's correct - for this repository, they should have at least 76. Is this a limitation / shortcoming of my knowledge about what data lies in GHTorrent?
@ccarterlandis : I suspect this is an issue with different intentions behind each commit number. But I think we will work to get to the bottom of this shortly.
Hello @OrkoHunter,
I wanted to follow up on your feedback regarding inconsistent data between data sources. We will be implementing a new architecture in the coming months that will allow users to decide which data source they prefer when multiple data sources can provide a metric. That way, users who care more about historical data (for instance, to see commits that were overwritten with a git push --force or a rebase) than about parity with the repository could use GHTorrent, while users who want to see one-to-one data with the repositories on GitHub can use the GitHub API.
Thank you again for your feedback!
Hi!
I am trying to use this API to get all the contributors of a project, as well as of the entire Twitter OSS. It is a very valuable metric!
However, as of now, the numbers given in the result do not seem to be accurate. For example,
http://twitter.augurlabs.io/api/unstable/twitter/finagle/contributors tells us that cacoco has 1 commit, but https://github.com/twitter/finagle/graphs/contributors says that cacoco has 76 commits.
http://twitter.augurlabs.io/api/unstable/twitter/finatra/contributors tells us that cacoco has 868 commits, but https://github.com/twitter/finatra/graphs/contributors says that cacoco has 462 commits.
My question is: what does the commits data in the result represent?
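For anyone wanting to repeat the comparison, here is a small sketch of pulling one user's commits value out of a /contributors response. It makes no network call: SAMPLE holds two rows copied from the response quoted in this thread, and the helper name commits_for is made up for illustration.

```python
import json

# Two rows copied from the /contributors response quoted in this thread.
SAMPLE = (
    '[{"name":"mosesn","user":132262,"commits":10.0,"issues":25.0,'
    '"commit_comments":7.0,"issue_comments":939.0,"pull_requests":0.0,'
    '"pull_request_comments":0.0,"total":981.0},'
    '{"name":"mariusaeriksen","user":64320,"commits":254.0,"issues":6.0,'
    '"commit_comments":0.0,"issue_comments":192.0,"pull_requests":0.0,'
    '"pull_request_comments":0.0,"total":452.0}]'
)


def commits_for(contributors, name):
    """Return the commits value for a login, or None if the login is absent."""
    for row in contributors:
        if row["name"] == name:
            return int(row["commits"])
    return None


if __name__ == "__main__":
    contributors = json.loads(SAMPLE)
    for login in ("mosesn", "mariusaeriksen", "cacoco"):
        print(login, commits_for(contributors, login))
```

The value returned here can then be set next to the count shown on the repository's GitHub contributors graph for the same login.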