coinse / GHRB

A Repository of Real, Recent Java Bugs

Mismatch between current version and paper #1

Closed andre15silva closed 10 months ago

andre15silva commented 10 months ago

Hi,

First of all, thanks a lot for providing a cool new dataset :)

I found that the number of bugs contained in the repo is different from the one reported in the paper (107 bugs from 17 repos vs. 76 bugs from 15 repos). Is this a new update that isn't reflected in the paper, or are some of the additional bugs not supposed to be there?

smkang96 commented 10 months ago

Hi, thanks for your interest in our work!

The arXiv paper is accurate. I'm not certain where you got the 107 bug count, but if you look at some of the diffs in the data/prod_diff directory, they are empty (e.g. OpenAPITools_openapi-generator-13580.diff). These are bugs that were somewhat hastily excluded, as they were not pure Java bugs, or did not satisfy the September 2021 cutoff criterion.
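As a rough way to check this locally, the sketch below counts the non-empty diffs in data/prod_diff and groups them by project. It is not part of the repository: the function name `count_active_bugs` is hypothetical, and the `<owner>_<repo>-<number>.diff` naming pattern is an assumption inferred from the single file name mentioned above.

```python
from collections import Counter
from pathlib import Path
import re

# Assumed file-name pattern: <owner>_<repo>-<number>.diff
# (inferred from OpenAPITools_openapi-generator-13580.diff).
NAME_RE = re.compile(r"^(?P<project>.+)-(?P<number>\d+)\.diff$")


def count_active_bugs(diff_dir):
    """Return (per-project counts of non-empty diffs, names of empty diffs)."""
    counts = Counter()
    empty = []
    for path in sorted(Path(diff_dir).glob("*.diff")):
        match = NAME_RE.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        if path.read_text(encoding="utf-8", errors="replace").strip():
            counts[match.group("project")] += 1
        else:
            # An empty (or whitespace-only) diff marks an excluded bug.
            empty.append(path.name)
    return counts, empty


if __name__ == "__main__":
    counts, empty = count_active_bugs("data/prod_diff")
    print(f"{sum(counts.values())} non-empty diffs across {len(counts)} projects")
    print(f"{len(empty)} empty diffs (excluded bugs)")
```

Run from the repository root, this prints how many diffs have content and how many are empty, which can then be compared against the numbers in the paper.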

I acknowledge it is not the best design, and may be confusing. I'll discuss this with the first author and see what we can do to clarify things. Once again, thanks for bringing this issue up!

andre15silva commented 10 months ago

Ah, I see.

I found them both by using cli.py and by reading the JSON files directly, which does indeed cause confusion.

Thanks for the explanation :)

smkang96 commented 10 months ago

I've discussed this with the first author, and he has made changes so that the total number of bugs, when using cli.py, is now 76, with 16 repositories. I have personally checked this as well. Thanks to your issue, we also found some inaccuracies in our paper, and we have updated our technical report (the fixed version is slated to be published on arXiv on Fri, 3 Nov 2023 00:00:00 GMT).

Once again, thanks for bringing this up. I'll close this issue for now, but if you have any other concerns, please let us know!

andre15silva commented 10 months ago

Thanks a lot @smkang96!

I confirm that I also get the same number of bugs and repos.

I'll let you know if I have more issues.