LongyuZhang opened this issue 3 years ago
FYI @smlambert @llxia
FYI @avishreekh
Thank you @LongyuZhang!
Collect all open issue contents in related repos, e.g. openjdk-tests/issues.
We can use the Issues
API for listing the issues of a repository (here) provided by GitHub.
After storing all existing issue contents, we can continuously monitor and collect new issues in these repos.
For collecting new issues, we could save the last-updated timestamp
when querying for new issues. We could then use this timestamp with the Issues
API for fetching new issues next time (it allows fetching issues created/updated after a certain time via the since
parameter). So we maintain a variable that stores the latest timestamp and use it for new queries.
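The polling scheme described above can be sketched with only the Python standard library. The repo name, function names, and defaults below are hypothetical illustrations, not part of any existing implementation:

```python
# Sketch of since-based polling against the public GitHub Issues API.
# Only build_issues_url and newest_timestamp are exercised here; fetch_issues
# performs the actual (unauthenticated) network call.
import json
import urllib.parse
import urllib.request

def build_issues_url(repo, since=None, label=None):
    """Build a list-issues URL, optionally filtered by since/label."""
    params = {"state": "open", "per_page": "100"}
    if since:
        params["since"] = since  # ISO-8601, e.g. "2014-03-03T18:58:10Z"
    if label:
        params["labels"] = label  # e.g. "test failure"
    return (f"https://api.github.com/repos/{repo}/issues?"
            + urllib.parse.urlencode(params))

def fetch_issues(repo, since=None, label=None):
    """Fetch one page of issues as parsed JSON."""
    with urllib.request.urlopen(build_issues_url(repo, since, label)) as resp:
        return json.load(resp)

def newest_timestamp(issues, current=None):
    """Latest updated_at seen so far; feeds the since value of the next poll.

    ISO-8601 UTC timestamps compare correctly as plain strings.
    """
    stamps = [i["updated_at"] for i in issues]
    if current:
        stamps.append(current)
    return max(stamps) if stamps else current
```

Each poll passes the value returned by newest_timestamp as the next since argument, so only issues created or updated after the previous poll come back.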
Please let me know your thoughts on this @LongyuZhang @llxia @smlambert. Thank you!
Talked with @LongyuZhang , below are some of the details:
We should query git repos at an appropriate frequency (every 30 mins?).
[
  {
    "url": "https://api.github.com/repos/octocat/Hello-World/issues/1347",
    "repository_url": "https://api.github.com/repos/octocat/Hello-World",
    "number": 1347,
    "state": "open",
    "title": "Found a bug",
    "created_at": "2011-04-10T20:09:31Z",
    "updated_at": "2014-03-03T18:58:10Z",
    "issue_content_path": "/path to the content file/issueContent/<repo name>_<issue#>.txt",
    "test_output_path": "/path to the content file/testOutput/<repo name>_<issue#>.txt"
  },
  ...
]
- since: to limit the git query to issues created/updated after a certain time that matches our query interval.
- label: to narrow down the search (i.e., label="test failure").

In summary:
Step 1: figure out the git query using since
Step 2: write a query to poll git periodically
Step 3: filter returned data into issue content and test output, and store the files in the file system
Step 4: store the relationship and data in the DB. If an issue is updated, the data in the DB should be updated accordingly
Step 5: trigger the ML model training program to read /path to the content file/testOutput/<repo name>_<issue#>.txt
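Steps 3 and 4 above could be sketched as follows. The fenced-block heuristic for separating test output from issue text, the in-memory db dict standing in for the real database, and all names are assumptions for illustration only:

```python
# Sketch of Steps 3-4: split an issue body into issue content vs. pasted test
# output, write both to files, and upsert a record keyed by (repo, issue#).
import os

def split_issue_body(body):
    """Heuristic: treat fenced ``` blocks as test output, the rest as content."""
    content, output, in_fence = [], [], False
    for line in body.splitlines():
        if line.strip().startswith("```"):
            in_fence = not in_fence
            continue
        (output if in_fence else content).append(line)
    return "\n".join(content), "\n".join(output)

def store_issue(db, data_dir, repo, issue):
    """Write content/output files and upsert the issue record (Step 4)."""
    name = f"{repo.replace('/', '_')}_{issue['number']}.txt"
    content, output = split_issue_body(issue.get("body") or "")
    paths = {}
    for kind, text in (("issueContent", content), ("testOutput", output)):
        subdir = os.path.join(data_dir, kind)
        os.makedirs(subdir, exist_ok=True)
        path = os.path.join(subdir, name)
        with open(path, "w") as f:
            f.write(text)
        paths[kind] = path
    # Upsert: when an issue is updated, the stale record is replaced.
    db[(repo, issue["number"])] = {
        "url": issue["url"],
        "state": issue["state"],
        "title": issue["title"],
        "updated_at": issue["updated_at"],
        "issue_content_path": paths["issueContent"],
        "test_output_path": paths["testOutput"],
    }
```

Because the record is keyed by (repo, issue number), re-running store_issue for an updated issue overwrites both the files and the DB row, which matches the Step 4 requirement.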
Thank you for the elaborate discussion @llxia. Please let me know if I can work on this.
Please go ahead. Thanks a lot for working on this!
I was wondering if we could use GitHub webhooks instead of polling using APIs. That way, we will be reliably notified when a new issue is added, and we won't have to keep polling for it.
Please let me know your thoughts on this.
Thank you
It is a good idea to use GitHub webhooks
to monitor new issues, but for the initial collection of existing issues, the Issues API
may work better. We can try to use them separately for these two purposes if possible. Thanks.
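For comparison, a webhook receiver for the issues event can be sketched with the standard library alone. The port, handler name, and printed message are illustrative, and a real deployment should verify the X-Hub-Signature-256 header (omitted here):

```python
# Minimal webhook receiver sketch: GitHub POSTs an "issues" event whenever an
# issue is opened or edited, so no polling loop is needed.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class IssueWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = self.headers.get("X-GitHub-Event", "")
        payload = json.loads(self.rfile.read(length) or b"{}")
        if event == "issues" and payload.get("action") in ("opened", "edited"):
            issue = payload["issue"]
            # In the real pipeline this would feed the same storage code path
            # used for polled issues.
            print(f"issue #{issue['number']} {payload['action']}: {issue['title']}")
        self.send_response(204)  # acknowledge delivery, no body
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), IssueWebhookHandler).serve_forever()
```

One trade-off versus polling: webhooks must be configured per repository, whereas the Issues API can be queried for any public repo without setup, which matters here since multiple repos are monitored.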
I agree. Since we need to query multiple repos, I think the git API is more flexible/easier. It is a good idea to keep an eye on alternatives (i.e., webhooks, GitHub workflows, etc.), so we know the advantages and disadvantages of using them.
Thank you @LongyuZhang @llxia! I will first try to implement the initial collection of issues using the Issues API
and poll for new issues using the since
parameter. The Webhook integration can be done later if it is found to be a better alternative. I will also look for other alternatives in the meantime.
Please let me know if this sounds like a good strategy to begin with or if any modifications are needed.
Thank you.
Sounds good! Thanks @avishreekh
For now, we are querying git for issues. But please keep in mind, we may not be limited to git issues. It could be other bug-tracking systems.
To automate the data collection process for deep AQAtik, we need to investigate and work on the following functions:
Related issue: https://github.com/adoptium/aqa-test-tools/issues/355