The wiki page for automating the graphs is here. Please document your findings.
An initial parsing script has been added to the codespace. After a meeting with @codingAku to review and refine it, we can proceed with building trace links using the findings from #11. The script acquires data through the GitHub GraphQL API and parses it into a simple node class structure (a rough sketch follows). We still need to discuss the details of this node class.
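As a minimal sketch of what the fetch-and-parse step might look like (the `Node` fields and the `fetch_issue_nodes` helper are hypothetical names for illustration, not the actual script's API):

```python
# Minimal sketch, assuming hypothetical names; the real script's node class
# and query may differ. Requires a GitHub personal access token.
import requests

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

class Node:
    """A software artifact (requirement, issue, ...) parsed into a simple structure."""
    def __init__(self, node_id, text, artifact_type):
        self.id = node_id              # e.g. requirement number or issue number
        self.text = text               # body text used later for keyword extraction
        self.artifact_type = artifact_type

def fetch_issue_nodes(owner, repo, token):
    """Fetch issue titles/bodies via the GitHub GraphQL API and wrap them as Nodes."""
    query = """
    query($owner: String!, $repo: String!) {
      repository(owner: $owner, name: $repo) {
        issues(first: 100) { nodes { number title body } }
      }
    }"""
    resp = requests.post(
        GITHUB_GRAPHQL_URL,
        json={"query": query, "variables": {"owner": owner, "repo": repo}},
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    issues = resp.json()["data"]["repository"]["issues"]["nodes"]
    return [Node(i["number"], f"{i['title']} {i['body'] or ''}", "issue") for i in issues]
```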
Requirement headers are ignored for now; they are redundant and should be removed.
The number-of-dots system might fail if the requirement numbers are not written properly.
Check especially the following case:
1.1.2.9. Followers, Follows, My Events, Interest Areas,
1.1.2.10. Interests and Knowledge
**1.1.2.10.1** Users shall identify their interest areas
**1.1.2.10.2** Users shall display their interest areas in their profile pages
Since the last two requirements are missing the ending dot (`1.1.2.10.1` → `1.1.2.10.1.`), the method fails! (A more robust parse is sketched below.)
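One way to avoid this, sketched below under the assumption that requirements are one per line, is to match dot-separated digit groups with an optional trailing dot rather than counting dots (the function name is illustrative):

```python
# Robust requirement-number detection: accept "1.1.2.10.1" with or without a
# trailing dot, and with or without surrounding "**" markdown emphasis.
import re

REQ_NUM = re.compile(r"^\s*\**\s*(\d+(?:\.\d+)*)\.?\s*\**\s+(.*)$")

def parse_requirement(line):
    """Return (number, depth, text) or None if the line is not a requirement."""
    m = REQ_NUM.match(line)
    if not m:
        return None
    number, text = m.group(1), m.group(2)
    depth = number.count(".") + 1   # "1.1.2.10.1" -> depth 5, trailing dot or not
    return number, depth, text

# Both problematic variants now parse identically:
# parse_requirement("**1.1.2.10.1** Users shall identify their interest areas")
# parse_requirement("1.1.2.10.1. Users shall identify their interest areas")
```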
How do we search with multiple keywords? Consider a requirement such as:
"Users shall be able to delete or edit their notes."
Candidate multiple-keyword system:
We have combined the keyword extractor with the parsing results.
After parsing, we created node objects with id and text fields, so we ran the keyword extractor on the text field of each requirement node. For each keyword extracted from a requirement, we searched the existing issues, saving the issue nodes with matching keywords to a set.
At first, we simply merged the sets of issue nodes found for each keyword, yielding the related issues for a requirement.
We aimed to decrease the noise with the candidate multiple-keyword check system described above (a sketch follows this paragraph). We chose 10 as the threshold for the length of the matched-issues list and decided to prune keywords that have more than 10 matching issues. Pruning traverses the matched issues and removes those that match only such a frequent keyword.
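A minimal sketch of this matching-and-pruning step, assuming the `Node` structure above and treating the helper name and threshold constant as illustrative:

```python
# Sketch of keyword matching with pruning of overly frequent keywords.
FREQ_THRESHOLD = 10  # keywords matching more than this many issues are "too generic"

def match_requirement(req_keywords, issues):
    """Map keywords to matching issue sets, then prune noise from the union."""
    matches = {
        kw: {iss for iss in issues if kw.lower() in iss.text.lower()}
        for kw in req_keywords
    }
    frequent = {kw for kw, found in matches.items() if len(found) > FREQ_THRESHOLD}
    specific = set(matches) - frequent

    linked = set().union(*matches.values()) if matches else set()
    # Pruning: drop issues that are matched *only* by overly frequent keywords.
    for iss in list(linked):
        if not any(iss in matches[kw] for kw in specific):
            linked.discard(iss)
    return linked
```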
Some requirements whose extracted keywords are odd still get problematic matchings; successful keyword extraction appears to be critical.
We currently search as: for each keyword, look for matching issues.
We can invert this: for each issue, look for matching keywords.
By saving the number of matching keywords for each issue, we can extract the issues that have many matches.
Per issue, these metrics can be helpful. Essentially, this amounts to using tf-idf techniques (a sketch follows).
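A sketch of this per-issue scoring idea using tf-idf; scikit-learn is an assumption here (any tf-idf implementation would do), and the function name is illustrative:

```python
# Rank issues against a requirement by tf-idf cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_issues(requirement_text, issues, top_k=5):
    """Score every issue against a requirement and return the best matches."""
    corpus = [requirement_text] + [iss.text for iss in issues]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)
    # Row 0 is the requirement; compare it against all issue rows.
    scores = cosine_similarity(tfidf[0], tfidf[1:]).ravel()
    ranked = sorted(zip(issues, scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]
```

Weighting by inverse document frequency naturally downweights the generic keywords that the threshold-based pruning above handles by hand.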
I will provide a summary of the parsing and node structures, as well as the keyword search system, on the wiki after trying a couple more things to reduce the noise tomorrow.
For now, we produce textual results for the trace links (format sketched below):
- The first line consists of the requirement number and the requirement itself.
- The next line is the dictionary of extracted keywords and the number of issues matched by each.
- The remaining lines are the issues matched with this requirement, each containing the issue number and issue title.
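A minimal sketch of an emitter for this format; the field names (`number`, `title`) and the `keyword_counts` shape are assumptions, and the actual output files in the repo are authoritative:

```python
# Sketch of the textual trace-link output format described above.
def write_trace(req_number, req_text, keyword_counts, matched_issues, out):
    out.write(f"{req_number} {req_text}\n")        # requirement number + text
    out.write(f"{keyword_counts}\n")               # e.g. {'note': 4, 'edit': 12}
    for issue in matched_issues:                   # one matched issue per line
        out.write(f"#{issue.number} {issue.title}\n")
```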
The results of the first trials can be seen in the repo under trace results.
Hello. I believe that with the initial results on the repo, we can close this issue and continue working on the improvements. I didn't see any syntax errors in the wiki either. Thank you.
Issue Description
We have manually built, graphed, and analyzed traces. Now we need to automate the link and graph creation. We will create a data structure to store software artifacts, then proceed with building traces between the artifacts automatically, via keyword-based and semantic matching.
Step Details
Steps that will be performed:
Final Actions
Document the results on wiki.
Deadline of the Issue
03.04.2023 @ 23.59