Closed ersoykadir closed 1 year ago
I looked at the repos of several projects on GitHub, such as apache/maven, apache/tomcat, eclipse/sumo, keras, and godot.
First impressions:
Issue types
A comment on this discussion seemed very helpful to me.
The issues in the issue tracker, be they bug reports or proposals for new features, are requests for change. They bring new requirements (or remind of old ones in the case of regressions). To implement those changes, we do projects (with design, testing, programming, etc).
Each one of those projects has requirements.
- Bug Reports.
- There are people who test and use the software, and submit bug reports (issues). Fixing them becomes the requirement for the project of, well, fixing them. Sometimes a change is done to address multiple bugs, in essence collecting those issues as requirements for a single project.
- Road-maps.
- It is common that a lead programmer will have a road-map of features to add. The core developers will work towards what is laid out in the road-map. Thus, the road-map items are requirements.
- Proposals
- You will also find the issue system co-opted for feature requests. In fact, the practice of writing proposals for change as issues is becoming widespread. These proposals are usually more detailed than a road-map. You can consider them RFC documents.
- Pull requests.
- Milestones.
- Furthermore, you will find milestones, usually corresponding to major releases. For a milestone, a set of issues (bugs to fix, or features to add) is selected. You may consider all those as requirements for the next release.
- Tests.
- For some folks following TDD, the tests are the requirements. Even for those of us who don't follow TDD, having the code pass all tests is often a requirement for a new release. Thus, those are requirements expressed in the very formal language of actually executable code.
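To make the "tests are the requirements" point concrete, here is a minimal sketch of one requirement written as executable checks. The requirement text, the `is_valid_username` function, and the test names are all hypothetical and do not come from any of the repos examined.

```python
# Hypothetical requirement (invented for illustration):
# "A username must be 3 to 20 characters long and contain only
# letters, digits, or underscores."

def is_valid_username(name: str) -> bool:
    """Return True if `name` satisfies the hypothetical requirement."""
    if not 3 <= len(name) <= 20:
        return False
    return all(ch.isalnum() or ch == "_" for ch in name)


# Each test restates one clause of the requirement in executable form.
def test_accepts_valid_username():
    assert is_valid_username("kadir_01")


def test_rejects_too_short_or_too_long():
    assert not is_valid_username("ab")
    assert not is_valid_username("a" * 21)


def test_rejects_forbidden_characters():
    assert not is_valid_username("bad name")


if __name__ == "__main__":
    test_accepts_valid_username()
    test_rejects_too_short_or_too_long()
    test_rejects_forbidden_characters()
    print("all requirement checks passed")
```

Functions named this way can also be collected by a test runner such as pytest, so a failing check points directly at the requirement clause that was broken.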
We had discussed some tools that might be helpful to us for NLP and information retrieval techniques. I have taken a look at them.
WordNet: looked more like a dictionary.
NLTK: available as a Python library. It appears to offer many NLP tools, which we can look into if needed. I couldn't find anything when searching directly for semantic similarity.
WikiData
spaCy
Has various features. Allows training on custom data?
Word vectors and similarities:
Similarity is determined by comparing word vectors or “word embeddings”, multi-dimensional meaning representations of a word.
The problem is that, to compare sentences, spaCy takes the average of the token vectors. This means that the vector of multiple tokens is insensitive to the order of the words.
Tested with a couple of requirements from BounSWE; the results were not very sharp, probably due to the averaging explained above.
It is suggested that the sense2vec and Prodigy libraries can improve the accuracy of similarity, but I haven’t tried them yet.
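The word-order problem with averaged token vectors can be reproduced without any NLP library. The sketch below is not spaCy code: the 3-dimensional vectors in `TOY_VECTORS` are invented for illustration, and `sentence_vector` and `cosine` are hypothetical helpers that only mimic the averaging approach described above.

```python
import math

# Toy word vectors, invented for illustration only (real embeddings
# have hundreds of dimensions and are learned from data).
TOY_VECTORS = {
    "user": [0.9, 0.1, 0.0],
    "deletes": [0.1, 0.8, 0.2],
    "post": [0.0, 0.2, 0.9],
}


def sentence_vector(sentence):
    """Average the vectors of the known tokens in a sentence."""
    tokens = [t for t in sentence.lower().split() if t in TOY_VECTORS]
    if not tokens:
        raise ValueError("no known tokens in sentence")
    dims = len(next(iter(TOY_VECTORS.values())))
    return [
        sum(TOY_VECTORS[t][i] for t in tokens) / len(tokens)
        for i in range(dims)
    ]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Same words, opposite meaning: who deletes whom is lost in the average.
a = sentence_vector("user deletes post")
b = sentence_vector("post deletes user")
print(cosine(a, b))  # approximately 1.0, i.e. treated as identical
```

Because the average is a sum divided by a count, any permutation of the same tokens yields the same sentence vector, which would explain why requirement sentences with similar vocabulary but different meaning score as near-duplicates.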
NLPCloud
Observation on repos:
Most do not have requirements documents or specifications, although some have noticeable artifacts to set the objectives and functions to implement.
As the example above shows, the title and the "WHAT PROBLEM IS THIS SOLVING" section could be linked to issues to trace the development process.
Here are the two issues with the highest similarity scores on the NLP Cloud Semantic Similarity Engine, at 74.3% and 79.4% respectively.
Here is another example, the LyricsKing App Screenshots, which are basically design images for implementation. However, they would require image classification to detect text before a similarity search could be run on them.
On the other hand, the Bounswe repository has many projects that include requirements and glossary sections, which creates an environment for running several similarity techniques on requirements, issues, pull requests, and commit messages.
The observations of this issue have been discussed in Meeting 3. The final action left is to document them on the Wiki.
The Wiki page is created and linked in the sidebar, with my contributions documented. @ersoykadir's findings still need to be documented on the Wiki page before the issue can be closed.
Findings are documented on the Wiki page. In addition to my comment above, I also added an open dataset resource that I saw in one of the papers. It might be useful in the future for our evaluation purposes. @codingAku can close the issue after review.
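As a rough baseline for linking a requirement to issues, a plain TF-IDF cosine ranking can be sketched in a few lines. This is a lexical technique swapped in for illustration, not what NLP Cloud or spaCy do internally, and the requirement and issue titles below are made up rather than taken from the Bounswe repository.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that terms found in every document keep some weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_issues(requirement, issues):
    """Sort issue titles by similarity to the requirement, best first."""
    vectors = tfidf_vectors([requirement] + issues)
    req_vec, issue_vecs = vectors[0], vectors[1:]
    scored = [(cosine(req_vec, v), title) for v, title in zip(issue_vecs, issues)]
    return sorted(scored, reverse=True)


# Hypothetical requirement and issue titles, for illustration only.
requirement = "Users shall be able to search posts by keyword"
issues = [
    "Implement keyword search for posts",
    "Fix login page layout on mobile",
    "Add unit tests for search endpoint",
]
ranking = rank_issues(requirement, issues)
for score, title in ranking:
    print(f"{score:.3f}  {title}")
```

Since this only matches shared surface words, it would miss paraphrases (for example "find" versus "search"), which is where embedding-based similarity is expected to help.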
Everything looks good. First milestone finished. Thank you.
Issue Description
The aim for this week is to look for repos on GitHub and examine them from the perspective of the newly learned methods.
wordnet, nltk, wikidata, spacy are some of the tools that we will play with, to test out info retrieval and semantic relation methods.
Step Details
Steps that will be performed:
Final Actions
Ideas and findings must be documented on wiki.
Deadline of the Issue
11:00- 13.03.2023