ai-se / ML-assisted-SLR

Automated Systematic Literature Review

Generate Data (In progress) #35

Open azhe825 opened 8 years ago

azhe825 commented 8 years ago

Generate synthetic data

Extract data from SLR

azhe825 commented 8 years ago

Target

A systematic literature review of studies on business process modeling quality (Information and Software Technology, 2015)

Search string:

TI = (quality OR eval OR consistenc OR maintainability OR understand OR completeness OR comprehensi OR testability OR defect OR pitfall OR deficienc OR error OR mistake OR problem OR effectiveness OR complexity OR readability OR metric OR measur OR efficienc OR validat OR layout OR guideline OR flexibility OR recommendation OR correctn) AND TI = (process) AND TI = (model OR representation OR diagram)
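To make the Boolean structure of this title query concrete, here is a minimal Python sketch that checks whether a single title would match. It rests on an assumption: every term is treated as a prefix (truncation) match on whole title words, because stems such as `consistenc` and `correctn` suggest wildcards that are not visible in this issue. WoS's own matching rules (lemmatization, phrase handling) are not reproduced, and `title_matches` and the term lists are hypothetical names.

```python
import re

# Hypothetical encoding of the TI search string: an AND of three OR-groups.
# ASSUMPTION: each term is a truncated stem that matches any title word
# starting with it (e.g. "measur" -> "measuring").
QUALITY_TERMS = [
    "quality", "eval", "consistenc", "maintainability", "understand",
    "completeness", "comprehensi", "testability", "defect", "pitfall",
    "deficienc", "error", "mistake", "problem", "effectiveness", "complexity",
    "readability", "metric", "measur", "efficienc", "validat", "layout",
    "guideline", "flexibility", "recommendation", "correctn",
]
PROCESS_TERMS = ["process"]
MODEL_TERMS = ["model", "representation", "diagram"]
QUERY = [QUALITY_TERMS, PROCESS_TERMS, MODEL_TERMS]


def title_matches(title: str, query=QUERY) -> bool:
    """True if every OR-group has a term that is a prefix of some title word."""
    words = re.findall(r"[a-z]+", title.lower())
    return all(
        any(word.startswith(term) for term in group for word in words)
        for group in query
    )


print(title_matches("Measuring the understandability of business process models"))  # True
print(title_matches("Refactoring large process model repositories"))  # False
```

The second title is the paper's own example of a relevant study the search string missed; under the prefix-matching assumption it fails the first OR-group, since none of its words start with a quality term.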

Refined by: Research Areas = (COMPUTER SCIENCE OR BUSINESS ECONOMICS) AND Document Types = (MEETING OR ARTICLE). Timespan = 2000–2013. Search language = Auto.
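The refinements act as a second filter on the records returned by the title query: OR within each facet, AND across facets. A minimal Python sketch under that reading; the `Record` fields and `passes_refinement` are hypothetical and do not mirror an actual WoS export format.

```python
from dataclasses import dataclass, field

# Refinement criteria copied from the note above.
ALLOWED_AREAS = {"COMPUTER SCIENCE", "BUSINESS ECONOMICS"}
ALLOWED_DOC_TYPES = {"MEETING", "ARTICLE"}
YEAR_RANGE = (2000, 2013)  # inclusive


@dataclass
class Record:
    # Hypothetical record layout, for illustration only.
    title: str
    year: int
    doc_types: set = field(default_factory=set)
    research_areas: set = field(default_factory=set)


def passes_refinement(rec: Record) -> bool:
    """OR within each facet (any allowed value), AND across facets."""
    first, last = YEAR_RANGE
    return (
        bool(rec.research_areas & ALLOWED_AREAS)
        and bool(rec.doc_types & ALLOWED_DOC_TYPES)
        and first <= rec.year <= last
    )


records = [
    Record("Paper A", 2010, {"ARTICLE"}, {"COMPUTER SCIENCE"}),
    Record("Paper B", 2014, {"ARTICLE"}, {"COMPUTER SCIENCE"}),  # outside timespan
    Record("Paper C", 2008, {"MEETING"}, {"ENGINEERING"}),       # area not allowed
]
kept = [r.title for r in records if passes_refinement(r)]
print(kept)  # ['Paper A']
```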

Overlaps with defect prediction: "error", "defect", "correctn", "COMPUTER SCIENCE"

72 inclusions

Mentioned in this paper:

3.3. Search and selection approach

Given the broad nature of the research domain, finding all relevant papers by manually searching through conferences and journals would be very time-consuming. We therefore started the search process with an automated search. We subsequently completed the set of papers through (1) a manual search, scanning conference proceedings, DBLP, and the personal web pages of several well-known authors in the business process modeling quality research area, and (2) a reference search. We limited the search to electronic collections and considered only peer-reviewed journals, conference proceedings, and workshop proceedings. Fig. 1 shows a full overview of the search process.

The 1061 papers obtained by the automated search were reduced to 173 by applying a first filter on title and abstract. The manual search yielded an additional 56 potentially relevant research papers. Of these, 29 were not included in the results of the automated search, and 15 of those 29 are not indexed in the WoS, so they could never have been identified through the automated search.
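The manual-search bookkeeping here amounts to set differences. The Python sketch below uses hypothetical integer paper IDs, constructed only so that the group sizes reproduce the published counts (56, 29, 15); `classify_manual` is a hypothetical helper. It makes explicit that 29 − 15 = 14 of the manually found papers were indexed in the WoS but still absent from the automated search results.

```python
def classify_manual(manual: set, automated: set, wos_indexed: set) -> dict:
    """Split manual-search hits by whether the automated search could have found them."""
    missed = manual - automated
    return {
        "also_found_automatically": manual & automated,
        "missed_but_indexed": missed & wos_indexed,  # gaps in the search string
        "not_indexed": missed - wos_indexed,         # unreachable by any WoS query
    }


# Toy IDs chosen so the counts match the text: 56 manual hits, 29 of them
# missed by the automated search, 15 of those 29 absent from the WoS.
manual = set(range(56))
automated = set(range(27)) | set(range(100, 200))
wos_indexed = set(range(41)) | set(range(100, 200))

groups = classify_manual(manual, automated, wos_indexed)
print({name: len(ids) for name, ids in groups.items()})
# {'also_found_automatically': 27, 'missed_but_indexed': 14, 'not_indexed': 15}
```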

After combining the papers returned by the automated search with those obtained by the manual search, we applied the selection criteria to the full papers. This yielded 62 papers for the final paper set. For the reference search, we read these 62 papers in detail and examined their references for further relevant papers. This reference search yielded 69 potentially relevant papers. After a detailed reading of these papers, we applied the inclusion/exclusion criteria to them as well, which resulted in 15 papers to include. Finally, we merged the two sets, obtaining 77 studies at the end of this stage. Of these 77 studies, 2 were excluded as duplicate publications of the same results (i.e. [31] for paper 6 and [32] for paper 24 in Appendix B). For duplicate studies, we kept the most complete and recent publication, as recommended by [26] and [33]. We also excluded 3 more papers because they were not published in peer-reviewed conference proceedings or journals. This yielded a final set of 72 papers for the SLR.
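The counts in this paragraph can be checked with a short running tally. A minimal Python sketch; the step labels paraphrase the text and `run_funnel` is a hypothetical helper.

```python
# Each step: (description, change in paper count). Counts are from the text.
FUNNEL = [
    ("Included after full-text screening (automated + manual)", +62),
    ("Included from reference search (15 of 69 candidates)", +15),
    ("Duplicate publications removed", -2),
    ("Not peer reviewed, removed", -3),
]


def run_funnel(steps):
    """Return (label, running total) after each step."""
    total = 0
    history = []
    for label, delta in steps:
        total += delta
        history.append((label, total))
    return history


for label, total in run_funnel(FUNNEL):
    print(f"{total:>3}  {label}")
```

The running totals are 62, 77, 75 and 72, matching the merged set of 77 studies and the final set of 72.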

At the end of the search process, we checked the quality of the search string that yielded the initial set of 1061 papers. We did this by checking whether all the papers of the most frequently occurring author appeared in the list and, if not, whether there was a logical explanation for their absence. According to the WoS search engine, J. Mendling is the author with the highest number of publications in the set of 1061 papers. At the end of the search process, he appeared as (co-)author of 28 of the final 72 papers, so he is the most frequently publishing author in this domain both at the start and at the end of the process. Of the 28 papers (co-)authored by J. Mendling, 3 are not indexed in the WoS, leaving 25 indexed papers. The automated search returned 20 of these 25, a recall of 80%. The 5 papers that were not found have titles that do not clearly refer to business process model quality keywords, e.g. “Refactoring large process model repositories”. We investigated whether the missing keywords in the title could be compensated for by searching on topic rather than on title. However, this is not an option, as a topic search returned more than 200,000 papers.
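The 80% recall follows from dropping the non-indexed papers from the denominator, since no WoS query could return them. A minimal Python sketch of that calculation; `recall` is a hypothetical helper, not part of any tool used in the paper.

```python
def recall(retrieved_relevant: int, total_relevant: int) -> float:
    """Fraction of the reachable relevant papers that the search returned."""
    return retrieved_relevant / total_relevant


mendling_papers = 28                    # in the final set of 72
not_in_wos = 3                          # could never be retrieved from WoS
indexed = mendling_papers - not_in_wos  # 25 papers the query could have found
found_by_search = 20

print(indexed, recall(found_by_search, indexed))  # 25 0.8
```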

We are aware that, because of the high number of papers published in the domain of business process modeling, it is practically impossible not to miss some sources. Nevertheless, the high recall for the author with the largest number of publications gives us further confidence that the automated search, combined with the manual search and the reference search, is sufficiently complete, or at least sufficiently representative.

azhe825 commented 8 years ago

Vulnerability prediction

azhe825 commented 8 years ago

Software Analytics to Software Practice: A Systematic Literature Review

Only 19 relevant studies