WM-SEMERU / ds4se

Data Science for Software Engineering (ds4se) is an academic initiative to perform exploratory and causal inference analysis on software engineering artifacts and metadata, covering data management, analysis, and benchmarking for deep learning (DL) and traceability.
https://wm-csci-435-f19.github.io/ds4se/
Apache License 2.0

do(code): A Causal Inference Framework to Understand and Explain Source Code Properties #95

Open danaderp opened 3 years ago


Description: The use of machine learning and deep learning techniques in software engineering has grown rapidly over the last decade. Demand for SE-related data, and for code data in particular, has grown with it, since these data are the main source from which learning algorithms learn. Although learning algorithms are useful for extracting patterns from unstructured SE data, their effectiveness is poorly understood and explained. The reason behind this problem is that software researchers mainly focus on evaluating observational scenarios and do not consider possible interventions on the data that might influence the outcome of the learning algorithm. Such interventions enable what causal inference calls counterfactual explanations. In our case, we are particularly interested in code interventions given a set of properties (e.g., complexity, size, or entropy).

Often, SE research questions about source code are not purely descriptive or predictive; in some scenarios, they try to establish a causal relationship. Let’s look at the following example of a research question according to the function:

The purpose of this study is to create a library that allows software researchers to evaluate the causal effect of one code property (e.g., code size) on another code property (e.g., number of bugs).
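To make the gap between an observational question ("do large files have more bugs?") and an interventional one ("would making a file large cause more bugs?") concrete, here is a minimal sketch using only the Python standard library. It is not the ds4se library or its API. The toy causal model, its probabilities, the binary "large file" encoding of size, and the names `sample`, `mean_outcome`, and `backdoor_effect` are all hypothetical. It assumes complexity is the only confounder of size and bugs, and it uses Pearl's backdoor adjustment as the estimation technique.

```python
import random
from collections import Counter

# Hypothetical toy structural causal model (SCM), for illustration only:
#   complexity -> size, complexity -> buggy, size -> buggy
# All variables are binary; "size" stands for "large file".
def sample(n, do_size=None, seed=0):
    """Draw n files from the toy SCM. If do_size is given, apply the
    intervention do(size = do_size), which cuts the complexity -> size edge."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        complexity = int(rng.random() < 0.5)
        if do_size is None:
            # Observational regime: complex files tend to be large.
            size = int(rng.random() < (0.8 if complexity else 0.2))
        else:
            # Interventional regime: size is forced, ignoring complexity.
            size = do_size
        buggy = int(rng.random() < 0.1 + 0.3 * complexity + 0.1 * size)
        rows.append({"complexity": complexity, "size": size, "buggy": buggy})
    return rows


def mean_outcome(rows, treatment, outcome, t):
    """Observational E[outcome | treatment = t]."""
    vals = [r[outcome] for r in rows if r[treatment] == t]
    return sum(vals) / len(vals)


def backdoor_effect(rows, treatment, outcome, confounders):
    """Average causal effect of a binary treatment on a binary outcome via
    the backdoor adjustment formula
        P(Y=1 | do(T=t)) = sum_z P(Y=1 | T=t, Z=z) * P(Z=z),
    assuming `confounders` block every backdoor path and every stratum z
    contains both treated and untreated rows."""
    def stratum(r):
        return tuple(r[c] for c in confounders)

    n = len(rows)
    strata = Counter(stratum(r) for r in rows)  # n_z for each stratum z

    def p_do(t):
        return sum(
            mean_outcome([r for r in rows if stratum(r) == z], treatment, outcome, t)
            * n_z / n
            for z, n_z in strata.items()
        )

    return p_do(1) - p_do(0)


observed = sample(100_000)
naive = (mean_outcome(observed, "size", "buggy", 1)
         - mean_outcome(observed, "size", "buggy", 0))
adjusted = backdoor_effect(observed, "size", "buggy", ["complexity"])
# Ground truth, available only because the simulator lets us intervene:
true_effect = (mean_outcome(sample(100_000, do_size=1, seed=1), "size", "buggy", 1)
               - mean_outcome(sample(100_000, do_size=0, seed=2), "size", "buggy", 0))

print(f"naive observational difference: {naive:.3f}")        # close to 0.28
print(f"backdoor-adjusted estimate:     {adjusted:.3f}")     # close to 0.10
print(f"simulated interventional truth: {true_effect:.3f}")  # close to 0.10
```

Under these assumptions the naive difference is inflated (about 0.28) because large files are more often complex, while the adjusted estimate, computed from observational data alone, recovers the simulated interventional effect of about 0.10. With real SE data the adjustment set is not known in advance and has to be justified from a causal graph rather than assumed as it is here.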

Project Goal

Project Requirements

Recommended Readings