WM-SEMERU / ds4se

Data Science for Software Engineering (ds4se) is an academic initiative to perform exploratory and causal inference analysis on software engineering artifacts and metadata. Data Management, Analysis, and Benchmarking for DL and Traceability.
https://wm-csci-435-f19.github.io/ds4se/
Apache License 2.0

Causality library exploration #102

Open scheurich-sarah opened 3 years ago

scheurich-sarah commented 3 years ago

There are a few well-known software packages for studying causal inference in the statistics community, such as SAS and SPSS. Unfortunately, these platforms can be somewhat esoteric, inconvenient (more lines of code to accomplish the same tasks), and costly. Causal inference has been used extensively in fields like medicine and economics to answer "what if" questions, and other fields are increasingly looking to it to explain the "why" behind correlations observed in machine learning. At the intersection of increased interest and convenience stand libraries such as causality (PyPI) and doWhy (Microsoft Research), which offer xxx causal xxx and simple integration into existing Python-based research environments. In fact, the doWhy documentation identifies an end goal of making causal inference as commonplace as typical exploratory data analysis via `dowhy.api`.

Task: employ the doWhy library to develop a causal process that leverages counterfactual reasoning to explore the impact of granularity (method-level or class-level) and of different pre-processing strategies on software traceability.
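To make the "what if" framing concrete, here is a minimal sketch of the kind of backdoor adjustment that a library like doWhy automates. It deliberately does not call doWhy; it uses only the Python standard library. Everything in it is an assumption for illustration: the data is synthetic, and the names `large_system` (a confounder), `method_level` (the treatment, i.e. method-level rather than class-level granularity), and `score` (a traceability score) are hypothetical, not taken from the ds4se datasets.

```python
import random
from statistics import mean

def simulate(n=20000, true_effect=0.10, seed=0):
    """Generate SYNTHETIC rows: (large_system, method_level, score)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Hypothetical confounder: it affects both treatment and outcome.
        large_system = rng.random() < 0.5
        # Large systems are more likely to be analysed at method level.
        method_level = rng.random() < (0.8 if large_system else 0.2)
        # Outcome: the treatment helps by true_effect, the confounder hurts.
        score = (0.5 + true_effect * method_level
                 - 0.2 * large_system + rng.gauss(0, 0.05))
        rows.append((large_system, method_level, score))
    return rows

def naive_difference(rows):
    """Plain difference in mean score, ignoring the confounder."""
    treated = [s for _, t, s in rows if t]
    control = [s for _, t, s in rows if not t]
    return mean(treated) - mean(control)

def backdoor_adjusted(rows):
    """Stratify on the confounder, then average the per-stratum effects
    weighted by stratum size (the backdoor adjustment formula)."""
    effect = 0.0
    for z in (False, True):
        stratum = [(t, s) for c, t, s in rows if c == z]
        treated = [s for t, s in stratum if t]
        control = [s for t, s in stratum if not t]
        weight = len(stratum) / len(rows)
        effect += weight * (mean(treated) - mean(control))
    return effect

rows = simulate()
naive = naive_difference(rows)
adjusted = backdoor_adjusted(rows)
print(f"naive difference:  {naive:+.3f}")
print(f"backdoor adjusted: {adjusted:+.3f}  (simulated true effect: +0.100)")
```

In this simulation the naive comparison understates the effect, and can even get its sign wrong, because large systems are both more likely to be treated and harder to trace. Stratifying on the confounder recovers a value close to the simulated effect. With doWhy, the analogous workflow would be to state the causal graph, identify the estimand, estimate it, and then run refutation checks.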