WM-SEMERU / ds4se

Data Science for Software Engineering (ds4se) is an academic initiative to perform exploratory and causal inference analysis on software engineering artifacts and metadata. Data Management, Analysis, and Benchmarking for DL and Traceability.
https://wm-csci-435-f19.github.io/ds4se/
Apache License 2.0

Learn how to instantiate the CausalModel class #104

Open scheurich-sarah opened 3 years ago

scheurich-sarah commented 3 years ago

The CausalModel class is the first step in DoWhy's causal process. At a minimum, it takes a data set (a pandas DataFrame), a treatment (a column in the DataFrame), an outcome (a column in the DataFrame), and a causal graph. There are several ways to specify a causal graph, but the GML format (.gml) is recommended, so we'll use that for now. Tools such as DAGitty can help build the graph strings.
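A minimal sketch of what that instantiation might look like. The column names W, T, and Y, the edge list, and the helper names `build_gml` and `build_model` are hypothetical and only for illustration. The graph is kept on a single line because DoWhy has historically been strict about newlines in graph strings; that detail is an assumption worth checking against the installed version.

```python
def build_gml(edges):
    """Return a single-line GML digraph string for (source, target) edges."""
    nodes = sorted({name for edge in edges for name in edge})
    parts = ["graph [", "directed 1"]
    for name in nodes:
        parts.append(f'node [ id "{name}" label "{name}" ]')
    for source, target in edges:
        parts.append(f'edge [ source "{source}" target "{target}" ]')
    parts.append("]")
    return " ".join(parts)


# Hypothetical graph: W confounds both the treatment T and the outcome Y.
EDGES = [("W", "T"), ("W", "Y"), ("T", "Y")]
GML_GRAPH = build_gml(EDGES)


def build_model(df):
    """Instantiate CausalModel on a DataFrame with columns W, T and Y."""
    # Imported lazily so the graph helper works without DoWhy installed.
    from dowhy import CausalModel

    return CausalModel(
        data=df,          # pandas DataFrame
        treatment="T",    # column name of the treatment
        outcome="Y",      # column name of the outcome
        graph=GML_GRAPH,  # causal graph as a GML string
    )
```

Calling `build_model(df)` on a DataFrame that has those three columns should return a model ready for the identification step.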

Learn the other keyword args one can pass to CausalModel to understand which might apply to our data/question.
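One possible way to enumerate those keyword args is to read them off the constructor signature with the standard `inspect` module. This is only a sketch: the helper names `keyword_defaults` and `causal_model_keyword_defaults` are hypothetical, and the second one assumes DoWhy is installed.

```python
import inspect


def keyword_defaults(cls):
    """Map each optional __init__ parameter of cls to its default value."""
    params = inspect.signature(cls.__init__).parameters
    return {
        name: param.default
        for name, param in params.items()
        if param.default is not inspect.Parameter.empty
    }


def causal_model_keyword_defaults():
    """List the optional keyword args accepted by dowhy.CausalModel."""
    # Imported lazily so keyword_defaults can be used without DoWhy.
    from dowhy import CausalModel

    return keyword_defaults(CausalModel)
```

Printing the result of `causal_model_keyword_defaults()` gives a quick checklist of optional arguments to read up on in the source code.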

103

scheurich-sarah commented 3 years ago

Implemented a vanilla causal framework on a built-in DoWhy data set. All built-in data sets come with structural causal models already encoded. This implementation follows the ConfoundingExample starter notebook on the DoWhy website, which seemed most relevant because it uses regression and explores the data for confounders. Documented nuances of the CausalModel class found while exploring the source code.
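For reference, a rough sketch of what a vanilla pipeline of this kind can look like. It is not the code from this issue: it assumes the `dowhy.datasets.linear_dataset` generator rather than the notebook's own data set, the parameter values are arbitrary, and the function name `run_backdoor_regression` is hypothetical.

```python
def run_backdoor_regression(num_samples=1000, beta=10, seed=0):
    """Estimate a treatment effect on a synthetic DoWhy data set."""
    # Third-party imports are kept inside the function so the module
    # can be loaded even where DoWhy is not installed.
    import numpy as np
    import dowhy.datasets
    from dowhy import CausalModel

    np.random.seed(seed)  # the generator draws from NumPy's global RNG

    # Synthetic data with a known true effect (beta) and a bundled graph.
    data = dowhy.datasets.linear_dataset(
        beta=beta,
        num_common_causes=2,
        num_instruments=2,
        num_samples=num_samples,
        treatment_is_binary=False,
    )

    # Step 1: model the problem using the graph shipped with the data.
    model = CausalModel(
        data=data["df"],
        treatment=data["treatment_name"],
        outcome=data["outcome_name"],
        graph=data["gml_graph"],
    )

    # Step 2: identify the estimand without prompting for confirmation.
    estimand = model.identify_effect(proceed_when_unidentifiable=True)

    # Step 3: estimate the effect with backdoor linear regression.
    estimate = model.estimate_effect(
        estimand, method_name="backdoor.linear_regression"
    )
    return estimate.value
```

Because the data is generated with a known `beta`, the returned value can be compared against it as a sanity check.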