[ ] Create a causal graph using all the training data and record its insights (this graph will be considered the ground truth)
[ ] Create new causal graphs using increasing fractions of the data and compare with the ground truth graph
The comparison can be done with the Jaccard similarity index: the number of edges the two graphs share divided by the number of edges in their union
[ ] After reaching a stable causal graph, keep only the variables with an edge pointing directly into the target variable (its direct parents)
[ ] Train one model using all variables and another using only the variables selected by the graph
[ ] Measure how much each model overfits, using the hold-out set created in step 1
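
The graph-comparison and variable-selection steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a fixed implementation: each graph is represented as a set of directed `(cause, effect)` edges, "stable" is taken to mean that the Jaccard similarity with the ground-truth graph stays at or above a threshold (0.9 here, an arbitrary choice) for every larger fraction, and overfitting is measured as the train score minus the hold-out score for a metric where higher is better. All function names, variable names, and example edges are hypothetical.

```python
from typing import Hashable, Iterable, Optional, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # directed edge: (cause, effect)


def jaccard_similarity(edges_a: Iterable[Edge], edges_b: Iterable[Edge]) -> float:
    """Size of the shared edge set divided by the size of the combined edge set."""
    a, b = set(edges_a), set(edges_b)
    if not a and not b:
        return 1.0  # assumption: two empty graphs count as identical
    return len(a & b) / len(a | b)


def first_stable_fraction(
    fractions: Sequence[float],
    graphs: Sequence[Set[Edge]],
    reference: Set[Edge],
    threshold: float = 0.9,  # assumption: arbitrary stability cut-off
) -> Optional[float]:
    """Smallest fraction from which every graph stays >= threshold similar
    to the reference. Assumes `fractions` is sorted in ascending order."""
    sims = [jaccard_similarity(g, reference) for g in graphs]
    stable_from = None
    # Walk backwards from the largest fraction until similarity drops.
    for i in range(len(sims) - 1, -1, -1):
        if sims[i] < threshold:
            break
        stable_from = fractions[i]
    return stable_from


def direct_parents(edges: Iterable[Edge], target: Hashable) -> Set[Hashable]:
    """Variables with an edge pointing directly into the target."""
    return {src for src, dst in edges if dst == target}


def generalization_gap(train_score: float, holdout_score: float) -> float:
    """Train score minus hold-out score (assumes higher score is better)."""
    return train_score - holdout_score


# Hypothetical example: ground-truth graph built from all the training data.
reference = {("age", "income"), ("income", "churn"), ("tenure", "churn")}
fractions = [0.25, 0.5, 0.75, 1.0]
graphs = [
    {("age", "income")},
    {("age", "income"), ("income", "churn"), ("churn", "tenure")},
    {("age", "income"), ("income", "churn"), ("tenure", "churn")},
    set(reference),
]

stable_at = first_stable_fraction(fractions, graphs, reference)
selected = direct_parents(reference, "churn")
```

Treating edges as ordered pairs means a reversed edge counts as a disagreement, which seems appropriate for causal graphs where direction carries meaning; an undirected comparison would need each edge normalized first (for example with `frozenset`).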