Hey, thanks for letting us know about this issue.
The reason for the IndexError is that test_factual does not get ordered (such that the label column ends up at the end) inside get_counterfactuals. For now you can quickly fix this by adding the following inside revise.get_counterfactuals:
def get_counterfactuals(self, factuals: pd.DataFrame) -> pd.DataFrame:
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Add this: reorder the feature columns and re-append the target at the end
    factuals = pd.concat(
        [
            self._mlmodel.get_ordered_features(factuals),
            factuals[self._mlmodel.data.target],
        ],
        axis=1,
    )
This should fix the IndexError. We will add this fix to CARLA, but that can take a couple of days. Also please let us know if it's still not working or if anything is unclear.
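If you'd rather not patch the library while waiting for that fix, the same reordering can also be done on the caller side before handing the factuals to REVISE. This is only a sketch under the assumption that your objects follow the usual CARLA setup; ml_model, test_factual, and revise_method are placeholders for your own variables:

import pandas as pd

# Reorder the feature columns to the model's input order and re-append the
# target column at the end, mirroring the in-library fix above.
ordered_factuals = pd.concat(
    [
        ml_model.get_ordered_features(test_factual),
        test_factual[ml_model.data.target],
    ],
    axis=1,
)
counterfactuals = revise_method.get_counterfactuals(ordered_factuals)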
When testing with your code snippet I also noticed that the success rate was very low. I think the number of epochs in the training params for the ml_model should be increased to 20; that seems to improve the accuracy from about 75% to 85%.
ml_model.train(
    learning_rate=0.002,
    epochs=20,
    batch_size=1024,
    hidden_size=[18, 9, 3],
    force_train=True,  # don't forget this, or else it might load an older model from disk
)
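As a rough sanity check after retraining, you could compare the model's predictions against the test labels. The snippet below is only a sketch: it assumes the usual CARLA catalog attributes (df_test and target on the data object) and that predict takes care of selecting and ordering the feature columns internally, so please adapt it to your setup:

import numpy as np

# Predict on the held-out test split and compute plain accuracy.
raw = ml_model.predict(dataset.df_test)
if hasattr(raw, "detach"):  # torch tensor from the pytorch backend
    raw = raw.detach().cpu().numpy()
preds = (np.asarray(raw).reshape(-1) > 0.5).astype(int)
labels = dataset.df_test[dataset.target].to_numpy()
print(f"test accuracy: {(preds == labels).mean():.3f}")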
Hi! Thanks for the quick reply and thanks for spotting that for the ml_model. It works perfectly fine now.
Do you think that code addition is needed for the other recourse methods? So far I haven't gotten error messages for the others, but it might be smart to add it anyway?
Annabelle
Yeah, maybe. I'm quite sure that always ordering before anything else won't break anything. So even though it might add some redundant code, it might be a good idea to play it safe.
In principle, a recourse method shouldn't depend on the exact order of the columns at that point in the code IMO, since the column names are available most of the time. For REVISE, for example, the main issue is that the target column is in the factual DataFrame; however, that column isn't used, and thus doesn't exist later in the code. Ordering the features at the start shouldn't be needed (ordering already happens elsewhere, e.g. when predicting) if the target column is removed in the first place. So I'm a little reluctant to just put ordering everywhere, because it might not really fix the underlying problem. But I'll keep it in mind when fixing the problem, also for other methods.
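To illustrate that point, a purely hypothetical helper that relies on column names instead of positions could look like this; feature_input_order and data.target are the existing model/data attributes, while the function itself is just for illustration:

import pandas as pd

def split_ordered_features_and_target(factuals: pd.DataFrame, mlmodel) -> pd.DataFrame:
    # Select the features by name, in the model's expected input order, and
    # re-append the target column only if it is present. No positional indices
    # are used, so the order of the input columns does not matter.
    features = factuals[mlmodel.feature_input_order]
    target_col = mlmodel.data.target
    if target_col in factuals.columns:
        return pd.concat([features, factuals[target_col]], axis=1)
    return features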
Turns out all of the recourse methods were already using the ordering, except REVISE and CRUD. So my above comment is kinda wrong, and I apparently forgot about the code I wrote. Ordering for those two methods will get added very soon, and then all the methods will use feature ordering at the start of get_counterfactuals.
Hi! Great, thanks for clearing that up.
Hope it's ok if I ask another question here:
I get quite different results each time I run the methods (e.g., with the same test sample, REVISE sometimes returns no counterfactuals and sometimes returns a counterfactual for every observation). Is there a way to set a seed before each method is run so that I can guarantee the same results? Thanks!
Hey, I think the randomness is due to retraining the autoencoder. You can set the autoencoder's "train" parameter to False, and then it should load an autoencoder that you trained before. At least it seemed to give identical results on my machine.
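For REVISE that would look roughly like the snippet below. The key names ("vae_params", "train", "layers") follow the REVISE defaults as far as I remember them, and the layer sizes and data name are only placeholders, so please double-check against your CARLA version:

from carla.recourse_methods import Revise

revise_hyperparams = {
    "data_name": "adult",  # placeholder; use your dataset's name
    "vae_params": {
        "layers": [len(ml_model.feature_input_order), 512, 256, 8],
        "train": False,  # reuse the autoencoder saved by an earlier run
    },
}
revise_method = Revise(ml_model, dataset, revise_hyperparams)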
Alternatively you can set all the seeds (excluding TensorFlow here) like so:

import random
import numpy
import torch

torch.manual_seed(0)
random.seed(0)
numpy.random.seed(0)
That works globally I believe. This also gave identical results between two runs for me.
I hope that helps!
That works great! Thanks again for all your help.
Hi! Thanks for all your work implementing such a large range of counterfactual explanation methods!
I'm having problems running the Revise method and was hoping someone could point me in the right direction. Here is my code (I'm using the CARLA package master branch pushed April 20th, 2022):
The last line of code gives me an IndexError: index 13 is out of bounds for dimension 1 with size 13. Any idea why this may be?
Thanks, Annabelle