EoinKenny / AAAI-2021

Code for our paper

Hurdle model and occasional 0 vectors #5

Closed jasminkareem closed 2 months ago

jasminkareem commented 2 months ago

I am trying to reproduce your method with another neural network. Sorry if this question is trivial; I haven't worked with hurdle models before. The issue I keep returning to concerns the hurdle model. My network has a test accuracy of about 98%, but when I pass the activations to the hurdle model 1024 times (my last layer has shape (1024, 10), i.e. `len(query_features)`), some of the loops hit a problem where all values are 0. Most of my activations are not sparse. In the file piece_hurdle_model.py, all values that are 0 are filtered out, so if all values are 0 on one of these 1024 iterations you are left with an empty list, which leads to an error. My question is: is the constraint of having no all-zero vectors intrinsic to the hurdle model? Can I relax this constraint to allow for 0s? Or is there a problem with my model?

I have attached a screenshot of the section of the paper that this relates to. I'm not sure how to deal with this non-zero requirement from reading this section. Thank you for your help!

EoinKenny commented 2 months ago

Hey, thanks for the message. I'd like to try to help.

In the final layer that you are modelling, are you using ReLU activations? The hurdle model I used was there to model a ReLU distribution, where everything is >= 0.

If your data follows a different distribution, then the hurdle model won't be necessary.

The hurdle model is there to model (1) will the neuron activate or not, and (2) if it does activate, what is the probability of that activation value?

The reason my formulation was somewhat complicated is that modelling a probability distribution for ReLU outputs is a bit tricky and required the hurdle model. But if your activations span a range of values (-inf, +inf), you could model a much simpler distribution, e.g. a normal one if they follow a bell curve.

Does that make sense?
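To make the two parts concrete, here is a minimal sketch of fitting a hurdle model to one neuron's ReLU activations. This is illustrative only, not the repo's actual code; the gamma family for the nonzero part is an assumption.

```python
import numpy as np
from scipy import stats

def fit_hurdle(acts):
    """Fit a simple two-part hurdle model to one neuron's ReLU activations.

    Part 1: Bernoulli parameter = empirical probability the neuron fires.
    Part 2: a continuous distribution (gamma here, as an assumption)
            fit to the nonzero activation values only.
    """
    acts = np.asarray(acts, dtype=float)
    p_fire = float((acts > 0).mean())            # Bernoulli part
    nonzero = acts[acts > 0]                     # data for continuous part
    gamma_params = stats.gamma.fit(nonzero, floc=0)
    return p_fire, gamma_params

acts = [0.0, 0.0, 1.2, 0.8, 2.5, 0.0, 1.9]
p_fire, gamma_params = fit_hurdle(acts)
print(p_fire)  # 4 of the 7 activations are nonzero
```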

jasminkareem commented 2 months ago

Thanks for responding so quickly and for the help! That makes sense, and thanks for clarifying. My network does use ReLU activations in the final layer, so using the hurdle model is necessary. The 1d tensor the hurdle model takes as input is also >= 0. The problem arises when this tensor is all 0s, which can occur for a number of reasons. Following your explanation, in the case where this 1d tensor is all 0s, meaning the neuron never activates, does this mean the corresponding probability should be 0?

Currently in the code, this case causes an error through the line `self.filtered_data = self.data[self.data != 0]`.
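For reference, the filtering itself does not raise; boolean indexing just returns an empty array, and the error then comes when a distribution is fit to it later. A minimal standalone reproduction of that step:

```python
import numpy as np

data = np.zeros(1024)        # a neuron that never fired on any example
filtered = data[data != 0]   # boolean indexing succeeds without error...
print(filtered.size)         # 0: ...but leaves nothing for the later
                             # distribution fit, which is what blows up
```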

EoinKenny commented 2 months ago

Hi, yes: if all the data for a specific neuron is 0 in your training data, then the probability of that neuron firing would indeed be 0, and the probability of it not firing would be 1 (since it's frequentist statistics).

So, I guess in practice, that feature should never change when generating a counterfactual or semi-factual.

Is that enough information for you to proceed with your work? I'm sorry the code isn't more robust for that case; I didn't encounter it in my own experiments, so I never coded up a solution.

I suppose a reasonable solution would be to assign a tiny probability of activation (< 1%) and assume a normal prior distribution for the feature. That way the pipeline should still work; all you'd have to do is augment your dataset a bit to add that.
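That fallback could be sketched like this. Everything here is hypothetical: the function name, the `eps` value, and the standard-normal prior are placeholders, not from the repo.

```python
import numpy as np

def hurdle_params_with_fallback(acts, eps=0.005):
    """Return (p_fire, (mean, std)) for one neuron's activations.

    If the neuron never fired in the training data, assign a tiny
    firing probability `eps` (< 1%) and an assumed standard-normal
    prior for the nonzero part, so downstream calculations still
    have parameters to work with.
    """
    acts = np.asarray(acts, dtype=float)
    nonzero = acts[acts > 0]
    if nonzero.size == 0:
        return eps, (0.0, 1.0)  # assumed prior: firing never observed
    return nonzero.size / acts.size, (nonzero.mean(), nonzero.std())
```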

Again, sorry it's not built for that situation, I could try to fix it now, but if you're in a rush you can probably implement a fix in your code much faster than I could fix the repo.

Or maybe you could just add something like this:

    self.filtered_data = self.data[self.data != 0]
    # boolean indexing returns an empty array rather than raising, so a
    # bare try/except would never trigger; check for emptiness instead
    if len(self.filtered_data) == 0:
        self.filtered_data = [...some random data...]

I think that because the Bernoulli distribution assigns 0 probability to the neuron activating, it may naturally fix any later issues with calculations like

    def __ppf_upper_sig_value(self):
        return self.rv.ppf(0.999, *self.params) * self.bern_param

So the added data won't even be used to calculate probabilities (I think).
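For example, with `bern_param = 0` the multiplication zeroes out whatever quantile the filler distribution produces. A standalone illustration using a scipy normal distribution in place of the repo's fitted `rv`:

```python
from scipy import stats

bern_param = 0.0  # neuron never fired in the training data
# the filler distribution's 99.9th percentile, whatever it is...
upper = stats.norm.ppf(0.999, loc=5.0, scale=2.0) * bern_param
# ...is neutralized by the zero Bernoulli weight
print(upper)  # 0.0
```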

jasminkareem commented 2 months ago

Thank you for your answer! This helps a lot and I think it should be straightforward to implement myself.