[Q] Temporal Saliency Rescaling (TSR) - unintuitive results

Victordmz commented 1 year ago

I am using TSR with IG (Integrated Gradients) on a multivariate time series dataset that is padded, since instances are of unequal length. The model is XceptionTime from the tsai library. On a sample from the test set, the results with and without TSR differ drastically (I only changed the TSR parameter in the Python interface). What could explain this behaviour? Is it a result of the implementation, or is this just how TSR works? Surely it cannot be correct that almost all of the attribution goes to the padding.

IG with TSR:

IG without TSR:

My current version of TSInterpret is 0.3.2. The call is int_mod = TSR(xceptiontime_model, X.shape[-1], X.shape[-2], method='IG', mode='feat', device='cuda') followed by int_mod.explain(item, labels=label, TSR=True) or int_mod.explain(item, labels=label, TSR=False).

JHoelli commented 1 year ago

Hi,

Yes, it is indeed quite interesting that this is happening. The implementation follows Algorithm 1 of https://arxiv.org/pdf/2010.13924.pdf:

Given: input X, a baseline interpretation method R(.)
Output: TSR interpretation method R_TSR(.)
for t ← 0 to T do
   Mask all features at time t: X_masked[:,t] = 0, otherwise X_masked = X;
   Compute Time-Relevance Score ∆time_t
   ∆time_t = ∑ |R_i,t(X) − R_i,t(X_masked)|;
for t ← 0 to T do
   for i ← 0 to N do
      if ∆time_t > α then
         Mask feature i at time t: X_masked[i,:] = 0, otherwise X_masked = X;
         Compute Feature-Relevance Score ∆feature_i
         ∆feature_i = ∑ |R_i,t(X) − R_i,t(X_masked)|;
Compute (time, feature) importance score R_TSR[i,t] = ∆feature_i × ∆time_t;

In other words, the input instance x is masked iteratively (first along the time axis, then along the feature axis), and each masked instance is fed back into Integrated Gradients to measure the impact of the masking as sum(|IG(x) - IG(x_masked)|).
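The two-stage masking described above can be sketched in NumPy. This is only an illustration, not the TSInterpret code: the function name `tsr_sketch`, the stand-in attribution `toy_attribution` (gradient times input for a linear model with unit weights), and the choice to mask the single entry (i, t) in the feature stage are all assumptions. Scores for time steps at or below the threshold are left at zero here, which the pseudocode does not spell out.

```python
import numpy as np


def tsr_sketch(x, attribute, alpha=0.0, mask_value=0.0):
    """Rescale attributions of x with shape (n_features, n_timesteps).

    `attribute` is any callable returning an array shaped like its input.
    """
    n_features, n_timesteps = x.shape
    base = attribute(x)

    # Stage 1: time-relevance score, one value per time step.
    delta_time = np.zeros(n_timesteps)
    for t in range(n_timesteps):
        x_masked = x.copy()
        x_masked[:, t] = mask_value
        delta_time[t] = np.abs(base - attribute(x_masked)).sum()

    # Stage 2: feature-relevance score, only for time steps above alpha.
    scores = np.zeros_like(x, dtype=float)
    for t in range(n_timesteps):
        if delta_time[t] <= alpha:
            continue  # assumption: below-threshold steps keep a score of 0
        for i in range(n_features):
            x_masked = x.copy()
            x_masked[i, t] = mask_value  # assumption: mask one (i, t) entry
            delta_feature = np.abs(base - attribute(x_masked)).sum()
            scores[i, t] = delta_feature * delta_time[t]
    return scores


def toy_attribution(x):
    # Hypothetical deterministic stand-in for R(.): gradient * input
    # of a linear model whose weights are all 1.
    return np.ones_like(x) * x


# Two features, three time steps; the last time step is zero padding.
x = np.array([[1.0, 2.0, 0.0],
              [3.0, -4.0, 0.0]])
scores = tsr_sketch(x, toy_attribution)
```

With a deterministic attribution function and a mask value equal to the padding value, the padded column ends up with a score of zero in this sketch.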

One thing I could imagine becoming an issue is masking the data with zeros. Depending on your data, a value of zero does not necessarily mean "no information".

Do you happen to know how the data was generated, or which time series pattern carries no information?

I assume you are not able to share the data or code for debugging? That would be quite interesting.

I could also expose the masking parameter so that you can provide custom masking values, if that helps.

What type of padding did you use? Zeros?

Victordmz commented 1 year ago

Thank you for your useful feedback!

I use zero-padding for my time series data. So when $t$ is a padded time step, masking it with zeros leaves the input unchanged, which means $\Delta_t^{time}=0$ and the TSR value for that element should be 0, right? It doesn't make sense to me that the padding gets a non-zero attribution from TSR when it should be zero. I think changing the masking parameter would only disguise this problem, no?
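The expectation can be checked on a toy example. It holds under two assumptions: the mask value equals the padding value, and the attribution method returns the same result for the same input. The array `x`, the helper `mask_time_step`, and the stand-in `attribute` below are hypothetical and only serve as an illustration.

```python
import numpy as np

# Two features, four time steps; the last two steps are zero padding.
x = np.array([[0.5, -1.2, 0.0, 0.0],
              [1.0, 0.3, 0.0, 0.0]])


def mask_time_step(x, t, value=0.0):
    """Return a copy of x with every feature at time step t set to `value`."""
    x_masked = x.copy()
    x_masked[:, t] = value
    return x_masked


def attribute(x):
    # Hypothetical deterministic stand-in for the attribution method R(.).
    return np.tanh(x) * x


# Masking a zero-padded step with zeros is a no-op on the input.
padding_unchanged = all(
    np.array_equal(mask_time_step(x, t), x) for t in (2, 3)
)

# Time-relevance score per time step: sum of absolute attribution changes.
delta_time = np.array([
    np.abs(attribute(x) - attribute(mask_time_step(x, t))).sum()
    for t in range(x.shape[1])
])
```

Here `delta_time` is exactly zero on the two padded steps and positive on the real ones. If either assumption is violated, the padded steps can receive a non-zero score.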

Note that my time series data also contains zeros outside the padding, but that shouldn't change the argument. The MTS was constructed from text data using multiple feature extractors, but is standardized.

I'm uploading the data and the model right now and creating a notebook for convenience. Edit: here is the notebook, the links to the data are in it: https://colab.research.google.com/drive/1RnZ0UYX9ZrWDnDqaQ2_wxpgvL8rr92Rz?usp=sharing

JHoelli commented 1 year ago

Hi @Victordmz,

For your case, there were indeed two incorrect assumptions in the TSInterpret code:

The baseline for Integrated Gradients is set to random by default. I guess zero would be a better choice in your case.
The masking was done with the values of the first time step and not with zero (my bad, I did not remember the default value correctly).
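A toy calculation shows how either default can give padded time steps a non-zero time-relevance score. It is a sketch under stated assumptions, not the TSInterpret code: the model is a hypothetical linear function f(x) = sum(w * x), for which Integrated Gradients has the closed form (x - baseline) * w, and a fresh random baseline is assumed to be drawn for every attribution call. All names below are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear model f(x) = sum(w * x): 2 features, 4 time steps.
w = np.array([[1.0, -2.0, 0.5, 1.5],
              [0.5, 1.0, -1.0, 2.0]])
# The last two time steps are zero padding.
x = np.array([[0.5, -1.2, 0.0, 0.0],
              [1.0, 0.3, 0.0, 0.0]])


def ig_linear(x, baseline):
    # Closed-form Integrated Gradients for the linear model above.
    return (x - baseline) * w


def delta_time(x, t, mask_column, baseline_fn):
    """Time-relevance score for step t, given a mask and a baseline source."""
    x_masked = x.copy()
    x_masked[:, t] = mask_column
    diff = ig_linear(x, baseline_fn()) - ig_linear(x_masked, baseline_fn())
    return np.abs(diff).sum()


def zero_baseline():
    return np.zeros_like(x)


def random_baseline():
    # Assumption: a new random baseline per attribution call.
    return rng.uniform(-1.0, 1.0, size=x.shape)


t_pad = 3  # a padded time step
score_zero_mask = delta_time(x, t_pad, 0.0, zero_baseline)
score_first_step_mask = delta_time(x, t_pad, x[:, 0], zero_baseline)
score_random_baseline = delta_time(x, t_pad, 0.0, random_baseline)
```

With a zero baseline and a zero mask the padded step scores exactly zero. Masking with the first time step changes the padded column, and a random baseline makes the two attribution calls disagree even on identical inputs, so both variants produce a positive score on padding.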

As a solution, I exposed some of the parameters so that custom values can be set. In your case: exp = int_mod.explain(item, labels=label, TSR=True, baseline_Single=np.zeros(item.shape), assignment=0)

You can install the current development code with: pip install https://github.com/fzi-forschungszentrum-informatik/TSInterpret/archive/refs/heads/TSR.zip

It worked for me locally, but I was not able to reproduce it in the Colab notebook (I am also not that familiar with Colab). Please let me know if you run into issues. If everything works, feel free to close the issue.

My plan is to merge the code into master tomorrow and to publish a new PyPI release, 0.3.3, at the end of next week.

Victordmz commented 1 year ago

Hello @JHoelli,

Thank you for your fast feedback! I had indeed not checked the IG baseline, thank you for pointing that out. The masking correction also affects the output, and the results make much more sense now.

This is the result for IG with TSR now:

I will close this issue. Thank you very much again!

fzi-forschungszentrum-informatik / TSInterpret

[Q] Temporal Saliency Rescaling (TSR) - unintuitive results #33