ML4ITS / mtad-gat-pytorch

PyTorch implementation of MTAD-GAT (Multivariate Time-Series Anomaly Detection via Graph Attention Networks) by Zhao et al. (2020, https://arxiv.org/abs/2009.02040).

When computing 'e' in the FeatureAttentionLayer, the outputs of MTAD_GAT ('predictions', 'recons') are all NaN, and training is not possible due to the presence of NaNs. #13

Closed: ylic204 closed this issue 2 years ago

ylic204 commented 2 years ago

Dear authors,

Thank you for uploading this code. I am a beginner in multivariate time-series anomaly detection, and it has been very helpful in my research. I have read and understood your code, but the output is always NaN during training, and I am sure the input data is normal.

Therefore, I printed the result of each step of forward() in mtad_gat.py and found that the problem appears after the feature_gat() layer.

So I stepped into feature_gat(): after e = torch.matmul(a_input, self.a).squeeze(3), some NaNs appear, as shown in the figure. Then, after the softmax, there are even more NaNs; usually an entire column is NaN.
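That spreading pattern is exactly what softmax does with a NaN input: the NaN enters the normalizing sum, so every entry it is normalized with also becomes NaN. A minimal sketch of the effect (illustrative only, not code from this repo; whether a full row or column turns NaN depends on the dim argument):

import torch

e = torch.zeros(3, 3)
e[1, 2] = float("nan")  # one bad value, e.g. from an uninitialized bias

# exp(nan) = nan, and the nan also poisons the normalizing sum,
# so the entire slice along the softmax dimension becomes NaN.
attn = torch.softmax(e, dim=1)
print(attn)  # row 1 is now entirely NaN; rows 0 and 2 are unaffected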

How can I solve this problem? I also tried adjusting batch_size and lookback, but nothing worked.

[screenshot: NaN values appearing in 'e']

Environment:

ghost commented 2 years ago

Hi, I believe this is due to the bias not being initialized in either the feature or the temporal attention layer. torch.empty() returns uninitialized memory, so the tensor can contain garbage values, including NaNs.
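For reference, a quick snippet (not from this repo) showing the difference:

import torch

raw = torch.empty(2, 2)  # uninitialized memory: contents are undefined
print(raw)  # may happen to be zeros, garbage, or even NaN/inf

zeroed = torch.zeros(2, 2)  # allocated and explicitly zero-filled
print(zeroed)  # always zeros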

Try adding zero initialization for the bias in both modules; this is at least how it is done in PyTorch Geometric (see its reset_parameters() method).

if self.use_bias:
    self.bias = nn.Parameter(torch.empty(window_size, window_size))
    nn.init.zeros_(self.bias.data)  # overwrite the uninitialized values with zeros

# or you can do this, which I believe will yield the same result

if self.use_bias:
    # torch.zeros allocates and zero-fills in one step
    self.bias = nn.Parameter(torch.zeros(window_size, window_size))
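A minimal sketch of where this fix would sit, assuming a constructor roughly shaped like the repo's attention layers (the skeleton and its signature below are illustrative, not the exact code):

import torch
import torch.nn as nn

class FeatureAttentionLayer(nn.Module):
    # Illustrative skeleton; only the bias handling reflects the fix above.
    def __init__(self, window_size, use_bias=True):
        super().__init__()
        self.use_bias = use_bias
        if self.use_bias:
            # Zero-initialized, so no undefined values can leak into
            # e = torch.matmul(a_input, self.a).squeeze(3) + self.bias
            self.bias = nn.Parameter(torch.zeros(window_size, window_size))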

Hope this works!

ylic204 commented 2 years ago

It works for me now! Thank you!