DAMO-DI-ML / KDD2023-DCdetector


Can't seem to find learnable parameter matrix for attention layers #3

Closed SeungHunHan11 closed 12 months ago

SeungHunHan11 commented 1 year ago

Hi, I'm impressed by your work and am now trying to get a hold of the implementation code.

It is mentioned in the paper that the softmax attention score is multiplied by a learnable parameter matrix in Equation (3):

[equation (3) image]

However, reviewing the related model/attn.py file below, I couldn't find any implementation of such an operation.

https://github.com/DAMO-DI-ML/KDD2023-DCdetector/blob/bb2c19f137261f1e32a3429cd91a58abde6c0170/model/attn.py#L22-L48

As far as I understand the current code, the softmax attention score is used directly in the subsequent upsampling step, without any multiplication by a learnable parameter matrix.

I would be grateful for more details.

Thank you for the great work!

yyysjz1997 commented 1 year ago

Many thanks for your issue and your attention to our work. The attention-related code is in model/attn.py, class AttentionLayer (lines 52-96). The learnable parameters of the attention layer are defined in its `__init__` method.

As for your other question: our representations are the attention scores after the up-sampling operation. Hope you find this helpful.
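(For readers: a minimal sketch of the pattern described above, where the learnable projections of an attention layer are declared in `__init__` and the softmax attention map itself is returned as the representation. The class and names here are illustrative, not the repository's exact code.)

```python
import torch
import torch.nn as nn

class AttentionLayer(nn.Module):
    """Sketch: the learnable parameter matrices live in __init__
    as query/key projections applied before the softmax score."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        # These projections are the learnable parameter matrices of the layer.
        self.query_projection = nn.Linear(d_model, d_model)
        self.key_projection = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, L, _ = x.shape
        H = self.n_heads
        q = self.query_projection(x).view(B, L, H, -1)
        k = self.key_projection(x).view(B, L, H, -1)
        scale = q.shape[-1] ** -0.5
        scores = torch.einsum("blhd,bshd->bhls", q, k) * scale
        # The softmax attention map is the output representation.
        return torch.softmax(scores, dim=-1)

attn = AttentionLayer(d_model=8, n_heads=2)
out = attn(torch.randn(1, 4, 8))
print(out.shape)  # torch.Size([1, 2, 4, 4])
```

So even though no multiplication by W appears after the softmax, learnable matrices do act on the inputs before the scores are computed.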

SeungHunHan11 commented 1 year ago


Thank you for the quick reply!

Yes, I understand that the representation used for the similarity calculation is the attention score after the up-sampling operation.

Nonetheless, my question was about the learnable parameter matrix W in the patch-wise and in-patch attention modules. In the current implementation, the softmax attention score is fed directly into the up-sampling operation, which is not consistent with the equation given in the paper.

In addition, as pointed out in "Towards a Rigorous Evaluation of Time-series Anomaly Detection" (Kim et al., 2022), point-adjusted metrics tend to overestimate model capability. Do you have any plans to add point-wise metric scores in the future?
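(For readers: "point adjustment" is the protocol where, if any single point inside a ground-truth anomaly segment is flagged, the entire segment counts as detected, which is why it inflates scores relative to point-wise evaluation. A minimal sketch of the protocol, my own illustration rather than code from this repository:)

```python
import numpy as np

def point_adjust(pred: np.ndarray, label: np.ndarray) -> np.ndarray:
    """If any point in a true anomaly segment is predicted anomalous,
    mark the whole segment as detected (the point-adjustment protocol)."""
    pred = pred.copy()
    in_segment, start = False, 0
    for i, l in enumerate(label):
        if l == 1 and not in_segment:
            in_segment, start = True, i
        if in_segment and (l == 0 or i == len(label) - 1):
            end = i if l == 0 else i + 1
            if pred[start:end].any():      # one hit anywhere in the segment...
                pred[start:end] = 1        # ...credits every point of it
            in_segment = False
    return pred

label = np.array([0, 1, 1, 1, 0, 1, 1, 0])
pred  = np.array([0, 0, 1, 0, 0, 0, 0, 0])  # a single hit in the first segment
print(point_adjust(pred, label))  # [0 1 1 1 0 0 0 0]
```

Here the point-wise recall is 1/5, but after adjustment it becomes 3/5, illustrating the overestimation concern.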

Again, thank you for all the work!

yyysjz1997 commented 12 months ago


Thanks for your questions. As for the learnable parameter matrix W, the implementation is equivalent to the equation, although the operations appear in a different order. The up-sampling operation does not affect the calculation in Equation (3); it only repeats the features produced from the different views and resizes them to the same size so they can be compared. Also, the different attention heads are independent of each other.
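(For readers: a toy demonstration of the order-independence claim, assuming the up-sampling is a repeat-style operation. Element-wise weighting commutes with repetition, so applying a weight before or after upsampling yields identical repeated values; the tensors and weight below are illustrative.)

```python
import torch

torch.manual_seed(0)
scores = torch.softmax(torch.randn(2, 3, 3), dim=-1)  # per-head attention maps
w = 0.5                                               # stand-in element-wise weight

# Weight first, then upsample by repetition along the last dimension...
a = (scores * w).repeat_interleave(2, dim=-1)
# ...or upsample first, then weight: the results are identical.
b = scores.repeat_interleave(2, dim=-1) * w
print(torch.equal(a, b))  # True
```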

As for the metric, I will try adding point-wise metric scores and see what happens. By the way, we have also included novel evaluation metrics such as VUS and the affiliation metrics in this paper; their implementations (along with other common evaluation metrics) can be found there. Thanks, hope you find these helpful~

SeungHunHan11 commented 12 months ago


Appreciate your reply!