Closed by CapPow 1 year ago
@daemon, Thanks for the great work! Looking at both hooked_attentions, am I correct in thinking that each layer overwrites itself at each timestep? It looks like each layer's key would be non-unique at each timestep.
Ah no that’s not an overwrite, it’s a sum. I should probably rename the update method.
https://github.com/castorini/daam/blob/cb5d2d2b19b77eb9654bd6ac3dfde92a25a02541/daam/heatmap.py#L153
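To illustrate the distinction, here is a minimal sketch of an `update` that sums into an existing key rather than replacing it. This is not the code in `heatmap.py`: `SumStore`, `layer_key`, and the plain-list heat maps are hypothetical stand-ins, assumed only for the example.

```python
class SumStore:
    """Hypothetical store whose update() accumulates instead of overwriting."""

    def __init__(self):
        self.maps = {}

    def update(self, layer_key, heat_map):
        previous = self.maps.get(layer_key)
        if previous is None:
            # First time this key is seen: store a copy of the map.
            self.maps[layer_key] = list(heat_map)
        else:
            # Key already present: add element-wise to the stored values.
            self.maps[layer_key] = [a + b for a, b in zip(previous, heat_map)]


store = SumStore()
for timestep in range(3):
    # The same (non-unique) key is reused at every timestep.
    store.update("layer_0", [1.0, 2.0])

total = store.maps["layer_0"]
print(total)
```

With this pattern a repeated key is harmless: each timestep's contribution is added to the running total for that layer, so nothing from earlier timesteps is lost.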
Right on, thanks so much!