Thank you for sharing the code.
I'm confused about something, and I'd appreciate it if you could tell me whether my understanding is correct.
Are you using the outputs of all attention heads for the analysis?
The paper you mentioned, 'Roles and Utilization of Attention Heads in Transformer-based Neural Language Models', appears to use only selected heads, but your code seems to use all of them. Is this correct?
After extracting all the features, are they concatenated and used as the input to a single linear binary classifier?
If they are concatenated, I would guess the resulting dimensionality is quite large.
Yes, they are concatenated. But that's fine because we use regularization in our logistic regression, so it works even when the number of features exceeds the number of examples in the training set.
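A minimal sketch of the setup described above, using scikit-learn. All shapes here (number of layers, heads, feature size, and train-set size) are hypothetical placeholders, not the actual values from the code under discussion; the point is that an L2-regularized logistic regression fits without trouble even when the concatenated feature dimension far exceeds the number of training examples.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical dimensions: 12 layers x 12 heads, 64 features per head.
n_examples = 200
n_layers, n_heads, d = 12, 12, 64
n_features = n_layers * n_heads * d  # 9216, far more than n_examples

# Per-head features concatenated into one vector per example.
features = rng.standard_normal((n_examples, n_features))
labels = rng.integers(0, 2, size=n_examples)

# L2 penalty (the default) keeps the problem well-posed despite
# n_features >> n_examples; C is the inverse regularization strength.
clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
clf.fit(features, labels)
print(clf.coef_.shape)  # one weight per concatenated feature
```

Stronger regularization (smaller `C`) is the usual lever if the classifier starts to overfit in this regime.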