In the "Scene Element Embedding" section of your paper, you mention that attention-based pooling is used to convert point-level embeddings into polygon-level embeddings.
In your implementation in qcnet_map_encoder.py, it appears that the attention pooling is built on a graph neural network. How does this differ from a traditional multi-head attention pooling layer? What inspired this approach? I look forward to your response.
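To make the question concrete, here is a minimal NumPy sketch of the two pooling styles I am contrasting. All names, shapes, and weight matrices are hypothetical illustrations, not your actual implementation: the first variant pools with a single learnable seed query (the "traditional" attention-pooling layer), while the second treats the polygon as a graph node that attends over its points, with a relative-position feature on each point-to-polygon edge mixed into the keys and values.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def seed_query_pooling(points, q, Wk, Wv):
    # "Traditional" attention pooling: a single learnable query vector q
    # attends over all point embeddings and returns a weighted sum.
    # points: (N, d), q: (d,), Wk/Wv: (d, d)  -- hypothetical shapes
    k = points @ Wk                                 # (N, d) keys
    v = points @ Wv                                 # (N, d) values
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N,) attention weights
    return scores @ v                               # (d,) pooled embedding

def graph_attention_pooling(points, center, Wq, Wk, Wv, We, rel):
    # Graph-style pooling: the polygon-level node itself is the query,
    # and each point->polygon edge carries a relative-position feature
    # (rel) that is added into the keys and values before attention.
    # points: (N, d), center: (d,), rel: (N, d_e), We: (d_e, d)
    q = center @ Wq                                 # (d,) query from polygon node
    k = points @ Wk + rel @ We                      # (N, d) edge-aware keys
    v = points @ Wv + rel @ We                      # (N, d) edge-aware values
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N,) attention weights
    return scores @ v                               # (d,) pooled embedding
```

As I read it, the graph formulation lets the pooling condition on geometric relations between each point and the polygon, whereas the seed-query layer attends over the points alone. Please correct me if this reading of the code is wrong.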