I have a question about the Sentence_Embedding model forward implementation.
Why is torch.max applied after the first fully connected layer? Is this better than pooling before the FC layers, e.g. averaging all the word embeddings of a sentence first?
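To make the comparison concrete, here is a minimal sketch of the two orderings I mean (the layer sizes, shapes, and class names are assumptions for illustration, not the actual Sentence_Embedding code):

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the two pooling orders being compared.
# Assumed input shape: (batch, seq_len, embed_dim).

class MaxAfterFC(nn.Module):
    """Variant asked about: project each word, then max-pool over the sequence."""
    def __init__(self, embed_dim=300, hidden_dim=512, out_dim=256):
        super().__init__()
        self.fc1 = nn.Linear(embed_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, word_embeddings):
        h = torch.relu(self.fc1(word_embeddings))      # (batch, seq_len, hidden_dim)
        pooled, _ = torch.max(h, dim=1)                # max over words -> (batch, hidden_dim)
        return self.fc2(pooled)                        # (batch, out_dim)

class MeanBeforeFC(nn.Module):
    """Alternative: average the word embeddings first, then apply the FC layers."""
    def __init__(self, embed_dim=300, hidden_dim=512, out_dim=256):
        super().__init__()
        self.fc1 = nn.Linear(embed_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, word_embeddings):
        pooled = word_embeddings.mean(dim=1)           # average over words -> (batch, embed_dim)
        return self.fc2(torch.relu(self.fc1(pooled)))  # (batch, out_dim)
```

In the first variant the FC layer sees every word individually before pooling, so max-pooling can pick out the strongest per-dimension feature across the sentence; in the second, averaging happens before any learned transformation. Is that the reason for the ordering, or is it mainly an empirical choice?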
Thanks for the clarification