The embedding table is context-free wordpiece embeddings. These are not particularly useful. They will just be worse versions of what you would get from GloVe/word2vec/FastText etc.
`extract_features.py` gives you contextual representations, which are "embeddings" of each token in the context of the sentence. This is what you would want to build a model on. For this, you need to run your full training and test data through `extract_features.py` and use the resulting vectors as input, just like you would use an embedding (to handle the 4x, you can just concatenate the 4 vectors for each word).
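For concreteness, here is a minimal sketch of that step, assuming you have already run `extract_features.py` with the default `--layers=-1,-2,-3,-4` and written its JSON-lines output to a file (the name `features.jsonl` is just an example):

```python
import json

import numpy as np


def load_concatenated_features(jsonl_path):
    """Read the JSON-lines output of extract_features.py and, for each
    sentence, concatenate the per-layer vectors of every token into a
    single vector (4 * hidden_size when 4 layers were requested)."""
    sentences = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            tokens, vectors = [], []
            for feature in record["features"]:
                tokens.append(feature["token"])
                # Each entry in "layers" holds one layer's hidden state for
                # this token; concatenating them gives the combined vector.
                vectors.append(np.concatenate(
                    [np.asarray(layer["values"]) for layer in feature["layers"]]
                ))
            sentences.append((tokens, np.stack(vectors)))
    return sentences


# Example: with BERT-Base (hidden size 768) and 4 layers, each token vector
# has 4 * 768 = 3072 dimensions.
sentences = load_concatenated_features("features.jsonl")
tokens, matrix = sentences[0]
print(len(tokens), matrix.shape)  # e.g. 12 (12, 3072)
```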
> you need to run your full training and test data through `extract_features.py` and use the resulting vectors as input, just like you would use an embedding (to handle the 4x, you can just concatenate the 4 vectors for each word).
Oh I see. I thought `extract_features.py` was a script to extract the embeddings so that we could then use them wherever we want. But from what you said, what `extract_features.py` produces IS the embedding layer. It makes sense: having an embedding for each word independently would mean no context.
Thank you very much for your kind and clear explanations.
I thought the feature vectors extracted from BERT represent word embeddings. So I thought that, in order to use these embeddings, one just has to extract them (using `extract_features.py`), then load the weights into an Embedding layer (yes, I'm a Keras person), and then build whatever we want on top of this Embedding layer. But that is wrong, isn't it? Using `extract_features.py`, I got the weights of the last 4 layers for each word in each sentence fed as input! So instead of having 4 X weights (X being the size of a layer) as I expected, I have 4 X * tokens_used_in_input_file weights!
How do I use the feature vectors to build a task-specific model architecture on top of BERT?
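For what it's worth, here is one minimal Keras sketch of that second step, with the precomputed contextual vectors fed directly as the model input instead of an Embedding layer. The dimensions (BERT-Base hidden size 768, 4 concatenated layers, max length 128), the BiLSTM encoder, and the binary-classification head are all illustrative assumptions, not something prescribed in the thread:

```python
from tensorflow import keras

MAX_SEQ_LEN = 128          # assumed padding length for the token axis
FEATURE_DIM = 4 * 768      # 4 concatenated layers of BERT-Base hidden states

# The input is already a sequence of contextual vectors, so no Embedding
# layer is needed: shape is (batch, MAX_SEQ_LEN, FEATURE_DIM).
inputs = keras.Input(shape=(MAX_SEQ_LEN, FEATURE_DIM), name="bert_features")
x = keras.layers.Masking(mask_value=0.0)(inputs)            # skip padded rows
x = keras.layers.Bidirectional(keras.layers.LSTM(128))(x)   # task-specific encoder
outputs = keras.layers.Dense(1, activation="sigmoid")(x)    # e.g. binary classification

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

# X_train would be the per-sentence matrices produced by extract_features.py,
# padded/truncated to MAX_SEQ_LEN; y_train are the task labels.
# model.fit(X_train, y_train, batch_size=32, epochs=3)
```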