pidahbus opened this issue 4 years ago
Hi @pidahbus,
this is not a complete solution to your issue, but you can have a look at this script from @LysandreJik, which retrieves embeddings from the original TF model:
https://gist.github.com/LysandreJik/db4c948f6b4483960de5cbac598ad4ed
You just need to adjust the input data with real ids from the BERT vocab, including [CLS] and [SEP] as special tokens around the sequence, and then pass this id sequence into the model.
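A minimal sketch of that wrapping step, assuming a BERT-style WordPiece vocab (the tiny dict below is a hypothetical stand-in with the familiar bert-base-uncased ids; a real run would load the checkpoint's full vocab.txt):

```python
# Hypothetical mini-vocab for illustration; a real model would load
# the full vocab.txt shipped with the checkpoint.
vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102, "hello": 7592, "world": 2088}

def encode(tokens):
    # Wrap the WordPiece tokens with the special tokens the model
    # expects at the start and end of the sequence.
    return [vocab["[CLS]"]] + [vocab[t] for t in tokens] + [vocab["[SEP]"]]

input_ids = encode(["hello", "world"])
print(input_ids)  # → [101, 7592, 2088, 102]
```

The resulting id sequence is what would then be fed to the model (batched, e.g. as a tensor of shape `(1, seq_len)`).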
(The aim of the script is to test the difference between the original TF model from the official ELECTRA implementation and the upcoming PyTorch version in Hugging Face's Transformers library.)
Hi @stefan-it, I went through the code and made the necessary changes in the .py files to extract the generator/discriminator embeddings. Would you like me to send you a pull request for this?
Hi @pidahbus, could you share your changes for extracting the generator/discriminator embeddings on your GitHub? Thanks!
Hi @pidahbus! Could you please send me the pull request regarding extracting the generator/discriminator embeddings? Thanks!
Hi, following the commands, I pre-trained electra-small on my dataset. After pre-training, I want the learned embeddings, which I need for some more complex downstream tasks. Could you please help me with how to extract the word embeddings after pre-training?
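Once the embedding table has been pulled out of the checkpoint (e.g. via the gist above), using it downstream is just a row lookup by token id. A minimal sketch, where the random matrix is a stand-in for the real extracted weights (the shape `(30522, 128)` assumes the standard BERT vocab size and electra-small's embedding size, which may differ for a custom pre-training setup):

```python
import numpy as np

# Stand-in for the embedding table extracted from the checkpoint;
# real values would come from the pre-trained weights, not randomness.
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((30522, 128)).astype(np.float32)

def embed(ids):
    # Look up one embedding vector per token id.
    return embedding_table[np.asarray(ids)]

vecs = embed([101, 7592, 2088, 102])  # [CLS] hello world [SEP]
print(vecs.shape)  # (4, 128)
```

These per-token vectors (or the first hidden layer outputs, if contextual embeddings are wanted) can then be fed into whatever downstream model you have.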