Pay20Y / SEED


How to directly use word embedding from the pre-trained LM during training and inference? #7

Open Fishersponge opened 4 years ago

Fishersponge commented 4 years ago

If I have a 'train.mdb', how can I use the pre-trained fastText model cc.en.300.bin? I see nothing about fastText in your ./models, trainers.py, or main.py. Looking forward to your answer, thanks~

Pay20Y commented 4 years ago

Hi, please refer to create_all_synth_lmdb.py and modify the dataloader accordingly.
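A minimal sketch of the first option, generating LMDB entries that carry an embedding label. This is an assumption about the workflow, not the repo's actual code: in practice you would call `fasttext.load_model("cc.en.300.bin")` (from the `fasttext` Python package) and write with `lmdb`'s `txn.put()`; a tiny deterministic stand-in model and a plain dict are used here so the sketch is self-contained, and the key layout is hypothetical:

```python
import numpy as np

class FakeFastText:
    """Stand-in for the ~7 GB cc.en.300.bin model (real API:
    fasttext.load_model(...).get_word_vector(word))."""
    def get_word_vector(self, word):
        # deterministic pseudo-embedding so the sketch runs anywhere
        seed = sum(ord(c) for c in word) % (2 ** 32)
        rng = np.random.default_rng(seed)
        return rng.standard_normal(300).astype(np.float32)

def embedding_bytes(model, label):
    """Serialize the label's 300-d embedding as it would be stored in LMDB."""
    vec = model.get_word_vector(label.lower())
    return np.asarray(vec, dtype=np.float32).tobytes()

def build_cache(model, labels):
    """Dict standing in for an LMDB environment: key -> embedding bytes.
    With real lmdb this loop would be txn.put(key, value) calls."""
    cache = {}
    for i, label in enumerate(labels, start=1):
        cache[b"embed-%09d" % i] = embedding_bytes(model, label)
    return cache

model = FakeFastText()
cache = build_cache(model, ["hello", "world"])
vec = np.frombuffer(cache[b"embed-%09d" % 1], dtype=np.float32)
```

At read time the dataloader would `np.frombuffer` the stored bytes back into a float32 vector, as the last line shows.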

Ma01180724 commented 4 years ago

Hello, I have the same question. We have the NIPS2014 and CVPR2016 datasets (LMDB); how can we use the pre-trained fastText model? Can you help? Thanks.

Ma01180724 commented 4 years ago

@Pay20Y, could you share the datasets you have already prepared?

Pay20Y commented 4 years ago

@Ma01180724 Hi, I'm really sorry, but I can't share the training datasets directly because of their large size. There are two ways to create them yourself. First, you can modify create_all_synth_lmdb.py to load the labels from MJ and ST and generate new LMDB datasets with embedding labels. Second, as mentioned before, you can modify the dataloader to generate the corresponding word embedding from the recognition label during training.
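The second option can be sketched as a dataset that leaves the LMDB untouched and computes the embedding on the fly in `__getitem__`. The class and parameter names below are assumptions for illustration, not SEED's actual dataloader API; in practice `embed_fn` would wrap `fasttext.load_model("cc.en.300.bin").get_word_vector`, and a deterministic dummy is used here so the sketch runs standalone:

```python
import numpy as np

class EmbeddingDataset:
    """Hypothetical dataset: pairs each sample's recognition label with a
    word embedding computed at load time (no embedding stored in LMDB)."""
    def __init__(self, samples, embed_fn):
        self.samples = samples    # list of (image, text_label) pairs
        self.embed_fn = embed_fn  # word -> 300-d float32 vector

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, label = self.samples[idx]
        embedding = self.embed_fn(label.lower())  # fastText lookup in practice
        return image, label, embedding

def fake_embed(word):
    # stand-in for model.get_word_vector(word); deterministic per word
    seed = sum(ord(c) for c in word) % (2 ** 32)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(300).astype(np.float32)

ds = EmbeddingDataset([(np.zeros((32, 100)), "seed")], fake_embed)
img, label, emb = ds[0]
```

The trade-off versus the first option: no LMDB regeneration is needed, at the cost of keeping the fastText model in memory and doing a lookup per sample during training.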