AbrahamSanders / seq2seq-chatbot

A sequence2sequence chatbot implementation with TensorFlow.

embeddings #27

Open matteogabella opened 4 years ago

matteogabella commented 4 years ago

Hi Abraham! I ran your algorithm on some TV series subtitles (in Italian), but the results are quite awkward, surely due to the small size of my dataset (60k lines). I was wondering if I could help the training process along with pretrained embeddings. I found some embedding files in Italian, but most of them are not in the form you require (TF checkpoints); they come with .m or .npy extensions, etc., and nothing seems to fit what your algorithm can process. Do you think it is possible, in a few lines (I don't want to bother you excessively), to explain to me how to create a brand new embedding checkpoint (from scratch, or by converting one already built), or to tell me where I can look in other GitHub projects? Thank you!! Matteo

AbrahamSanders commented 4 years ago

Hi @matteogabella,

The chatbot can also import embeddings from a flat file - see https://github.com/AbrahamSanders/seq2seq-chatbot/tree/master/seq2seq-chatbot/embeddings/dependency_based
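Since your Italian embeddings likely ship as a .npy matrix plus a separate vocabulary list, a short script can rewrite them in that flat layout. This is a minimal sketch, not part of the repo: the input file names are hypothetical, and the output layout (one token per line followed by its vector values) is an assumption based on the dependency_based file - compare against it before relying on this:

```python
import numpy as np

# Hypothetical input files - adjust to your actual embedding distribution.
vectors = np.load("embeddings_it.npy")  # assumed shape: (vocab_size, embedding_dim)
with open("vocab_it.txt", encoding="utf-8") as f:
    words = [line.strip() for line in f]

assert len(words) == vectors.shape[0], "vocab and embedding matrix must align"

# Write one token per line followed by its vector components,
# mirroring (assumed) the dependency_based flat-file layout.
with open("embeddings_it_flatfile.txt", "w", encoding="utf-8") as out:
    for word, vec in zip(words, vectors):
        out.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")
```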

Download the embedding file from the dependency_based folder and look at the format - if you can get your embeddings into that format, you can easily create a subclass of FlatFileVocabularyImporter to load your words and embeddings. To see an example of what the subclass would look like, see https://github.com/AbrahamSanders/seq2seq-chatbot/blob/master/seq2seq-chatbot/vocabulary_importers/dependency_based_vocabulary_importer.py.
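As a rough illustration of the subclass (the module path, constructor arguments, and their order below are assumptions, not the repo's actual API - copy the real ones from dependency_based_vocabulary_importer.py):

```python
# ASSUMED module path - check where FlatFileVocabularyImporter actually lives.
from vocabulary_importers.flat_file_vocabulary_importer import FlatFileVocabularyImporter

class ItalianVocabularyImporter(FlatFileVocabularyImporter):
    """Hypothetical importer for the converted Italian flat file."""

    def __init__(self):
        # ASSUMED arguments: an importer name, the flat embedding file
        # written by the conversion sketch above, and the delimiter
        # between the token and its vector values. Mirror whatever
        # dependency_based_vocabulary_importer.py actually passes.
        super().__init__("italian_embeddings",
                         "embeddings_it_flatfile.txt",
                         " ")
```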

Regards, Abraham