MIND-Lab / OCTIS

OCTIS: Comparing Topic Models is Simple! A python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
MIT License

Feature/etm keyedvectors embeddings input file #36

Closed: lfmatosm closed this 3 years ago

lfmatosm commented 3 years ago

Resolves #32, #37.

Adds support for embeddings files in the original word2vec format and in the gensim.models.KeyedVectors format for ETM model training. Also updates some details in the CONTRIBUTING section of the documentation.

lfmatosm commented 3 years ago

Thank you for your work! :) I just added a few comments to address before merging

Thanks for the feedback, @silviatti! I've added docstrings for the changed methods, at your request.

lfmatosm commented 3 years ago

Hi @silviatti. I updated the PR with a commit that adds support for headerless embeddings files in the original word2vec text format. This makes it possible to use Dieng's pretrained embeddings found here, as requested in cayaluke's comment on #32. Previously, this PR supported only regular files that include the header.
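
To illustrate the difference between the two text layouts, here is a minimal standard-library sketch, not the code from this PR. A regular word2vec text file starts with a `<vocab_size> <dimensions>` header line, while a headerless file starts directly with the first word and its vector. The function name `read_word2vec_text` and the header heuristic (a first line made of exactly two integer tokens) are assumptions made for this example.

```python
import io


def read_word2vec_text(lines):
    """Parse word2vec text-format lines into a {word: [float, ...]} dict.

    Hypothetical helper for illustration only. The first non-empty line is
    treated as a header and skipped if it consists of exactly two integer
    tokens ("<vocab_size> <dimensions>"); otherwise it is parsed as a
    regular "<word> <v1> <v2> ..." row.
    """
    vectors = {}
    first = True
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        if first:
            first = False
            # Assumed heuristic: a header looks like "400000 300".
            if len(parts) == 2 and all(p.isdigit() for p in parts):
                continue
        word, values = parts[0], parts[1:]
        vectors[word] = [float(v) for v in values]
    return vectors


# The same two vectors, with and without the header line.
with_header = io.StringIO("2 3\ncat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")
without_header = io.StringIO("cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")

parsed_with = read_word2vec_text(with_header)
parsed_without = read_word2vec_text(without_header)
```

Both inputs produce the same dictionary here. The heuristic would misread a headerless file whose first row is a numeric token followed by a single integer value, so a real loader would more safely take an explicit flag from the caller.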