Closed stephantul closed 4 years ago
Hi,
Thanks for this suggestion! I've added two arguments to BPEmb.__init__:
model_file: ``Path``, optional (default = None)
    Path to a custom SentencePiece model file.
emb_file: ``Path``, optional (default = None)
    Path to a custom embedding file. Supported formats are Word2Vec
    plain text and GenSim binary.
Can you check out the latest commit and let me know if this feature works for you? If yes, I'll update the PyPI package as well.
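For illustration, here is a minimal sketch of how the two new arguments might be used. The helper name load_custom_bpemb and the file names in the usage comment are hypothetical. The sketch assumes that passing only model_file and emb_file is enough; depending on the version, other arguments (for example the embedding dimension) may also need to match the custom files, so extra keyword arguments are forwarded unchanged.

```python
from pathlib import Path


def load_custom_bpemb(model_file, emb_file, **kwargs):
    """Load BPEmb from a custom SentencePiece model and embedding file.

    Hypothetical convenience wrapper, not part of bpemb itself.
    """
    model_file, emb_file = Path(model_file), Path(emb_file)

    # Fail early with a clear message if either path is missing.
    for path in (model_file, emb_file):
        if not path.is_file():
            raise FileNotFoundError(f"No such file: {path}")

    # Imported lazily so the path check above works without bpemb installed.
    from bpemb import BPEmb

    # model_file and emb_file are the two arguments described above;
    # any other keyword arguments are passed through as-is.
    return BPEmb(model_file=model_file, emb_file=emb_file, **kwargs)


# Usage (hypothetical file names):
# bpemb = load_custom_bpemb("my_model.model", "my_embeddings.w2v.txt")
```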
Yep, seems to work! I ran some tests and everything gives the correct results. Thanks for the swift reply!
Can you add a .spm vocabulary to enlarge BPEmb's multi-language model? For instance, can you add a Lebanese vocabulary in addition to the already available MSA, Egyptian and Aramaic?
Hi,
First of all, thanks for the great package. Currently, the only way to use my own models with bpemb is to first load another model, and then assign the .spm and .emb attributes manually. This is a bit unwieldy.
I am interested in adding a subclass of BPEmb that overrides the __init__ of BPEmb and simply accepts paths to an spm and emb model/file, from which the other attributes (e.g. size/vs) are derived. Is this something you would accept as a PR? Do you see any problems with this approach?
Thanks! Stéphan