trideeprath closed this issue 3 years ago
Thanks for this suggestion! I added the following argument a while ago but for some reason didn't reply here:
segmentation_only: ``bool'', optional (default = False)
    If set to True, only the SentencePiece subword segmentation
    model will be loaded. Use this flag if you do not need the
    subword embeddings.
So you can load only the BPE model like this:
bpemb_en = BPEmb(lang="en", dim=50, segmentation_only=True)
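For anyone landing here later, a minimal sketch of how this flag might be used for tokenization only. The helper names `load_segmenter` and `tokenize` are hypothetical and not part of bpemb, and `vs=10000` is just an illustrative vocabulary size. `encode` and `encode_ids` are the BPEmb methods that return subword strings and subword ids. The import sits inside the function so that nothing is downloaded until the helper is actually called.

```python
def load_segmenter(lang="en", vs=10000):
    """Load only the SentencePiece segmentation model, without the vectors.

    Hypothetical convenience wrapper; requires `pip install bpemb`.
    """
    from bpemb import BPEmb  # lazy import: the model download happens on first call
    return BPEmb(lang=lang, vs=vs, segmentation_only=True)


def tokenize(segmenter, texts):
    """Return a (subword pieces, subword ids) pair for each input text."""
    return [(segmenter.encode(t), segmenter.encode_ids(t)) for t in texts]


# Example usage (downloads the segmentation model on first run):
#   seg = load_segmenter("en")
#   pieces_and_ids = tokenize(seg, ["This is a test sentence."])
```

Since `tokenize` only relies on `encode` and `encode_ids`, it should work with any object exposing those two methods, which keeps inference code independent of whether the embeddings were ever loaded.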
During initialization, two models are downloaded: the BPE model (bpe_model) and the w2v model.
bpemb_en = BPEmb(lang="en", dim=50)
In some cases, only the tokenization is required and the w2v model is not. For example, when training a text classifier that learns its own embeddings, the w2v model in bpemb is not needed, but tokenization is still required during inference. Is there a way to initialize bpemb so that it can be used only for the encode method, without downloading or loading the vectors?
It could be something like the following