mirorac opened this issue 17 hours ago (status: Open)
When ngram_vocab=True is used, single words seem to be ignored in the vocabulary. In previous versions, this behavior did not occur, so I wanted to check if this change was intentional or an unintended regression.
Here’s the relevant line in the code: https://github.com/ddangelov/Top2Vec/blob/2435731bc834f49aa22b38d46102bc37b960dffc/top2vec/top2vec.py#L890
Suggested fix:
vocab += phrases
Could merging the phrases with the previously built vocabulary resolve the issue, or is this the expected behavior in the latest version?
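A minimal sketch of the difference in question, using made-up word lists and a hypothetical helper `build_vocab`. This is not the actual Top2Vec code at the linked line; it only contrasts replacing the vocabulary with phrases against merging the two, as `vocab += phrases` would.

```python
# Toy illustration only: build_vocab and the word lists are hypothetical,
# not Top2Vec internals.
def build_vocab(unigrams, phrases, keep_unigrams):
    """Return the vocabulary under the two behaviors discussed in this issue."""
    if keep_unigrams:
        # Suggested fix: extend the single-word vocabulary with the phrases,
        # dropping duplicates while preserving order.
        return list(dict.fromkeys(unigrams + phrases))
    # Reported behavior: only the phrases remain in the vocabulary.
    return list(phrases)


unigrams = ["topic", "model", "vector"]
phrases = ["topic model", "word vector"]

replaced = build_vocab(unigrams, phrases, keep_unigrams=False)
merged = build_vocab(unigrams, phrases, keep_unigrams=True)

print(replaced)  # ['topic model', 'word vector']
print(merged)    # ['topic', 'model', 'vector', 'topic model', 'word vector']
```

With the merged list, single words and phrases would both be candidates for topic words; with the replaced list, only phrases are.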
It was intentional, as single words would often end up as top topic words rather than the ngrams.