-
Unigram tokenizers are used in the best multilingual models, e.g. multilingual-e5 and paraphrase-multilingual-MiniLM-L12-v2.
It would be very helpful if you could add an implementation to expand the func…
-
Clustering gets confused when given an all-zero vector (i.e. when a document has no POS trigrams of VB,NN,VB). I'm not sure what the most logical fix for this is.
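
One possible way to handle the all-zero case, sketched below, is to keep zero vectors out of the clustering step and give them a reserved label. This is only a sketch under assumptions: `split_zero_vectors`, `l2_normalize`, `assign_labels`, the `cluster_fn` callable and the `-1` label are all hypothetical names, not part of the project, and other fixes (e.g. smoothing the counts) may suit the data better.

```python
import math


def split_zero_vectors(vectors, eps=1e-12):
    """Return (nonzero_indices, zero_indices), judged by each vector's L2 norm."""
    nonzero, zero = [], []
    for i, vec in enumerate(vectors):
        norm = math.sqrt(sum(x * x for x in vec))
        (nonzero if norm > eps else zero).append(i)
    return nonzero, zero


def l2_normalize(vec):
    """Scale a non-zero vector to unit length (undefined for all-zero input)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def assign_labels(vectors, cluster_fn, zero_label=-1):
    """Cluster only the non-zero vectors; all-zero vectors get `zero_label`.

    `cluster_fn` is a hypothetical callable: it takes a list of vectors and
    returns one label per vector, in the same order.
    """
    nonzero, _zero = split_zero_vectors(vectors)
    labels = [zero_label] * len(vectors)
    if nonzero:
        sub_labels = cluster_fn([l2_normalize(vectors[i]) for i in nonzero])
        for i, label in zip(nonzero, sub_labels):
            labels[i] = label
    return labels
```

With this shape, documents that have no matching trigrams end up in their own "no features" group instead of distorting distances that are undefined for a zero vector (cosine distance, for example).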
-
I'm using the provided scorer generator, `generate_scorer_package`. I'm also using (e.g., SentencePiece) to build a unigram language model, where the decoder predicts the size of the language model. Ho…
-
How can you explain the meaning of the following template:
``` bash
U15:%x[-2,1]/%x[-1,1]
```
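
Assuming this is a CRF++-style feature template, `U` marks a unigram template, `15` is just an identifier, and each `%x[row,col]` is replaced by the value in column `col` of the token `row` positions away from the current one. So this template joins column 1 of the two preceding tokens with a `/`. The sketch below imitates that expansion; `expand_template` is a hypothetical helper, not part of CRF++, and the `_B-1` / `_B+1` boundary strings are an assumption about how out-of-range positions are filled.

```python
import re

# Matches macros such as %x[-2,1]: relative row, absolute column.
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand_template(template, rows, pos):
    """Expand %x[row,col] macros for the token at index `pos`.

    `rows` is a list of tokens, each a list of column values
    (e.g. [word, POS tag]). `row` is relative to `pos`; `col` is absolute.
    """
    def substitute(match):
        rel, col = int(match.group(1)), int(match.group(2))
        idx = pos + rel
        if idx < 0:
            return f"_B{idx}"  # assumed marker for positions before the sentence
        if idx >= len(rows):
            return f"_B+{idx - len(rows) + 1}"  # assumed marker past the end
        return rows[idx][col]

    return MACRO.sub(substitute, template)


sentence = [
    ["He", "PRP"],
    ["reckons", "VBZ"],
    ["the", "DT"],
    ["current", "JJ"],
    ["account", "NN"],
]
# For the token "the" (index 2), column 1 of the two previous tokens is used.
feature = expand_template("U15:%x[-2,1]/%x[-1,1]", sentence, 2)
```

Here `feature` is the string `U15:PRP/VBZ`, i.e. the POS tags of the two tokens before "the".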
-
`create_unigram_book_counts` and `create_bigram_book_counts` are redundant. Refactoring may make sense so that the updates made to one don't need to be copy-pasted. Ultimately, the functions are the s…
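
A rough sketch of what such a refactor could look like, with the caveat that the real signatures are not shown here: it assumes both functions receive a mapping from a book id to a list of tokens, and `create_ngram_book_counts` is a hypothetical name for the shared helper.

```python
from collections import Counter


def create_ngram_book_counts(books, n):
    """Count n-grams per book.

    `books` is assumed to map a book id to a list of tokens; the real
    functions may take different arguments.
    """
    counts = {}
    for book_id, tokens in books.items():
        # zip over n shifted views of the token list yields the n-gram tuples.
        ngrams = zip(*(tokens[i:] for i in range(n)))
        counts[book_id] = Counter(ngrams)
    return counts


def create_unigram_book_counts(books):
    """Thin wrapper kept so existing callers do not need to change."""
    return create_ngram_book_counts(books, 1)


def create_bigram_book_counts(books):
    """Thin wrapper kept so existing callers do not need to change."""
    return create_ngram_book_counts(books, 2)
```

Any future change would then go into `create_ngram_book_counts` only, and both wrappers pick it up.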
-
@martinreynaert provided the following examples:
```
veroor_zaakt#1#veroorzaakt#100000002#1#0.815385
veroor_zaakt_door#1#veroorzaakt_door#100000001#1#1
veroor#1#verloor#100000024#1#0.998869
``…
-
### Describe your feature request
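In apps such as Unigram, after opening an image or jumping from an image into another chat, pressing the Esc key returns you to the previous level. I hope coolapklite can offer something similar; on desktop Windows, always having to click the UWP app's built-in back button is a bit cumbersome.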
### How important is this to you?
Nice-to-have
### …
-
Collect unigram data from Project Madurai and Wikipedia.
-
**Description:**
When using the `SentencePieceUnigramTokenizer` with a custom vocabulary, there is no attribute to handle the `unk_id`, causing errors when encoding text not present in the vocabula…
-
### Is your feature request related to a problem?
At the moment the window is very small, whereas other windows use the full height; in Unigram it takes up the full application height.
### Describe the solution you'd like
co…