-
Traceback (most recent call last):
File "D:\sentencepiece_chinese_bpe-main\chinese_bpe.py", line 23, in
tokenizer = ChineseTokenizer(vocab_file=output_dir + 'chinese.model')
File "D:\sente…
-
Hi,
Can you please review my solution for this bug:
https://sourceforge.net/p/snuggletex/bugs/9/
It's done in this commit in `LaTeXTokenizer`:
https://github.com/axkr/symja_android_library/com…
-
At least the following appear in the data (a rough detection sketch follows the list):
* `Gam esIndustry-julkaisu`
* `ki rjoitettu`
* `myytyynYounitediin`
* `tal lennustilaa`
* `jaNokia`
* `televisionkatselu un`
* `Lumia-puheli mia`
*…
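These look like two distinct artifact types: spurious spaces inside words (`ki rjoitettu`) and missing spaces between words (`jaNokia`). A rough detection sketch for both; the word list `words_fi` is a placeholder you would have to supply, and both heuristics are noisy:

```python
import re

# Heuristic 1: a lowercase letter immediately followed by an uppercase letter
# inside a token often marks two words run together ("jaNokia").
RUN_TOGETHER = re.compile(r"\w*[a-zäö][A-ZÄÖ]\w*")

# Heuristic 2 (assumption: words_fi is a Finnish word list you supply): two
# adjacent fragments that are not words themselves but form one when joined
# point to a spurious space ("ki rjoitettu" -> "kirjoitettu").
def spurious_spaces(tokens, words_fi):
    for a, b in zip(tokens, tokens[1:]):
        if a.lower() not in words_fi and b.lower() not in words_fi \
                and (a + b).lower() in words_fi:
            yield f"{a} {b}"

text = "GamesIndustry-julkaisu ki rjoitettu jaNokia"
print(RUN_TOGETHER.findall(text))
# ['GamesIndustry', 'jaNokia'] -- brand names are false positives
print(list(spurious_spaces(text.split(), {"kirjoitettu", "ja", "nokia"})))
# ['ki rjoitettu']
```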
-
spaCy models should be adapted to the medical corpus. For example, the default tokenizer splits identifiers such as `EMEA/H/C/551` into separate pieces:
`tokens['train'][0:10]: [['EMEA', '/', 'H', '/', 'C', '/', '551', 'PRIALT']...`
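One way to adapt the tokenizer is to change its infix rules so slash-joined identifiers stay whole. A sketch, assuming an English pipeline such as `en_core_web_sm`; the exact rule set needed depends on the corpus:

```python
import spacy
from spacy.util import compile_infix_regex

nlp = spacy.load("en_core_web_sm")  # assumption: any pretrained English pipeline

print([t.text for t in nlp("EMEA/H/C/551 PRIALT")])
# ['EMEA', '/', 'H', '/', 'C', '/', '551', 'PRIALT']

# Drop every infix pattern that can split on '/', so identifiers such as
# EMEA/H/C/551 stay as single tokens. Caveat: in current spaCy versions the
# default pattern containing '/' also handles ':', '<', '>', '=', so those
# stop splitting too.
infixes = [p for p in nlp.Defaults.infixes if "/" not in p]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer

print([t.text for t in nlp("EMEA/H/C/551 PRIALT")])
# expected: ['EMEA/H/C/551', 'PRIALT']
```

For a handful of known identifiers, `nlp.tokenizer.add_special_case` is a narrower alternative that avoids changing the global infix rules.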
-
Traceback (most recent call last):
File "/data/server03/Zhaozikai/Clip/train.py", line 170, in
model = CLIP(**config)
File "/data/server03/Zhaozikai/Clip/nets/clip.py", line 57, in __ini…
-
The current tokenisation story of VS Code is based on TM grammars, which are pretty powerful, but we are running into their limits if we want to do something more than a top-down scanner can do. Also,…
-
**Github username:** @0xSwahili
**Twitter username:** --
**Submission hash (on-chain):** 0x468bd6737182dd0a8ad12826c0130b1894c9a1c18627568f8e26abd58f1e68c2
**Severity:** high
**Description:**
**Desc…