-
I'm comparing the tokenization between original Meta repo and llama.cpp with LLaMA (also had same issue with LLaMA v2).
For example, tokenizing the prompt "Hello world" and " Hello world" gives the…
-
**Notice by @rricard:** This issue is now only open to discuss the following alternatives: `#{ }`/`#[ ]`, `@{ }`/`@[ ]`, ~~`{| |}`/`[| |]`~~ or `{| }`/`[| ]`
---
_Original issue text:_
Most p…
-
## 🐛 Bug
No module named 'kobert'
No module named 'glounnlp'
colab The code is not running after the update.
코랩 업데이트 이후 코드 실행이 안되고 있습니다. 이전에는 문제 없이 정상작동을 했던 코드입니다.
## To Reproduce
버그를 …
-
When doing sequence tagging for Named Entities, Flair is injecting spaces around punctuation inside the Span itself (which I suspect this is due to the tokenization being applied). I previously report…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Contact Details
twitter @joshdance
### What should this feature add?
Reading the docs here - https://invoke-…
-
Is this the expected behavior ?
```python
import pyonmttok
th_string = "คุณอาจจะทำอย ่ างนั ้ นไปซักพัก จนคุณเริ ่ มจะรู ้ สึกถึงมันจริงๆ"
tokenizer = pyonmttok.Tokenizer("aggressive", joiner_anno…
-
Testing microsoft/vscode-jupyter#11463, I (re)learned today that we truncate long lines.
My setup:
1) Open insiders.vscode.dev
1) Install Jupyter and Python extensions
1) Create a new untitled j…
-
Hi guys. I want to add some new special tokens like `[XXX]` to a pretrained `ByteLevelBPETokenizer`, but I can't find how to do this in python.
-
I am creating a repo with some draft NLP methods that will assist me with the further work of the text processing later on down the road.
The ones I will cover for sure are:
* Tokenization
* Basic Pre…
-
I have issues to get started with setting up a demo for the Seq2SeqClassificationConsole app.
I assume as a NewBe I do not setup the training data correctly. Can you please point me the way to setup …