-
Right now our LiteralNER is _very_ literal, so in some cases it does not work.
Example: an entry like this
```
takayasu's arteritis
```
is never found because the documents will be tokenized, transf…
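To illustrate the failure mode, here is a toy sketch (the tokenizer below is hypothetical, not LiteralNER's actual pipeline): once the document is tokenized, the surface form no longer equals the dictionary entry, so a literal string match fails.
```python
import re

def toy_tokenize(text):
    # toy word tokenizer that splits off punctuation, including the apostrophe
    return re.findall(r"\w+|[^\w\s]", text)

entry = "takayasu's arteritis"
doc = "Patient was diagnosed with Takayasu's arteritis last year."

tokens = toy_tokenize(doc.lower())
print(tokens)                     # [..., 'takayasu', "'", 's', 'arteritis', ...]
print(entry in " ".join(tokens))  # False: the apostrophe split breaks the literal match
```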
-
I ran the command like this:
```bash
bun x humanifyjs local responsez.js
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA GeForce GTX 1070 (NVIDIA) | uma: 0 | fp16: 0 | warp size: 32
[nod…
-
Hello Till,
I have reached out to my IT department; however, they were not able to resolve the issue below. Do you have any suggestions on how to resolve it?
Running Python 3.9.5, PIP…
-
Hello, I see that the BERT tokenizer only tokenizes the text. If the tokenizer splits, for example, 1994 into 19 and ##94, while gaz recognizes BMES words for each character (1/9/9/4), won't that cause an input mismatch?
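One common way to avoid such a mismatch is to project character-level labels onto subword tokens via the offset mapping. A minimal sketch, assuming a fast Hugging Face tokenizer (the model name and labels here are illustrative, not from the original project):
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
text = "1994"
char_labels = ["B", "M", "M", "E"]  # character-level BMES labels for 1/9/9/4

enc = tokenizer(text, return_offsets_mapping=True, add_special_tokens=False)
for token, (start, end) in zip(enc.tokens(), enc["offset_mapping"]):
    # a subword covering characters [start, end) inherits the label of its first character
    print(token, char_labels[start])
```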
-
We currently have the field `arxiv_class`, which contains the classification of a paper provided by arXiv and is typically of the form `category.SC` (where SC represents an abbreviation for the s…
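For illustration, a value in that form splits cleanly on the dot (the sample value below is assumed, not taken from the data):
```python
arxiv_class = "cs.CL"  # assumed sample in the category.SC form
category, subject_class = arxiv_class.split(".", 1)
print(category, subject_class)  # -> cs CL
```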
-
Dear Team,
The code below doesn't work: the context is not split into sentence tokens.
if args.tokenizer == "PTB":
    import nltk
    sent_tokenize = nltk.sent_tokenize
    def word_tokeniz…
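One common cause (an assumption; the snippet doesn't show the actual error) is that NLTK's punkt model, which `sent_tokenize` depends on, has not been downloaded. A minimal check:
```python
import nltk
nltk.download("punkt")  # one-time download of the sentence tokenizer model

from nltk.tokenize import sent_tokenize
print(sent_tokenize("First sentence. Second one."))  # ['First sentence.', 'Second one.']
```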
-
Hello,
Please, how can I use this library to save a card and then tokenize it?
It is actually crucial to my development.
Can you help?
-
What if you could issue tokens where each token is worth one hour of your time?
-
Related to: https://github.com/huggingface/transformers/issues/25073
In my current project, I'd like to add a special token that doesn't insert a space before the next token.
Currently, I need to spec…
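For illustration, one way to control whitespace handling around an added token is via the `AddedToken` flags, a sketch assuming the `tokenizers` library (the token string and model are placeholders):
```python
from tokenizers import AddedToken
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# lstrip/rstrip control whether whitespace around the token gets stripped
tokenizer.add_tokens(AddedToken("<my_token>", lstrip=False, rstrip=False), special_tokens=True)
print(tokenizer.tokenize("<my_token>next"))
```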
-
May I ask how you solved this:
def build_vocabulary(spacy_de, spacy_en):
    def tokenize_de(text):
        return tokenize(text, spacy_de)
    def tokenize_en(text):
        return tokenize(text, spacy_en)
    pr…
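For context, a hypothetical `tokenize` helper matching the calls in the snippet above (the implementation is an assumption, not the original author's code):
```python
import spacy

def tokenize(text, spacy_model):
    # split text into surface-form tokens using the given spaCy pipeline
    return [tok.text for tok in spacy_model.tokenizer(text)]

spacy_en = spacy.load("en_core_web_sm")
print(tokenize("Hello world!", spacy_en))  # ['Hello', 'world', '!']
```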