sentence-tokenizer Search Results

1000+ results for sentence-tokenizer

Sorted by: Best match

pytorch/text #835

Raise AttributeError when attribute is unknown for torchtext…

🐛 Bug. Describe the bug: In short, an empty generator is created when calling `__getattr__` with an unknown attribute on `torchtext.data.dataset`. [Here is code](https://github.com/pytorch/text…

ToddMorrill updated 4 years ago · 5 comments

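The snippet describes a well-known Python pitfall: if `__getattr__` contains a `yield` anywhere in its body, Python compiles the whole method as a generator function, so every unknown attribute returns an empty generator instead of raising `AttributeError`. The sketch below reproduces that pattern with hypothetical `BuggyDataset` and `FixedDataset` classes (not torchtext's actual code; the dict-per-example layout is an assumption) and shows one way to fix it.

```python
class BuggyDataset:
    """Hypothetical dataset whose __getattr__ contains `yield`."""

    def __init__(self, examples, fields):
        self.examples = examples  # list of dicts, one per example
        self.fields = fields      # set of field names

    def __getattr__(self, attr):
        # The `yield` below turns this whole method into a generator function,
        # so ANY lookup that reaches __getattr__ returns a generator object,
        # even for names that are not fields.
        if attr in self.fields:
            for example in self.examples:
                yield example[attr]


class FixedDataset(BuggyDataset):
    """Same data layout, but unknown attributes raise AttributeError."""

    def __getattr__(self, attr):
        # Read `fields` through __dict__ to avoid infinite recursion if this
        # runs before __init__ (for example during copying or unpickling).
        fields = self.__dict__.get("fields", ())
        if attr in fields:
            # Return a generator expression instead of yielding here,
            # so __getattr__ itself stays an ordinary function.
            return (example[attr] for example in self.examples)
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {attr!r}"
        )
```

With the buggy version, `hasattr(ds, "typo")` is `True` for any name; with the fixed version it is `False`, which is what `hasattr` and `getattr(obj, name, default)` rely on.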
zhaoyingjun/chatbot #59

TypeError: cannot use a bytes pattern on a string-like objec…

/usr/bin/python3.5 /home/scrooge/chatbot/seqGanChatbot/execute.py /usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/dtypes.py:493: FutureWarning: Passing (type, 1) or '1type' as a sy…

AlucardNosferatu updated 4 years ago · 3 comments

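The snippet is cut off before the failing line, but this `TypeError` is what Python 3's `re` module raises whenever a pattern compiled from `bytes` is applied to a `str` (the mirror case reports "cannot use a string pattern on a bytes-like object"). A minimal sketch of the mismatch and one fix, with hypothetical function names:

```python
import re


def find_number_buggy(line: str):
    # rb"\d+" is a *bytes* pattern; searching a str with it raises
    # TypeError: cannot use a bytes pattern on a string-like object
    return re.search(rb"\d+", line)


def find_number(line):
    """Return the first run of digits, using a pattern of the same type as `line`."""
    # str input needs a str pattern; bytes/bytearray input needs a bytes pattern.
    pattern = rb"\d+" if isinstance(line, (bytes, bytearray)) else r"\d+"
    match = re.search(pattern, line)
    return match.group(0) if match else None
```

An alternative is to decode bytes once at the I/O boundary (for example `data.decode("utf-8")`) and use only `str` patterns afterwards.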
qdrant/fastembed #70

clip-ViT-B-32-multilingual-v1 support, ps: I can contribute.

I exported clip-ViT-B-32-multilingual-v1 to ONNX with some modifications (no effect on the output embedding). Hugging Face Optimum's ONNX export can export this model with (0) Transformer and (1) Pooling. But …

yaman updated 4 months ago · 6 comments

apache/lucene #3572

add sentence boundary charfilter [LUCENE-2498]

From the discussion of #3243: It would be nice to have a CharFilter to mark sentence boundaries. Such functionality would be useful to: prevent phrase queries with 0 slop from matching across sen…

asfimport updated 2 years ago · 5 comments

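The motivation in this issue can be modeled outside Lucene: if token positions skip ahead at every sentence boundary, an exact (0-slop) phrase query can no longer match the last word of one sentence followed by the first word of the next. The Python below is only a toy model of that idea, not Lucene's Java API; the regex sentence splitter, the `SENTENCE_GAP` value, and both function names are assumptions.

```python
from __future__ import annotations

import re

# Hypothetical number of positions skipped at each sentence boundary.
# Any value >= 1 stops a 0-slop phrase from spanning two sentences.
SENTENCE_GAP = 1


def positions_with_gaps(text: str, gap: int = SENTENCE_GAP) -> dict[str, list[int]]:
    """Build a toy positional index: token -> positions where it occurs."""
    index: dict[str, list[int]] = {}
    position = 0
    # Naive split on ., ! or ? followed by whitespace (an assumption; real
    # sentence detection must handle abbreviations, decimals, and so on).
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for token in re.findall(r"\w+", sentence.lower()):
            index.setdefault(token, []).append(position)
            position += 1
        position += gap  # leave a hole so the next sentence is not adjacent
    return index


def phrase_match(index: dict[str, list[int]], phrase: str) -> bool:
    """Exact (slop 0) phrase match: each term must sit at the next position."""
    terms = phrase.lower().split()
    if not terms or any(term not in index for term in terms):
        return False
    return any(
        all(start + offset in index[term] for offset, term in enumerate(terms[1:], 1))
        for start in index[terms[0]]
    )
```

For "The cat sat. Dogs bark loudly." the phrase "sat dogs" matches only when `gap=0`; with the default gap the boundary breaks it, while "cat sat" still matches.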
UKPLab/EasyNMT #48

Exception: 404 Client Error: Not Found for url: https://hugg…

Hi, I'm using EasyNMT for translating customer reviews. During translation, I got this error: HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/Helsinki-NLP/opus-mt-…

yen-tran-yum updated 3 years ago · 1 comment

apple/coremltools #1809

Model converts fine to neuralnetwork but produces all nan va…

🐞 Describing the bug: The output of the converted Core ML model differs from that of the original PyTorch model; an obvious mismatch is observed. I also notice that there are some similar issues that have been prop…

yqrickw20 updated 8 months ago · 8 comments

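When a converted model disagrees with its source, it helps to separate the two symptoms in this issue: NaN outputs versus numerically drifting ones. The helper below is a hypothetical, dependency-free sketch that assumes both models' outputs have already been flattened into lists of floats (how to obtain them from PyTorch and Core ML is outside its scope); the `atol` default is an arbitrary assumption.

```python
import math


def compare_outputs(reference, converted, atol=1e-4):
    """Summarize how a converted model's flat output differs from the reference.

    `reference` (assumed NaN-free) and `converted` are equal-length sequences
    of floats, e.g. flattened outputs of the original and converted model.
    """
    if len(reference) != len(converted):
        raise ValueError("outputs must have the same length")
    # Count NaNs separately: an all-NaN result points at a conversion problem
    # rather than ordinary floating-point drift.
    nan_count = sum(1 for value in converted if math.isnan(value))
    diffs = [
        abs(ref - conv)
        for ref, conv in zip(reference, converted)
        if not math.isnan(conv)
    ]
    max_abs_diff = max(diffs, default=0.0)
    return {
        "nan_count": nan_count,
        "max_abs_diff": max_abs_diff,
        "match": nan_count == 0 and max_abs_diff <= atol,
    }
```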
nltk/nltk #2008

Better PunktTrainer

While trying to retrain a sentence tokenizer model with `PunktTokenizer`, the NLTK code took up more than 200 GB of RAM plus a lot of swap, and training had not finished after 2 days. ```python import …

alvations updated 5 years ago · 7 comments

run-llama/llama_index #13952

[Question]: How to count token in Anthropic models?

Question Validation: [X] I have searched both the documentation and Discord for an answer. Question: Hi, I want to know how to count tokens (Embedding Tokens, LLM Prompt Tokens, LLM Complet…

Ninlawat-Puhu updated 1 month ago · 4 comments

shenshen-hungry/Ancient-Chinese-Segmentation #1

Open WebAPI needed

Hi, my colleagues and I have released [UD-Kanbun](https://github.com/KoichiYasuoka/UD-Kanbun), a Python-based tokenizer, POS tagger, and dependency parser for classical Chinese texts. And now we are in…

KoichiYasuoka updated 4 years ago · 7 comments

UKPLab/sentence-transformers #2922

Please future prove `clean_up_tokenization_spaces`

This is the FutureWarning we are currently receiving: transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. T…

PhorstenkampFuzzy updated 1 week ago · 28 comments
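The warning concerns a decode-time option whose default is changing, so leaving it unset means decoded text may silently change between versions. To show what the flag does, here is a standalone approximation: tokens are joined with spaces and, when cleanup is enabled, the space before common punctuation and English contraction suffixes is removed. The replacement list is modeled on Hugging Face transformers' `clean_up_tokenization` but should be treated as an assumption, and `decode` here is a hypothetical helper, not the library's method.

```python
# Replacement pairs modeled on transformers' cleanup step (an approximation).
_CLEANUP_REPLACEMENTS = [
    (" .", "."), (" ?", "?"), (" !", "!"), (" ,", ","),
    (" ' ", "'"), (" n't", "n't"), (" 'm", "'m"),
    (" 's", "'s"), (" 've", "'ve"), (" 're", "'re"),
]


def clean_up_tokenization(text: str) -> str:
    """Remove spaces that naive token joining leaves before punctuation/contractions."""
    for old, new in _CLEANUP_REPLACEMENTS:
        text = text.replace(old, new)
    return text


def decode(tokens, clean_up_tokenization_spaces: bool) -> str:
    # Hypothetical decode: join word-level tokens with single spaces,
    # then apply the cleanup only when the flag is explicitly enabled.
    text = " ".join(tokens)
    return clean_up_tokenization(text) if clean_up_tokenization_spaces else text
```

Because the two settings produce different strings (`"Hello , world !"` versus `"Hello, world!"`), passing `clean_up_tokenization_spaces` explicitly when the tokenizer is created pins the behavior regardless of the library default.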
