-
### For Bug Reports
* BookStack Version: v0.20.0
When the word I'm looking for is the first word, or there's a space in front of it, it's ok.
![i01](https://user-images.githubusercontent.com/30…
-
![image](https://github.com/MaartenGr/BERTopic/assets/41793074/d0e64ec0-8d86-4f7c-acb6-b4e2e057c473)
![image](https://github.com/MaartenGr/BERTopic/assets/41793074/1440a9f9-9cfc-448a-82a0-871ae680530…
-
It would be great to add support for the `org.apache.lucene.analysis` analyzers for smarter tokenization across all languages. That way, your library would handle other languages, such as Chinese, more sensibly.
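To illustrate why this matters for Chinese, here is a minimal Python sketch (not Lucene code, and not this library's API). It assumes whitespace splitting as the baseline and uses overlapping character bigrams, which is the general approach taken by Lucene's `CJKAnalyzer`. The helper names `whitespace_tokens` and `cjk_bigrams` are hypothetical.

```python
def whitespace_tokens(text):
    """Baseline: split on whitespace only."""
    return text.split()


def cjk_bigrams(text):
    """Overlapping character bigrams, a simplified stand-in for CJK-aware analysis."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        # A single character (or nothing) cannot form a bigram.
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


sentence = "我喜欢自然语言处理"

# Chinese has no spaces between words, so the whole sentence stays one token.
print(whitespace_tokens(sentence))

# Bigrams give searchable units such as "自然", "语言", and "处理".
print(cjk_bigrams(sentence))
```

With whitespace splitting, a query for "语言" cannot match the single long token exactly, whereas the bigram list contains it as its own term.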
-
```python
from utils.bert import BertTokenizer, load_vocab

vocab_bert, ids_to_tokens_bert = load_vocab(vocab_file="./vocab/bert-base-uncased-vocab.txt")
TEXT.build_vocab(train_ds, min_freq=1)
TEXT.v…
```
-
### What happened?
I am attempting to measure the perplexity of the gemma-2-9b-it-Q4_K_M.gguf model using llama.cpp. However, I encounter an issue where the process gets stuck at the "tokenizing th…
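For context, perplexity over a tokenized text is the exponential of the mean negative log-likelihood per token. The sketch below only illustrates that formula. It assumes per-token natural-log probabilities are already available, and the `perplexity` helper is hypothetical, not part of llama.cpp.

```python
import math


def perplexity(token_logprobs):
    """Return exp(mean negative log-likelihood) for natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical values: every token was assigned probability 0.25,
# so the perplexity should come out at roughly 4.
logprobs = [math.log(0.25)] * 4
print(perplexity(logprobs))
```

The tokenization step has to finish before any of these log-probabilities can be computed, which is why a hang there blocks the whole measurement.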
-
## Describe the feature
**Elasticsearch version** (`bin/elasticsearch --version`):
6.7.2
**Plugins installed**: []
- nori
**JVM version** (`java -version`):
jvm 1.8
**OS versi…
-
I was inspired by this mailing-list thread.
As many Japanese users already know, the default built-in dictionary bundled with Kuromoji (MeCab IPADIC) is a bit old and has not been maintained for many years. While i…
-
- [ ] [blog/starcoder2.md at main · huggingface/blog](https://github.com/huggingface/blog/blob/main/starcoder2.md?plain=1)
# blog/starcoder2.md at main · huggingface/blog
---
## StarCoder…
-
They offer free models for non-prod usage. This one is a 104B model, way better than the other free models.
```bash
curl --request POST \
  --url https://api.cohere.ai/v1/chat \
  --header 'accept: application…
```
-
Hi @andimarafioti
I just cloned the repo (0c53fda7dd5682155186a03386434a1c1cf50212) and did a `pip install`.
Then I ran
`python s2s_pipeline.py --local_mac_optimal_settings`
and got this er…