-
Has the repo been implemented solely to find the optimal vocab size, as in the snippet `vocab = vocab[:optimal_size]`?
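For context, a minimal sketch of what a truncation like that usually assumes: the vocab is sorted by descending frequency, so the slice keeps the most frequent entries (the corpus and variable names here are illustrative, not taken from the repo).

```python
# Illustrative sketch only: assumes the "optimal size" means keeping the
# most frequent tokens of a frequency-sorted vocab.
from collections import Counter

corpus = ["the cat sat", "the cat ran", "a dog ran"]
counts = Counter(tok for line in corpus for tok in line.split())

# Sort tokens by descending frequency, then truncate to the chosen size.
vocab = [tok for tok, _ in counts.most_common()]
optimal_size = 4  # placeholder; the repo presumably computes this value
vocab = vocab[:optimal_size]
print(vocab)  # -> ['the', 'cat', 'ran', 'sat'] (ties broken by first appearance)
```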
-
Hey,
I wanted to get the h3m model working. I followed all the steps to get the dataset and ran the three steps to build the dataset from this repository, but I ended up with two problems:
1. All the conf…
-
I recall that some time ago we agreed we needed a landing page for Vocabs/RMG, and during my presentation last week I missed it again. Just to start the conversation.
-
Tools
- OpenTheso https://opentheso.huma-num.fr/opentheso/
- Semantic MediaWiki https://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki + https://www.mediawiki.org/wiki/Extension:Semantic_Glossar…
-
Currently there are many hard-coded areas that assume a single vocab. If we want to experiment with split vocabs, we'll need to add that functionality. I'll mark this as quality, as it may help train ce…
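To make the idea concrete, here is a purely hypothetical sketch of what dropping the single-vocab assumption could look like: a config carrying a list of vocabs instead of one. None of these names exist in the codebase.

```python
# Hypothetical sketch: replace a single hard-coded vocab with a list of
# named vocabs so split-vocab experiments become possible.
# None of these names come from the actual project.
from dataclasses import dataclass, field

@dataclass
class VocabSpec:
    name: str
    path: str
    max_size: int | None = None  # optional cap per vocab

@dataclass
class ModelConfig:
    # A list instead of a single field; len(vocabs) == 1 reproduces today's behaviour.
    vocabs: list[VocabSpec] = field(default_factory=list)

cfg = ModelConfig(vocabs=[
    VocabSpec(name="source", path="vocabs/source.txt"),
    VocabSpec(name="target", path="vocabs/target.txt", max_size=32000),
])

for spec in cfg.vocabs:
    print(spec.name, spec.path, spec.max_size)
```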
-
Hi,
Where in the code is the best place to enforce using an array of OWL vocabs, or should we just stick to a SHACL shape?
Thanks very much!
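For comparison, a minimal sketch of the two options (assuming rdflib and pySHACL; `data.ttl`, `shapes.ttl`, and the namespace list are placeholders, not part of this project):

```python
# Sketch only: hard-coding a list of allowed OWL vocab namespaces in
# application code vs. delegating the rules to a SHACL shapes file.
from rdflib import Graph, URIRef
from pyshacl import validate

data = Graph().parse("data.ttl")  # placeholder data graph

# Option A: enforce an array of allowed vocab namespaces in code.
ALLOWED_NAMESPACES = (
    "http://www.w3.org/2004/02/skos/core#",
    "http://purl.org/dc/terms/",
)
offending = sorted({
    str(p) for _, p, _ in data
    if isinstance(p, URIRef) and not str(p).startswith(ALLOWED_NAMESPACES)
})
if offending:
    print("Predicates outside the allowed vocabs:", offending)

# Option B: keep the constraints declarative in a SHACL shapes graph.
conforms, _, report_text = validate(data, shacl_graph="shapes.ttl")
print("SHACL conforms:", conforms)
print(report_text)
```

A SHACL shapes file keeps the rules declarative and editable without code changes; the hard-coded array is simpler but has to live wherever the graphs are loaded.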
-
`TaskType` in `utils.py` has no attribute named `OPEN_VOCAB_DETECTION`.
@D-Ogi, perhaps you forgot to commit a change?
```
Traceback (most recent call last):
File XYZ/src/WatermarkRemover-AI/…
-
**Describe the Issue**
After updating from 1.76 to 1.77, I started getting a bunch of weird messages on model load.
Example: `llm_load_vocab: control token: 109 '' is not marked as EOG` full listin…
-
Hi, thanks for the library! I am using, e.g., Llama 3.1's tokenizer, but its 128k vocab size is too large for my field. Thus, to make training faster, I would like to reduce the tokenizer vocab size by r…
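As a rough starting point for that kind of reduction, here is a sketch assuming the Hugging Face `transformers` tokenizer (model name, corpus, and target size are placeholders): count which token IDs the domain corpus actually uses and keep only the most frequent ones. Rebuilding a smaller tokenizer and resizing the embedding matrix from the kept set is the harder, library-specific part and is not shown.

```python
# Rough sketch: find which token IDs a domain corpus actually uses, as a
# first step toward shrinking a 128k vocab. Names below are placeholders.
from collections import Counter
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")  # placeholder model

corpus = ["domain-specific sentence one.", "domain-specific sentence two."]
counts = Counter()
for text in corpus:
    counts.update(tokenizer(text, add_special_tokens=True)["input_ids"])

keep_k = 32_000  # placeholder target vocab size
kept_ids = {tid for tid, _ in counts.most_common(keep_k)}
# Always keep special tokens regardless of corpus frequency.
kept_ids.update(tokenizer.all_special_ids)

print(f"corpus uses {len(counts)} distinct ids; keeping {len(kept_ids)}")
# Remapping the kept ids to a contiguous range and slicing the model's
# embedding rows accordingly would be the next, library-specific step.
```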