-
### Question
The loss drops to 0 when training on the Qwen2 backend:
{'loss': 0.0, 'learning_rate': 0.00015267175572519084, 'epoch': 0.0} …
-
Is this still a tokenization bug? I want to use this model for code. Thanks!
-
```
Some parameters are on the meta device because they were offloaded to the cpu and disk.
Traceback (most recent call last):
  File "C:\Users\15729\Downloads\Qwen2-Boundless-main\Qwen2-Boundless-…
```
-
### Preliminary Remark
The observations presented here are also relevant for the _polmineR_ repository.
### Some Background
The _Bundestag Protokolle_ often employ spacing to enhance readability …
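This spaced-out emphasis (Sperrsatz, e.g. "A b g e o r d n e t e") defeats naive whitespace tokenization. A minimal pre-processing sketch, assuming the goal is simply to merge runs of three or more single letters back into one word (the function name and the threshold are illustrative, not part of polmineR):

```python
import re

def collapse_spaced_emphasis(text: str) -> str:
    """Merge runs of single letters separated by spaces (Sperrsatz)
    back into one word, e.g. 'A b g e o r d n e t e' -> 'Abgeordnete'.
    Heuristic: only collapse runs of three or more single letters."""
    pattern = re.compile(r"\b(?:\w ){2,}\w\b")
    return pattern.sub(lambda m: m.group(0).replace(" ", ""), text)

print(collapse_spaced_emphasis("Beifall bei der S P D"))
```

A real normalizer would need to guard against sequences of genuine one-letter words, but the regex above never fires on fewer than three consecutive single letters.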
-
Hello,
Thank you for your hard work on this project. The tool is incredibly useful, and I appreciate your dedication.
I'd like to propose adding tokenizers/lexers for pattern matching along si…
-
Port the CLIP tokenizer, which leverages byte-level BPE. This tokenizer enables scenarios like StableDiffusion.
May be dependent on https://github.com/dotnet/machinelearning/issues/6992.
Reference:
h…
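For context, byte-level BPE (as in the GPT-2 family of tokenizers that CLIP builds on) first remaps every UTF-8 byte to a printable Unicode character so the merge rules can operate on visible symbols. A minimal Python sketch of that remapping step only (not the proposed .NET port, and not the merge loop itself):

```python
def bytes_to_unicode():
    """GPT-2/CLIP-style byte -> printable-unicode mapping, so every
    UTF-8 byte has a visible, BPE-safe character representation."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("\u00a1"), ord("\u00ac") + 1))
          + list(range(ord("\u00ae"), ord("\u00ff") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, ...) get fresh code points.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

byte_encoder = bytes_to_unicode()
word = "".join(byte_encoder[b] for b in "café".encode("utf-8"))
print(word)  # the two bytes of 'é' become two printable characters
```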
-
## User Story
As a speaker of a minority language in the Philippines that uses a `-` as a letter, I want to be able to customize the tokenization of tC so that many of the words in my language are …
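One way to read this request: the word pattern should be configurable so that `-` counts as a letter. A hedged Python sketch of such a regex-based tokenizer (the `WORD` pattern and the example word are illustrative, not tC's actual API):

```python
import re

# Hypothetical word pattern in which '-' counts as a letter, so
# hyphen-internal words stay one token instead of being split.
WORD = re.compile(r"[A-Za-z\-']+|\d+|[^\w\s]")

def tokenize(text: str) -> list[str]:
    """Tokenize, treating '-' as part of a word."""
    return WORD.findall(text)

print(tokenize("mag-aaral na bata"))
```

With a default tokenizer, `mag-aaral` would typically be split into three tokens (`mag`, `-`, `aaral`); the pattern above keeps it whole.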
-
Originally from @SNvMK in https://github.com/microsoft/vscode/issues/120734
So, in Python 3.10, there is match/case syntax. Currently, the keywords are just white words (for Monokai). I'd like it if you added high…
-
**Describe the bug**
Error when tokenizing training data:
```
QASRLTask
[train]: /scratch/bowman/IRT_Experiments/jiant-2/experiments/tasks/data/qasrl/train.jsonl.gz
[val]: /scratch/bowman/IRT…
-
Annotations of contractions (mainly *au*, *aux*, *du* and *des*) are not consistent among French treebanks.
Whereas *au* and *aux* are easy to manage as multiword tokens ([Tokenization and Word Seg…
bguil updated 4 years ago
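For context, this is roughly how a contraction such as *au* (= *à* + *le*) is represented as a multiword token in CoNLL-U (columns simplified, annotations illustrative):

```
1-2	au	_	_
1	à	à	ADP
2	le	le	DET
```

The `1-2` range line carries the surface form, while the syntactic words *à* and *le* get their own lines; *du* and *des* would follow the same scheme with *de*.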