-
Data is tokenized twice:
1. With Stanford CoreNLP: https://github.com/nlpyang/PreSumm/blob/ba17e95de8cde9d5ddaeeba01df7cace584511b2/src/prepro/data_builder.py#L110
2. With HuggingFace's Bert…
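The second pass can be pictured as running BERT-style WordPiece over the words the first pass already produced. Below is a minimal sketch of greedy longest-match-first WordPiece under stated assumptions: the toy vocabulary and sample words are hypothetical, lowercasing mimics an uncased model, and this is not PreSumm's actual code or HuggingFace's tokenizer implementation.

```python
def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Split one pre-tokenized word into sub-tokens by greedy longest match."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece matches: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


def tokenize_twice(words: list[str], vocab: set[str]) -> list[str]:
    """Second pass: run WordPiece over every word produced by the first pass."""
    out = []
    for w in words:
        out.extend(wordpiece(w.lower(), vocab))  # lowercasing assumes an uncased model
    return out


# Hypothetical first-pass (CoreNLP-style) output and a toy vocabulary.
first_pass = ["Tokenization", "is", "fun", "!"]
toy_vocab = {"token", "##ization", "is", "fun", "!"}
print(tokenize_twice(first_pass, toy_vocab))  # ['token', '##ization', 'is', 'fun', '!']
```

Because the second pass only ever splits words further, any boundary decisions made by the first tokenizer are preserved in the final sub-token sequence.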
-
It looks like the tokenizer patching breaks. Here's the log:
```
ValueError Traceback (most recent call last)
Cell In[1], line 20
7 # 4bit pre quantized models…
```
-
I used this [Colab notebook](https://colab.research.google.com/drive/1i7I5pzBcU3WDFarDnzweIj4-sVVoIUFJ) to fine-tune this model. When I run `trainer.train()`, it raises an error.
```
in :2 …
```
-
I'm trying to find a good Semantic Role Labeling tool that I can use in my Java code in NetBeans.
I tried ClearNLP, and testing that version gave the right output from this link: ht…
-
What would be the best approach for adding new vocab to the tokenizer before training the model? I tried accessing the tokenizer directly but realized there would be no way to resize the token_embeddi…
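In `transformers`, the usual pattern is `tokenizer.add_tokens([...])` followed by `model.resize_token_embeddings(len(tokenizer))`, so the embedding matrix is resized on the model side after the tokenizer has grown. The pure-Python sketch below only illustrates what that resize means conceptually: new token ids are appended at the end of the vocabulary and matching rows are appended to the embedding matrix. The helper names are hypothetical, the vocabulary ids are assumed to be contiguous from 0, and initializing new rows to the mean of the existing rows is one common heuristic chosen here as an assumption.

```python
def add_tokens(vocab: dict[str, int], new_tokens: list[str]) -> int:
    """Append unseen tokens to the vocab with the next free ids; return how many were added."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)  # assumes ids are contiguous 0..n-1
            added += 1
    return added


def resize_embeddings(emb: list[list[float]], new_size: int) -> list[list[float]]:
    """Grow the embedding matrix to new_size rows; new rows start as the mean of old rows."""
    if new_size < len(emb):
        raise ValueError("this sketch only grows the matrix")
    dim = len(emb[0])
    mean = [sum(row[j] for row in emb) / len(emb) for j in range(dim)]
    # Existing rows are copied unchanged, so already-trained tokens keep their vectors.
    return [list(row) for row in emb] + [list(mean) for _ in range(new_size - len(emb))]


vocab = {"[PAD]": 0, "hello": 1, "world": 2}
emb = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
added = add_tokens(vocab, ["covid", "hello"])  # "hello" already exists, so only 1 is added
emb = resize_embeddings(emb, len(vocab))
print(added, len(emb), emb[-1])  # 1 4 [1.0, 2.0]
```

The key point is the ordering: the vocabulary must be extended first, and the matrix is then resized to the new vocabulary length so every new id has a row.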
-
Hey @sanchit-gandhi, I like the repo and I'm excited to see this being worked on. Here's a benchmark of WhisperSpeech: I used your sample script on the exact same text snippet, and it finished processing in …
-
## Input
Takes a CSV file with two columns:
1. Normalized texts
2. `valid`/`unvalid` flags
## Output
1. Configuration: `config.json`
2. Model in `pytorch_model.bin` format
3. Map…
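Reading the two-column input described above might look like the sketch below. The column order follows the description (normalized text, then flag), but the integer label mapping, the `has_header` option, and the function name are hypothetical assumptions, not taken from the original project.

```python
import csv
import io

# Hypothetical flag-to-label mapping, based on the valid/unvalid flags described above.
LABELS = {"valid": 1, "unvalid": 0}


def read_dataset(f, has_header: bool = False) -> tuple[list[str], list[int]]:
    """Read (normalized text, flag) rows from a two-column CSV into texts and integer labels."""
    reader = csv.reader(f)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    texts, labels = [], []
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        text, flag = row
        texts.append(text)
        labels.append(LABELS[flag.strip()])
    return texts, labels


# Usage with an in-memory file; a real file would be opened with
# open(path, newline="", encoding="utf-8").
sample = io.StringIO('hello world,valid\n"a, b",unvalid\n')
print(read_dataset(sample))  # (['hello world', 'a, b'], [1, 0])
```

Using the `csv` module rather than splitting on commas keeps quoted texts that contain commas intact.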
-
When setting `os.environ["WANDB_LOG_MODEL"] = "end"` before the training loop and specifying `report_to='wandb'` in `TrainingArguments`, I receive the following error:
```
Loading best SentenceTra…
```
-
Hello,
I installed spacy-udpipe from PyPI using:
`pip install spacy-udpipe`
When I follow the tutorial code from the PyPI package page:
```python
import spacy_udpipe
sp…
```
-
SentencePiece tokenizers have the property that [`Decode(Encode(Normalize(input))) == Normalize(input)`](https://github.com/google/sentencepiece/blob/master/doc/api.md#detokenize-text-postprocessing). …
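The property holds because whitespace is kept inside the pieces as the meta symbol ▁ (U+2581) instead of being discarded, so decoding is plain concatenation plus turning ▁ back into a space. The toy below illustrates only that idea and is not SentencePiece's algorithm: the fixed 3-character chunking stands in for the learned segmentation, and the NFKC-plus-whitespace-collapsing `normalize` is an assumption standing in for SentencePiece's configurable normalization rules.

```python
import unicodedata

META = "\u2581"  # "▁", the symbol SentencePiece uses to mark whitespace


def normalize(text: str) -> str:
    """Toy normalizer (assumption): NFKC, then collapse whitespace runs into single spaces."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def encode(text: str) -> list[str]:
    """Toy encoder: prefix each word with ▁, then cut it into chunks of at most 3 chars."""
    pieces = []
    for word in text.split(" "):
        marked = META + word
        pieces.extend(marked[i:i + 3] for i in range(0, len(marked), 3))
    return pieces


def decode(pieces: list[str]) -> str:
    """Concatenate pieces, map ▁ back to spaces, and drop the leading space."""
    return "".join(pieces).replace(META, " ")[1:]


s = "  Hello\u3000 world  "
print(encode(normalize(s)))  # ['▁He', 'llo', '▁wo', 'rld']
print(decode(encode(normalize(s))) == normalize(s))  # True
```

Note that the equality is against `Normalize(input)`, not the raw input: the extra spaces and the ideographic space are lost in normalization, and this toy also fails to round-trip text that already contains ▁.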