-
When steering a model via reinforcement learning to generate informal sentences, training will likely converge faster if the corpus used is also informal in style. Find such datasets.
- [x] Survey corpora that contain a relatively large amount of informal text
- [x] (If any exist) GPT-family models fine-tuned on those datasets
-
I trained a model with PEFT and I want to convert it to TorchScript. Is there a way to do this? I tried the usual TorchScript conversion approaches but got errors.
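A common cause of those errors is tracing the PEFT wrapper module directly; merging the adapter weights back into the base model first usually avoids it. Here is a minimal, self-contained sketch of that merge-then-trace pattern using a toy LoRA-style layer (the `LoraLinear` class and its `merge` method are illustrative stand-ins; with the real library you would call `model.merge_and_unload()` on the `PeftModel` before `torch.jit.trace`):

```python
import torch
import torch.nn as nn

class LoraLinear(nn.Module):
    """Toy LoRA layer: a frozen base weight plus a low-rank adapter A @ B."""
    def __init__(self, dim, rank=2):
        super().__init__()
        self.base = nn.Linear(dim, dim, bias=False)
        self.A = nn.Parameter(torch.randn(dim, rank) * 0.01)
        self.B = nn.Parameter(torch.randn(rank, dim) * 0.01)

    def forward(self, x):
        return self.base(x) + x @ self.A @ self.B

    def merge(self):
        """Fold the adapter into the base weight so plain tracing works."""
        merged = nn.Linear(self.base.in_features, self.base.out_features, bias=False)
        with torch.no_grad():
            merged.weight.copy_(self.base.weight + (self.A @ self.B).T)
        return merged

torch.manual_seed(0)
layer = LoraLinear(4)
x = torch.randn(1, 4)
merged = layer.merge()                 # adapter folded into base weights
traced = torch.jit.trace(merged, x)    # now an ordinary traceable module
assert torch.allclose(layer(x), traced(x), atol=1e-6)
```

The traced module can then be saved with `torch.jit.save` like any other TorchScript model.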
-
### Describe the bug
On Spaces, the previous chat persists, and sometimes if you add too many chat logs it just breaks. I don't want my demo to break.
### Reproduction
Just chat using: https://huggingf…
-
### System Info
- `transformers` version: 4.25.1
- Platform: Linux-3.10.0-1160.80.1.el7.x86_64-x86_64-with-centos-7.9.2009-Core
- Python version: 3.7.16
- Huggingface_hub version: 0.13.3
- PyTorc…
-
Reddit could provide a good source of training data, especially since its tree-like structure allows multiple continuations of a conversation, which is amenable to ranking. Probably not every su…
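The tree-to-ranking idea can be sketched in a few lines. This assumes a hypothetical comment shape with `text`, `score`, and `replies` fields (real Reddit dumps differ); whenever a node has multiple replies, the sibling replies to the same context are sorted by score to form a ranked set of continuations:

```python
# Sketch: turn a Reddit-style comment tree into (context, ranked continuations)
# pairs. The "text"/"score"/"replies" fields are assumed for illustration.
def continuation_pairs(node, context=()):
    """Yield (context, replies sorted by score) for every node with siblings."""
    ctx = context + (node["text"],)
    if len(node["replies"]) > 1:
        ranked = sorted(node["replies"], key=lambda r: r["score"], reverse=True)
        yield ctx, [r["text"] for r in ranked]
    for child in node["replies"]:
        yield from continuation_pairs(child, ctx)

tree = {
    "text": "What editor do you use?", "score": 10,
    "replies": [
        {"text": "vim", "score": 5, "replies": []},
        {"text": "emacs", "score": 8, "replies": []},
    ],
}
pairs = list(continuation_pairs(tree))
# pairs == [(("What editor do you use?",), ["emacs", "vim"])]
```

Each yielded pair is exactly the shape a pairwise or listwise ranking loss needs: one shared context, several candidate replies with an ordering.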
-
The gap we have is about the `end_token_id`. In a chatbot system like DialoGPT, the user prompt needs to have an `end_token` appended before generation; otherwise it will just generate `end_toke…
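The concatenation step itself is simple. A minimal sketch, assuming DialoGPT's GPT-2-style end-of-text token (with a real tokenizer you would use `tokenizer.eos_token` instead of the hard-coded string):

```python
# Assumed token string for illustration; DialoGPT inherits GPT-2's EOS token.
EOS = "<|endoftext|>"

def prepare_prompt(user_text, history=()):
    """Join prior turns and the new user turn, each terminated by EOS."""
    turns = list(history) + [user_text]
    return EOS.join(turns) + EOS

prompt = prepare_prompt("How are you?", history=["Hi there!"])
# prompt == "Hi there!<|endoftext|>How are you?<|endoftext|>"
```

The trailing `EOS` after the user's turn is the piece the report says is missing: without it the model tends to emit the end token immediately instead of a reply.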
-
# 🚀 Feature request
Currently `GenerationMixin.generate()` only accepts `input_ids` but not `inputs_embeds`. Therefore this method is not usable when custom input embeddings are required. In contra…
ymfa updated 11 months ago
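Until `generate()` accepts embeddings, the usual workaround is a manual decoding loop that feeds `inputs_embeds` to the forward pass directly. A hedged, self-contained sketch using a tiny stand-in model (`TinyLM` is illustrative; with transformers you would call `model(inputs_embeds=...)` the same way):

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Stand-in language model: embedding table plus a linear LM head."""
    def __init__(self, vocab=16, dim=8):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, inputs_embeds):
        return self.head(inputs_embeds)  # (batch, seq, vocab) logits

torch.manual_seed(0)
model = TinyLM()
ids = torch.tensor([[1, 2, 3]])
embeds = model.embed(ids)                 # custom input embeddings go here
for _ in range(4):                        # greedy decoding, 4 steps
    logits = model(inputs_embeds=embeds)
    next_id = logits[:, -1].argmax(-1)    # pick the top token at the last step
    embeds = torch.cat([embeds, model.embed(next_id)[:, None]], dim=1)
```

The loop appends the embedding of each newly chosen token, so arbitrary soft prompts or mixed embeddings can be prepended without ever materializing `input_ids` for them.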
-
### Feature request
When using a fast tokenizer in the `DataCollatorForSeq2Seq`, currently the following warning is printed
```
You're using a T5TokenizerFast tokenizer. Please note that with …
-
I don't know exactly what the issue is, but I cannot get DialoGPT to work with my pre-trained model.
The model was trained on a `csv` file that basically has two columns: `from` and `text`.
Dari…
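For reference, flattening rows of that shape into a DialoGPT-style training sample typically means joining the `text` values turn by turn, each terminated by GPT-2's end-of-text token. A minimal sketch (the column names come from the post; the sample data and the `EOS` string are assumptions):

```python
import csv, io

EOS = "<|endoftext|>"  # GPT-2/DialoGPT end-of-text token, assumed here
raw = "from,text\nalice,hi\nbob,hello\nalice,how are you\n"  # toy stand-in data
rows = list(csv.DictReader(io.StringIO(raw)))
# One conversation becomes a single string, one turn per row, EOS-terminated.
sample = "".join(row["text"] + EOS for row in rows)
# sample == "hi<|endoftext|>hello<|endoftext|>how are you<|endoftext|>"
```

If the fine-tuning data was not formatted this way, generation with the stock DialoGPT decoding code will usually misbehave, which may be the failure described here.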
-
priority: medium; shouldn't be too hard since we can use their code