-
-
Hello, haohe. I really appreciate your work! Thank you for your kindness in open-sourcing it.
While studying the training code, I cannot find the training of GPT2. In the original paper, embedding…
CJ416 updated
2 months ago
-
I trained the model using 2 nodes and copied machine1's model files into machine2's directory.
Then I ran:
python deepspeed_to_megatron.py --input_folder $checkpoint --output_folder output --tar…
-
Thank you for this excellent implementation. I'd like to suggest an optimization that could significantly speed up inference and enable streaming output.
Currently, there are two GPT2 graphs:
1.…
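The optimization being suggested amounts to caching the attention keys and values computed during the prompt pass and reusing them at every subsequent decoding step, so each new token only pays for one position of attention. A minimal pure-Python sketch of the idea (toy single-head attention over scalar "embeddings", not real GPT-2 weights; all names here are illustrative):

```python
import math

def attend(q, keys, values):
    """Attention of one query over cached keys/values (toy scalar version)."""
    scores = [q * k for k in keys]                     # dot products (scalars here)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]           # numerically stable softmax
    total = sum(exps)
    return sum(w / total * v for w, v in zip(exps, values))

def prefill(prompt):
    """Full pass over the prompt: build the key/value cache once."""
    cache = {"keys": list(prompt), "values": [x * 2 for x in prompt]}
    out = attend(prompt[-1], cache["keys"], cache["values"])
    return out, cache

def decode_step(token, cache):
    """Incremental step: append one key/value pair, attend over the cache."""
    cache["keys"].append(token)
    cache["values"].append(token * 2)
    return attend(token, cache["keys"], cache["values"])

# Incremental decoding matches recomputing the whole sequence from scratch,
# but each step now costs O(cache length) instead of re-running the full graph.
_, cache = prefill([1.0, 2.0, 3.0])
incremental = decode_step(4.0, cache)
full, _ = prefill([1.0, 2.0, 3.0, 4.0])
assert abs(incremental - full) < 1e-12
```

Since each step emits one token as soon as it is computed, the same structure is what enables streaming output.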
-
Environment:
- System: Ubuntu 22.04.2 LTS
- CUDA Version: cuda_12.1.r12.1/compiler.32688072_0
- nvcc: 12.1
I encounter an error when I execute:
```bash
make train_gpt2cu
```
Warning and …
-
### Is your feature request related to a problem?
gpt2-chatbot
### Describe the solution you'd like.
gpt2-chatbot
### Describe alternatives you've considered.
-
Hi! How's it going? Is there any documentation on using this model? If not, could I write some and request that you merge it into this repo? Thanks!
-
Need to look into the docs for huggingface's pytorch-transformer library to see how to train it.
Then, what to train it on? Gutenberg-dammit? That seems pretty good.
Or maybe just a subse…
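Whatever corpus is chosen, the usual preparation step for causal LM training is to tokenize the whole text and slice it into fixed-length blocks where the target sequence is the input shifted by one position. A minimal sketch of that slicing (integer stand-ins for token ids; a real run would use the library's GPT-2 tokenizer):

```python
def make_lm_blocks(token_ids, block_size):
    """Slice a token stream into (input, target) pairs for causal LM training.

    Each target is the input shifted one position to the right, so the model
    learns next-token prediction. Trailing tokens that don't fill a complete
    block are dropped.
    """
    pairs = []
    # Each example needs block_size + 1 tokens: block_size inputs + 1 final target.
    for start in range(0, len(token_ids) - block_size, block_size):
        window = token_ids[start:start + block_size + 1]
        pairs.append((window[:-1], window[1:]))
    return pairs

ids = list(range(10))                    # stand-in for a tokenized corpus
blocks = make_lm_blocks(ids, block_size=4)
# First pair: inputs [0, 1, 2, 3] predict targets [1, 2, 3, 4].
assert blocks[0] == ([0, 1, 2, 3], [1, 2, 3, 4])
```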
-
Hi,
I'm trying to apply context-debias to generative models (gpt2). I tried to use your script directly, but the loss is extremely large (1e10+). I notice you have an argument "mlm" in run_debias_mlm…
-
Hi, I have just used the default params to p-tune gpt2-medium on the LAMA task, and the results are as follows:
best dev_hit@1: 51.8, best test_hit@1: 44.5
For the results I got, I have some confusions…
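For context, hit@1 is the percentage of examples whose top-ranked prediction equals the gold answer. A minimal sketch of the metric (illustrative data, not the LAMA evaluation code):

```python
def hit_at_k(ranked_predictions, gold, k=1):
    """Percentage of examples whose gold answer is in the top-k predictions."""
    hits = sum(1 for preds, g in zip(ranked_predictions, gold) if g in preds[:k])
    return 100.0 * hits / len(gold)

preds = [["Paris", "Lyon"], ["Rome", "Milan"], ["Berlin", "Bonn"]]
gold = ["Paris", "Milan", "Berlin"]
score = hit_at_k(preds, gold, k=1)  # 2 of the 3 top-1 predictions match
```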