gmkim-ai / PromptKD

An official implementation of "PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning" (EMNLP 2024 Findings) in PyTorch.
https://promptkd.github.io
MIT License

The "processed_data.tar" data link is invalid. #1

Closed: shhn1 closed this issue 1 month ago

shhn1 commented 1 month ago

Thanks for your great work!

I am very interested in PromptKD and have been trying to reproduce it, but the processed_data.tar link is invalid and I can't download the file. I would be grateful if you could re-upload processed_data.tar.

In addition, if I want to train on my own data instead, what format does the data need to be in for the training scripts to accept it?

Looking forward to your reply :)

gmkim-ai commented 1 month ago

Thank you for your interest in our research.

First, we use the same experimental settings as the MiniLLM paper, so our download link for processed_data.tar came from the MiniLLM GitHub repository. On checking, it seems that link has changed. You can now download the file with the following command:

wget -O processed_data.tar https://unilm.blob.core.windows.net/minillm/MiniLLM/processed_data.tar

Thank you for bringing this to our attention, and we will make sure to update our code accordingly.

Regarding the data format, please refer to the files inside processed_data.tar. Each line is a record with four keys: instruction, input, output, and prompt. The prompt is structured as follows: “{instruction}, {input (if applicable)}, Response:”.
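For anyone preparing their own data, here is a minimal Python sketch of records with these four keys. It rests on assumptions that are not confirmed in this thread: that each line is a JSON object (JSON Lines), and that the prompt pieces are joined with ", " exactly as in the template quoted above. The helper names build_prompt, make_record, and write_jsonl are hypothetical and not part of the PromptKD code. Please compare the result against the actual files in processed_data.tar before training.

```python
import json


def build_prompt(instruction, input_text=""):
    """Join the instruction, the optional input, and the "Response:" cue.

    Assumption: the ", " separator copies the template quoted in this
    thread; the real files may use different separators or newlines.
    """
    parts = [instruction]
    if input_text:  # the input is included only when it is non-empty
        parts.append(input_text)
    parts.append("Response:")
    return ", ".join(parts)


def make_record(instruction, output, input_text=""):
    """Build one record with the four keys named in the reply above."""
    return {
        "instruction": instruction,
        "input": input_text,
        "output": output,
        "prompt": build_prompt(instruction, input_text),
    }


def write_jsonl(records, path):
    """Write one JSON object per line (assumed file layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    examples = [
        make_record("Translate to French", "Bonjour", input_text="Hello"),
        make_record("Name a prime number", "7"),
    ]
    write_jsonl(examples, "my_train.jsonl")  # hypothetical output file name
```

Keeping the prompt construction in one function means only build_prompt needs to change if the files in the archive turn out to use a different template.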

If you have any further questions, feel free to ask anytime.

shhn1 commented 1 month ago

Thanks for your kind reply! It helps me a lot. :)