SafeAILab / EAGLE

Official Implementation of EAGLE
https://arxiv.org/pdf/2406.16858
Apache License 2.0

Data processing script (ge_data/allocation.py) does not work out of the box #42

Open avnermay opened 4 months ago

avnermay commented 4 months ago

There are a few small issues:

  1. The model is set to load from a local file instead of from the Hugging Face Hub (https://github.com/SafeAILab/EAGLE/blob/main/ge_data/ge_data_all_vicuna.py#L22). To fix this, I set bigname='lmsys/vicuna-13b-v1.3' at that line in ge_data_all_vicuna.py.

  2. The ShareGPT dataset loads from local disk, and there are no instructions for how to download it. To fix this, I downloaded the ShareGPT dataset from https://huggingface.co/datasets/Aeala/ShareGPT_Vicuna_unfiltered/blob/main/ShareGPT_V4.3_unfiltered_cleaned_split.json by running: wget https://huggingface.co/datasets/Aeala/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V4.3_unfiltered_cleaned_split.json
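
For anyone who prefers to do the download step from Python rather than wget, here is a minimal sketch. The helper names (blob_to_resolve, fetch_sharegpt, load_conversations) are made up for illustration and are not part of the EAGLE repo; the only thing taken from this issue is the dataset URL. It assumes that swapping "/blob/" for "/resolve/" in a Hugging Face file URL gives the raw download link, which matches the two URLs above.

```python
import json
import urllib.request
from pathlib import Path

# Raw-download ("resolve") URL quoted in this issue.
SHAREGPT_URL = (
    "https://huggingface.co/datasets/Aeala/ShareGPT_Vicuna_unfiltered/"
    "resolve/main/ShareGPT_V4.3_unfiltered_cleaned_split.json"
)


def blob_to_resolve(url: str) -> str:
    """Turn a Hugging Face file-page ("blob") URL into its raw ("resolve") URL."""
    return url.replace("/blob/", "/resolve/", 1)


def fetch_sharegpt(dest_dir: str = ".", url: str = SHAREGPT_URL) -> Path:
    """Download the dataset into dest_dir unless the file is already there.

    Hypothetical helper: the EAGLE scripts expect the file on local disk,
    so this only automates the manual wget step.
    """
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def load_conversations(path):
    """Parse the downloaded JSON file and return its contents."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling fetch_sharegpt() once in the directory where the data script looks for the file should have the same effect as the wget command; after that, the local path it returns can be used wherever the script loads the dataset.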

M-Chimiste commented 3 months ago

Thanks for asking about this and posting this information. I ran into the same problem and was trying to find the dataset while working on compatibility with a Gemma model.