allenai / open-instruct

Apache License 2.0

flan-v2 data #106

Closed: constanzafierro closed this issue 4 months ago

constanzafierro commented 5 months ago

I can't access the data at https://beaker.org/api/v3/datasets/. Could you clarify which data specifically you used for the flan-v2 model you trained? I would like to look at the examples.

Thanks!

hamishivi commented 5 months ago

Hi, you can download our mixture from Hugging Face and look at the examples whose dataset id is flan_v2, e.g. https://huggingface.co/datasets/allenai/tulu-v2-sft-mixture/viewer/default/train?f[dataset][value]=%27flan_v2%27 (the viewer takes a while to load).
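If the viewer is too slow, the same subset can be pulled locally. Below is a minimal sketch using the Hugging Face `datasets` library. It assumes each row of the mixture carries a `dataset` field (as the viewer filter in the URL suggests), and the helper names `is_flan_v2`, `select_flan_v2` and `load_flan_v2` are hypothetical, not part of open-instruct.

```python
from typing import Iterable, List


def is_flan_v2(example: dict) -> bool:
    # Assumption: each row has a "dataset" field, matching the viewer
    # filter f[dataset][value]='flan_v2' in the link above.
    return example.get("dataset") == "flan_v2"


def select_flan_v2(rows: Iterable[dict]) -> List[dict]:
    # Plain-Python filter, so the selection logic can be checked
    # without downloading anything.
    return [row for row in rows if is_flan_v2(row)]


def load_flan_v2():
    # Imported lazily so the helpers above work without `datasets`
    # installed (pip install datasets).
    from datasets import load_dataset

    ds = load_dataset("allenai/tulu-v2-sft-mixture", split="train")
    # Dataset.filter passes each row to the predicate as a dict.
    return ds.filter(is_flan_v2)
```

Calling `load_flan_v2()` downloads the full mixture before filtering, so the first run is slow. After that, `datasets` serves it from the local cache.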

I'll look into why the Beaker datasets setup isn't working.

hamishivi commented 4 months ago

Sorry for the wait! We just merged a PR that downloads the data from Hugging Face instead. You can check the files here.