allenai / open-instruct

Apache License 2.0
1.22k stars 166 forks source link

Beaker improvements & HF formatting #110

Closed hamishivi closed 7 months ago

hamishivi commented 8 months ago

This PR turns the beaker script into a command-line configurable thing, making it easier for people to use it without actually reading the script and understanding what's happening. @dwadden may turn this into something more tightly integrated with beaker-py.

Additionally, I've added support for just applying the HF tokenizer in the eval chat formatting. This required a little refactoring to pass the tokenizer into the chat template. This allows us just to reuse the chat template given in an HF tokenizer (saving us time in implementing and testing any given chat format, hopefully).

yizhongw commented 8 months ago

This looks nice!

I guess the scripts in scripts/eval should still work since the refactored eval code can still accept a specified template function? Did you test them?

hamishivi commented 8 months ago

Yup, I've given this some quick tests.

natolambert commented 8 months ago

I'd say adding tests with all this chat template stuff would be great, maybe I can do this next week to get started here.

hamishivi commented 8 months ago

Tests would be good, maybe in a future PR :)