dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0
9.69k stars, 495 forks

Fix IndexError caused by invalid token IDs in CFGGuide #1251

Open RohitRathore1 opened 2 weeks ago

RohitRathore1 commented 2 weeks ago

Fixes issue #1232.

These changes fix the IndexError raised when allowed_tokens_concat contains invalid token IDs. The fix handles eos_token_id appropriately and adjusts token handling in CFGGuide. The changes are backward compatible, and existing functionality continues to work as expected.
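
For readers unfamiliar with the failure mode, here is a minimal sketch of how an invalid token ID can raise an IndexError when a mask over the vocabulary is built, and one way to guard against it. This is not the code in this PR: the helper name `build_allowed_mask`, the `vocab_size` parameter, and the use of plain Python lists are assumptions made to keep the example self-contained.

```python
from typing import Iterable, List, Optional


def build_allowed_mask(
    allowed_token_ids: Iterable[Optional[int]], vocab_size: int
) -> List[bool]:
    """Return a boolean mask over the vocabulary, True for allowed tokens.

    Hypothetical helper: IDs that are None (e.g. a missing EOS ID) or that
    fall outside [0, vocab_size) are skipped instead of being used as indices.
    """
    mask = [False] * vocab_size
    for token_id in allowed_token_ids:
        if token_id is None or not 0 <= token_id < vocab_size:
            continue  # invalid ID: indexing with it would fail or misbehave
        mask[token_id] = True
    return mask


def unguarded_lookup_fails(token_id: int, vocab_size: int) -> bool:
    """Show that indexing without a bounds check raises IndexError."""
    logits = [0.0] * vocab_size
    try:
        logits[token_id]
    except IndexError:
        return True
    return False


if __name__ == "__main__":
    # Token ID 7 is out of range for a 4-token vocabulary.
    print(unguarded_lookup_fails(7, 4))
    print(build_allowed_mask([1, 3, 7, None], 4))
```

The guard drops invalid IDs silently. Whether to drop them, raise a clearer error, or remap them is a design choice that depends on where the invalid IDs come from.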

Tested on CPU:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:08<00:00,  1.76s/it]
Saturn
\{[ ]?"caption"[ ]?:[ ]?"([^"\\\x00-\x1F\x7F-\x9F]|\\["\\])*"[ ]?\}
{"caption":"Command module pilot Buzz Aldrin walks across the lunar surface behind the deployed Lunar folloteneer's Ramp. The bottom of a Life Science Branch leg lock is framed in a footprint on the lunar surface behind the left leg of Aldrin's suit. The videocamera on the fullmomteiner's chest is visible atop the open hatch. Apollo 11, Aug. #42; CC AS11-40-5924,"}