carlini / yet-another-applied-llm-benchmark

A benchmark to evaluate language models on questions I've previously asked them to solve.

Noisy code extraction #18

Closed: 1wheel closed this issue 3 months ago

1wheel commented 3 months ago

`disconnectedchildren` isn't in the model-generated output:

https://nicholas.carlini.com/writing/2024/evaluation_examples/make_tree_from_text.py.TestMakeTreeEasy_claude-3-5-sonnet-20240620.html#tab1

Not sure of the best way to fix this in general. We could rerun extraction when code tasks fail, or do something slightly fancier, like checking that all the whitespace-trimmed lines of the generated code are present in the extracted code and vice versa.
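
Roughly something like this, where the helpers are just illustrative and not anything in this repo:

```python
def trimmed_lines(code: str) -> set[str]:
    """Non-empty lines of `code` with surrounding whitespace stripped."""
    return {line.strip() for line in code.splitlines() if line.strip()}


def extraction_looks_complete(generated_code: str, extracted_code: str) -> bool:
    """True when every trimmed line of the generated code appears in the
    extracted code and vice versa (set comparison, so duplicate lines
    aren't counted; a multiset would be stricter)."""
    return trimmed_lines(generated_code) == trimmed_lines(extracted_code)
```

Then the harness could rerun extraction (or flag the sample) whenever `extraction_looks_complete` returns False.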

carlini commented 3 months ago

Yeah, Claude 3.5 seems to fail in a similar way here:

https://nicholas.carlini.com/writing/2024/evaluation_examples/make_sqlite_table.py.TestSqlMakeTable_claude-3-5-sonnet-20240620.html#tab1

I'm actually more or less okay with this "failure" mode. The prompting is explicit enough about what the model is supposed to do, and if the model doesn't follow it, then the model is wrong.

I do agree, though, that it under-reports utility. Maybe there could be a "self-correction" mode that tries to let the model fix dumb mistakes it made.
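
Sketching what I mean (purely hypothetical, not part of the benchmark today; `query_model` stands in for however the harness actually calls the API):

```python
def self_correct(question: str, answer: str, failure: str, query_model) -> str:
    """Ask the model to repair its own previous (failing) answer."""
    followup = (
        question
        + "\n\nYour previous answer was:\n" + answer
        + "\n\nIt failed with:\n" + failure
        + "\n\nPlease reply with a corrected solution."
    )
    return query_model(followup)
```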

1wheel commented 3 months ago

Ah, I missed that models extract their own code. Seems fair then.