Add the ability to run tests in parallel
Closed: sammcj closed this 4 months ago

Great tool!
What formatter did you use? Most of the changes are formatting-only, which makes the diff hard to review. I'm also using tabs for indentation instead of spaces, for accessibility.
By the way, just continuing the conversation from Reddit... Re aborting and resuming, I'll test some more, but I think it's an Ollama problem, not the script's. Again, thanks for the PR!
Oh sorry, I missed the tabs. I'll update the PR when I'm back home later.
FYI, the formatting came from Python Black with VSCode. It respects .editorconfig, but when there isn't one I must have it set to format on save.
I've implemented locking, which should help. I think it's resuming properly now, but it might benefit from some more testing in anger 😄
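For the curious, the locking is roughly this shape (a minimal sketch, not the PR's actual code; results_lock and save_result are illustrative names):

import json
import threading

# One lock shared by every worker; without it, two parallel workers
# can interleave their writes and corrupt the shared results file.
results_lock = threading.Lock()

def save_result(path, record):
    with results_lock:
        try:
            with open(path, "r", encoding="utf-8") as f:
                results = json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            results = []
        results.append(record)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(results, f, ensure_ascii=False, indent=2)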
I've updated .gitignore to add the eval_results directory, and also added a .editorconfig to ensure no one else gets caught out by the use of tabs and sends you a dodgy PR like I did 🤣
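For reference, the .editorconfig boils down to something like this (the exact contents in the repo may differ slightly):

root = true

[*]
indent_style = tab
insert_final_newline = true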
Merged! Thanks!
This probably isn't related to the parallel feature, but I'm asking in case you have some idea. Last night I ran a benchmark with --parallel 2, and all the responses in eval_results were:
"@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
The score result was:
{"computer science": {"corr": 43.0, "wrong": 367.0, "acc": 0.1048780487804878}, "total": {"corr": 43.0, "wrong": 367.0, "acc": 0.1048780487804878}}
If every response was @@@, the score should have been 0, but I couldn't find a single response that wasn't @@@.
>>> import json, re
>>> res = json.load(open("computer science_result.json"))
>>> len(res)
410
>>> reg = r"^@+$"
>>> search = [r['response'] for r in res if re.match(reg, r['response'])]
>>> len(search)
410
>>> search = [r['response'] for r in res if not re.match(reg, r['response'])]
>>> len(search)
0
I'm attaching both the response and score files.
I couldn't find anything in the file-saving code that might cause this either. I tried a few more times afterwards, but I couldn't reproduce it, lol. I'm just wondering if you have any idea? Could it be something to do with Ollama?
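In case it turns out to be a save-time race under --parallel, one cheap way to rule that out would be an atomic write: dump to a temp file and swap it into place. A sketch only, assuming the script writes the whole result list in one go (save_results is a hypothetical name, not the script's real function):

import json, os, tempfile

def save_results(path, results):
    # Write to a temp file in the same directory, then atomically
    # replace the target, so a crash or a second writer never
    # leaves a half-written results file behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=2)
    os.replace(tmp, path)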