Inference script for the level-3 data

Hello,

I am trying to run the evaluation of the Lynx model on the level-3 data. However, I could not find the inference script and am unsure how to reproduce the results.

My questions are: Did the model generate all tool-calling steps during the lvl-3 evaluation and receive feedback from the tool after each step? How were errors handled, and how were they parsed and added to the tool output? In particular, how were errors incorporated into the tool output when the model hallucinated and generated something that couldn't be parsed? Would it be possible to provide the full script for running the Lynx model on the lvl-1, lvl-2 and lvl-3 data?
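To make the question concrete, here is a minimal sketch of the kind of loop I have in mind. It is only my guess, not the actual evaluation code: the names `parse_api_call` and `run_episode`, the `API-Request: [Name(key='value')]` call format, and the error strings are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Assumed (hypothetical) call format: API-Request: [ApiName(key='value', ...)]
CALL_RE = re.compile(r"\[(\w+)\((.*)\)\]")
ARG_RE = re.compile(r"(\w+)='([^']*)'")


def parse_api_call(text: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (api_name, kwargs), or None if the text cannot be parsed."""
    match = CALL_RE.search(text)
    if match is None:
        return None
    name, raw_args = match.groups()
    return name, dict(ARG_RE.findall(raw_args))


def run_episode(
    model: Callable[[str], str],
    tools: Dict[str, Callable[..., object]],
    query: str,
    max_steps: int = 5,
) -> List[str]:
    """Alternate model steps and tool feedback until a plain-text answer."""
    history = [f"User: {query}"]
    for _ in range(max_steps):
        output = model("\n".join(history))
        history.append(f"Model: {output}")
        if not output.startswith("API-Request:"):
            break  # plain text is treated as the final answer
        call = parse_api_call(output)
        if call is None:
            # Unparseable (e.g. hallucinated) call: feed the error back.
            feedback = "Error: could not parse API call"
        else:
            name, kwargs = call
            if name not in tools:
                feedback = f"Error: unknown API {name}"
            else:
                try:
                    feedback = str(tools[name](**kwargs))
                except Exception as exc:  # tool raised; report it as text
                    feedback = f"Error: {exc}"
        history.append(f"Tool: {feedback}")
    return history
```

In this sketch every failure (unparseable output, unknown API, exception inside the tool) is turned into a text message and appended to the context as the tool output, so the model can retry on the next step. Is this close to what was done for lvl-3, or were failed calls handled differently?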

Thanks in advance,
Gregory

AlibabaResearch / DAMO-ConvAI

Inference script for the level-3 data #102