sohaibimran7 opened this issue 4 days ago
This is definitely something we are interested in supporting more deeply! We are soon going to make it possible to run a set of analysis code on top of an eval-set and then display that in the viewer. At the same time, we will hopefully discover some useful common idioms and tools that we can provide. Would love to hear from people on this thread about what the general shape of the requirements is!
I personally would value the following in a visualisation framework:
Many evaluation tools have frameworks for summarising and visualising results. An example is zeno for lm-eval-harness. I understand that results summarisation and visualisation needs can be quite diverse, and one tool may not work for everyone. Still, I think that if Inspect AI logs could be easily summarised and visualised, researchers could iterate faster. I wrote a very quick and dirty class for visualising a list of EvalLogInfos for my own experiments, and I was wondering what other people use and whether there is interest in results summarisation and visualisation support for Inspect.
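As a rough illustration of the kind of summarisation being discussed, here is a minimal sketch that pivots many runs into a task × model table for one metric. It deliberately does not use any Inspect AI API: `RunSummary`, `pivot` and `render` are hypothetical names, and it assumes each log has already been flattened into a simple (task, model, metric, value) record. Extracting those records from real log files is left out, since which fields to pull depends on the log schema and on what you want to compare.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical flattened view of one eval result (not an inspect_ai type)."""
    task: str
    model: str
    metric: str
    value: float


def pivot(runs: list[RunSummary], metric: str) -> dict[str, dict[str, float]]:
    """Build a task -> model -> value table for a single metric.

    Records for other metrics are skipped. If the same (task, model) pair
    appears more than once (e.g. repeated runs), the values are averaged.
    """
    sums: dict[tuple[str, str], float] = defaultdict(float)
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for run in runs:
        if run.metric != metric:
            continue
        key = (run.task, run.model)
        sums[key] += run.value
        counts[key] += 1
    table: dict[str, dict[str, float]] = defaultdict(dict)
    for (task, model), total in sums.items():
        table[task][model] = total / counts[(task, model)]
    return dict(table)


def render(table: dict[str, dict[str, float]]) -> str:
    """Render the pivot table as fixed-width plain text; missing cells show '-'."""
    models = sorted({model for row in table.values() for model in row})
    task_width = max([len("task")] + [len(task) for task in table])
    col = max([8] + [len(model) for model in models])
    lines = ["task".ljust(task_width) + "".join(f"  {m:>{col}}" for m in models)]
    for task in sorted(table):
        cells = "".join(
            f"  {table[task][m]:>{col}.3f}" if m in table[task] else f"  {'-':>{col}}"
            for m in models
        )
        lines.append(task.ljust(task_width) + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example records, purely to show the shape of the output.
    demo = [
        RunSummary("arc_easy", "model-a", "accuracy", 0.80),
        RunSummary("arc_easy", "model-a", "accuracy", 0.90),
        RunSummary("arc_easy", "model-b", "accuracy", 0.70),
        RunSummary("gsm8k", "model-b", "accuracy", 0.50),
    ]
    print(render(pivot(demo, "accuracy")))
```

A plain-text table keeps the sketch dependency-free; the same `pivot` output could just as easily be handed to a plotting or dataframe library for richer visualisation.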