JohnSnowLabs / langtest

Deliver safe & effective language models
http://langtest.org/
Apache License 2.0

Preparing LLM Benchmark Table (LangTest) #946

Open ArshaanNazir opened 10 months ago

JustHeroo commented 10 months ago

@ArshaanNazir please add yourself as an assignee to the task

JustHeroo commented 10 months ago

@ArshaanNazir any updates?

ArshaanNazir commented 10 months ago

We are working on it. Here is the link to the tracking sheet: https://johnsnowlabs-my.sharepoint.com/:x:/p/rakshit/ETX1Z44PipFOqm8Ue8Av3_UBycHH_9oK-oJJUpQfc_n54w?e=exe0Ja

muhammetsnts commented 9 months ago

@ArshaanNazir did we publish any benchmarks (LLM and embeddings) on the LangTest website?

ArshaanNazir commented 9 months ago

We have created the streamlit apps for both of the benchmark tables. We are finalising their design and will update the website by the end of this week.

Cabir40 commented 8 months ago

@ArshaanNazir @vkocaman We have created a new folder for the langtest demos

https://github.com/JohnSnowLabs/streamlit-demo-apps/tree/master/langtest

Do you need anything else?

ArshaanNazir commented 8 months ago

I am not sure if we are going ahead with the streamlit apps now. @dcecchini, can you confirm?

dcecchini commented 8 months ago

Hi @Cabir40 @ArshaanNazir @muhammetsnts @JustHeroo, we started creating the streamlit apps for the leaderboards, but @vkocaman suggested asking the design team to build them with web tools that look better.

They are preparing them; you can check a draft at this link.

In the meantime, we are reviewing the information to be contained on the pages, as we need to make sure that the leaderboards show all the relevant information (adding more filters, improving the visualization, creating more data with benchmark results, etc.).

rajshah4 commented 8 months ago

I understand why you would want a more attractive web app. I was hoping for a streamlit app -- simply because I am looking for an LLM leaderboard in a box that I could deploy to enterprise clients.