UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://inspect.ai-safety-institute.org.uk/
MIT License

Is there a way to send eval logs to an API endpoint? #39

Closed: karthikscale3 closed this issue 3 months ago

karthikscale3 commented 3 months ago

Hi, I am the core maintainer of https://github.com/Scale3-Labs/langtrace. For context, langtrace is a fully open-source, OpenTelemetry-based observability platform for LLM-powered applications. We are interested in building native support for inspect_ai in a couple of ways:

  1. Users can download annotated datasets from langtrace using langtrace's SDK and use them for running evals with inspect_ai.
  2. Additionally, users can send the eval logs to langtrace over an API and visualize the report within langtrace.

For 2., I wanted to know if there is a way to send these logs to an API. I noticed that there is support for S3. If this is not on the roadmap, I am happy to contribute and submit a PR. Please let me know. Thanks.

jjallaire commented 3 months ago

Currently, we can write logs to any fsspec-compatible filesystem: https://filesystem-spec.readthedocs.io/en/latest/. fsspec has these backends built in (https://filesystem-spec.readthedocs.io/en/latest/api.html#implementations) and these third-party implementations (https://filesystem-spec.readthedocs.io/en/latest/api.html#external-implementations). So one path for (2) is to implement an fsspec backend.

Internally, we have a recorder interface; however, it's not ready to be externalized (there is too much implicit dependency on it being a traditional filesystem). We would eventually like to make this more generic, but for the time being I'd suggest that the fsspec approach would create a much more seamless integration. Note that you can use a setuptools entry point to make your filesystem available (https://filesystem-spec.readthedocs.io/en/latest/developer.html), so if your package is installed, users can just use e.g. langtrace:// to access your backend.

Note that this would also work for datasets (so users could also load datasets using langtrace://, etc.)
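To make the fsspec route a bit more concrete, here is a minimal sketch of a custom filesystem, not a definitive implementation. Everything named here is hypothetical: the `demolog` protocol, the `DemoLogFileSystem` and `DemoLogFile` classes, and the in-memory dict that stands in for a remote API. A real backend would replace the dict reads and writes with calls to its own service, and would need to cover whichever fsspec methods Inspect actually calls (this sketch only handles `"wb"` and `"rb"` opens, `info`, and a flat `ls`).

```python
import fsspec
from fsspec.spec import AbstractBufferedFile, AbstractFileSystem


class DemoLogFile(AbstractBufferedFile):
    """File object that buffers writes and pushes them to the store on flush/close."""

    def _initiate_upload(self):
        # Called once before the first chunk is written: start from an empty object.
        self.fs._store[self.path] = b""

    def _upload_chunk(self, final=False):
        # Append whatever is currently buffered. A real backend would POST here.
        self.buffer.seek(0)
        self.fs._store[self.path] += self.buffer.read()
        return True

    def _fetch_range(self, start, end):
        # Serve reads from the stored bytes. A real backend would GET a byte range.
        return self.fs._store[self.path][start:end]


class DemoLogFileSystem(AbstractFileSystem):
    """Hypothetical backend; a class-level dict stands in for the remote API."""

    protocol = "demolog"
    _store = {}  # maps path -> bytes

    def _open(self, path, mode="rb", block_size=None, autocommit=True,
              cache_options=None, **kwargs):
        return DemoLogFile(self, path, mode=mode, block_size=block_size,
                           autocommit=autocommit, cache_options=cache_options,
                           **kwargs)

    def info(self, path, **kwargs):
        path = self._strip_protocol(path)
        if path not in self._store:
            raise FileNotFoundError(path)
        return {"name": path, "size": len(self._store[path]), "type": "file"}

    def ls(self, path, detail=True, **kwargs):
        # Simplification: a flat listing of every stored key under the prefix.
        path = self._strip_protocol(path)
        if path in self._store:
            entries = [self.info(path)]
        else:
            prefix = path + "/" if path else ""
            entries = [self.info(p) for p in sorted(self._store)
                       if p.startswith(prefix)]
        return entries if detail else [e["name"] for e in entries]


# Register in-process for this sketch; an installed package would instead
# declare an entry point as described in the fsspec developer docs.
fsspec.register_implementation("demolog", DemoLogFileSystem, clobber=True)

# Write a log through the custom protocol, then read it back.
fs = fsspec.filesystem("demolog")
with fs.open("logs/run.json", "wb") as f:
    f.write(b'{"status": "success"}')
with fs.open("logs/run.json", "rb") as f:
    stored = f.read()
```

Once a backend like this is registered, any fsspec-aware code can address it with a `demolog://...` URL in the same way it would use `s3://...`, which is what makes this route a lighter lift than a new recorder interface.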

karthikscale3 commented 3 months ago

Thanks @jjallaire for the detailed answer. I was unaware of the fsspec compatibility, but this is neat and makes a lot of sense. We will go ahead with the fsspec implementation for our backend and keep you all posted on how it goes. This answers my original question, so I will close this issue now. Thanks again.

aisi-inspect commented 3 months ago

Great, glad that works well for you. I am working on adding some more explicit documentation about this (including specifically which fsspec methods we call, as it's definitely a small-ish subset that we rely on).

jjallaire commented 3 months ago

Here are additional docs on implementing a custom storage provider for Inspect via fsspec: https://ukgovernmentbeis.github.io/inspect_ai/extensions.html#storage

karthikscale3 commented 3 months ago

This is very helpful. Thank you again!