JShollaj / awesome-llm-interpretability

A curated list of Large Language Model (LLM) Interpretability resources.

suggested adding PAIR work; and how to add new suggestions? #3

Closed iislucas closed 8 months ago

iislucas commented 8 months ago

Your contributing doc link in the README is broken :) so I'm making a suggestion here instead of opening a pull request. You might be interested in the PAIR team's work (pair.withgoogle.com and https://github.com/pair-code). In particular, we do a bunch of work on interpretability, including:

Interactive Explorable visualizations (https://pair.withgoogle.com/explorables/) explaining important and interesting ML phenomena; of particular relevance to LLMs are:

Code/Tools: LIT (the Learning Interpretability Tool, https://pair-code.github.io/lit/), a popular tool, especially at Google, for applying interpretability methods to ML models (most often used for language models, but it works with many kinds of models and data).

Some recent papers on interpretability of language models by PAIR:

And there's a lot more here: https://pair.withgoogle.com/research/

JShollaj commented 8 months ago

Apologies for that complete oversight, and thank you for the suggestions :). Feel free to add any of the resources you mentioned, following the guidelines. I fixed the files and they should be available now.