TransformerLensOrg / TransformerLens

A library for mechanistic interpretability of GPT-style language models
https://transformerlensorg.github.io/TransformerLens/
MIT License

Model baichuan #649

Open bryce13950 opened 6 days ago

Description

This is currently a draft, opened for discussion and experimentation. We still need to add Baichuan 1, and there is currently a compatibility issue with Baichuan itself for at least one of the models we want to support. Much of the configuration has been completed. We still need to add a config variable for the tokenizer's use_fast option. Right now it is set to false for a single architecture, but the correct value varies from model to model within Baichuan.
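To make the use_fast discussion concrete, here is a minimal sketch of a per-model override table. Everything in it is an assumption for illustration: `TokenizerSettings`, `TOKENIZER_OVERRIDES`, `tokenizer_kwargs`, and the `example-org/...` model names are hypothetical and not part of TransformerLens, and the true/false values are placeholders, not verified settings for any Baichuan checkpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenizerSettings:
    """Hypothetical per-model tokenizer options (not a TransformerLens class)."""

    use_fast: bool = True


# Placeholder entries only: the real value for each Baichuan checkpoint
# would have to be checked against that checkpoint's tokenizer.
TOKENIZER_OVERRIDES = {
    "example-org/baichuan-variant-a": TokenizerSettings(use_fast=False),
    "example-org/baichuan-variant-b": TokenizerSettings(use_fast=True),
}


def tokenizer_kwargs(model_name: str) -> dict:
    """Return tokenizer keyword arguments for a model, falling back to defaults."""
    settings = TOKENIZER_OVERRIDES.get(model_name, TokenizerSettings())
    return {"use_fast": settings.use_fast}
```

The resulting dictionary could then be forwarded to the Hugging Face loader, for example `AutoTokenizer.from_pretrained(model_name, **tokenizer_kwargs(model_name))`, so the setting is decided per model rather than per architecture.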

Beyond that, this model shows issues that seem related to #569 and #570. The model supports both Mandarin and English. At the moment, generations mix the two languages together, which seems to point to an accuracy problem in the implementation.
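As a rough aid while debugging the language mixing, a small helper like the one below could flag generations that contain a substantial share of both scripts. This is only a sketch under stated assumptions: `script_mix` and its 0.2 threshold are hypothetical, it only counts characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and ASCII letters, and it says nothing about the root cause.

```python
def script_mix(text: str, threshold: float = 0.2) -> dict:
    """Count CJK and ASCII-letter characters and flag text that mixes both.

    The threshold is an arbitrary illustrative value: text is flagged when the
    minority script makes up more than that share of the counted characters.
    """
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    total = cjk + latin
    mixed = total > 0 and min(cjk, latin) / total > threshold
    return {"cjk": cjk, "latin": latin, "mixed": mixed}
```

Running this over a batch of generations from an English-only or Mandarin-only prompt gives a quick count of how often the output switches script, which can be compared before and after a fix.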

Fixes #622

Checklist: