EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License

Compatibility with Models from PyReft Library #2012

Open crux82 opened 2 months ago

crux82 commented 2 months ago

Hi everyone,

First, I'm sorry if this issue has already been raised.

I wanted to ask if the framework supports models obtained through the PyReft library (https://github.com/stanfordnlp/pyreft). Currently, in lm-eval, there is support for models obtained by applying LoRA through PEFT, but I haven’t found any information regarding loading models obtained via LoReft.

Is there anyone who can help me with this?

Thank you for your time and help!

haileyschoelkopf commented 2 months ago

Hi there! Thanks for your interest.

PyREFT is a very cool project, but I think ultimately we can't support every external library / option without either making the maintenance overhead too high or making the code far less modifiable for the majority of users. I'm therefore disinclined to add this as a native feature, though if many users request it, or a wide variety of ready-to-use ReFT models become available on the HF Hub, then perhaps we can reconsider.

I'd recommend modifying the __main__.py script (or your own script that calls lm_eval.evaluate() or lm_eval.simple_evaluate()) to apply the ReFT modules / interventions to a loaded HF model, and then passing that initialized HF model in via HFLM(pretrained=my_loaded_reft_model). Alternatively, subclass lm_eval.models.huggingface.HFLM and override the relevant logic if that's more convenient. It should not require a significant amount of code change! See e.g. https://github.com/state-spaces/mamba/blob/main/evals/lm_harness_eval.py for a minimal example of how this might be done. Hope this is helpful!
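
Roughly, the first route could look like the following. This is an untested sketch: the pyreft calls are copied from their examples, the checkpoint name and task list are just placeholders, and whether HFLM happily accepts the wrapped ReftModel (rather than the underlying transformers model) without further tweaks is an assumption you'd want to verify:

```python
# Untested sketch of the "pass a pre-initialized model to HFLM" route.
import torch
import transformers
import pyreft

import lm_eval
from lm_eval.models.huggingface import HFLM

model_name = "meta-llama/Meta-Llama-3-8B"
base_model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="cuda"
)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

# apply the ReFT interventions to the already-loaded HF model
# (checkpoint name is a placeholder)
reft_model = pyreft.ReftModel.load(
    "Syed-Hasan-8503/Llama-3-openhermes-reft", base_model, from_huggingface_hub=True
)
reft_model.set_device("cuda")

# hand the initialized model to the harness and evaluate as usual
lm = HFLM(pretrained=reft_model, tokenizer=tokenizer, batch_size=8)
results = lm_eval.simple_evaluate(model=lm, tasks=["lambada_openai"])
```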

Might leave this issue open for now so that others can express interest if it's an often-requested feature though.

crux82 commented 2 months ago

Hi @haileyschoelkopf,

Thank you very much for your prompt and detailed response. I completely understand that it's almost impossible to support every new model or library out there.

Regarding your suggestions, I found the example at https://github.com/state-spaces/mamba/blob/main/evals/lm_harness_eval.py quite helpful. However, I'm still missing some contextual information to confidently proceed with customizing the library for a specific model.

Would it be possible to provide a minimal guide, or some additional support, for writing such a main script? For instance, one inspired by the tutorial on loading REFT models available at the following link:

https://medium.com/@syed_hasan/finetuning-llama-3-using-reft-representation-fine-tuning-technique-00f4fe1f497c

In this tutorial, the model is essentially loaded with:

```python
import torch, transformers, pyreft

device = "cuda"

model_name_or_path = "meta-llama/Meta-Llama-3-8B"
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name_or_path, torch_dtype=torch.bfloat16, device_map=device)

reft_model = pyreft.ReftModel.load(
    "Syed-Hasan-8503/Llama-3-openhermes-reft", model, from_huggingface_hub=True
)

reft_model.set_device("cuda")
```

reft_model.set_device("cuda")

And used with:

instruction = "A rectangular garden has a length of 25 feet and a width of 15 feet. If you want to build a fence around the entire garden, how many feet of fencing will you need?"

# tokenize and prepare the input
prompt = prompt_no_input_template % instruction
prompt = tokenizer(prompt, return_tensors="pt").to(device)

base_unit_location = prompt["input_ids"].shape[-1] - 1  # last position
_, reft_response = reft_model.generate(
    prompt, unit_locations={"sources->base": (None, [[[base_unit_location]]])},
    intervene_on_prompt=True, max_new_tokens=512, do_sample=True, 
    eos_token_id=tokenizer.eos_token_id, early_stopping=True
)
print(tokenizer.decode(reft_response[0], skip_special_tokens=True))

I'd appreciate any guidance or resources you could provide to help with integrating REFT models into the lm-eval framework. This could also serve as a first script to be added to the examples section, benefiting other users with similar needs.

Thank you again for your time and assistance!

haileyschoelkopf commented 2 months ago

The Mamba example is pretty nice in that you can simply call cli_evaluate() and not hack any of the rest of the script.

I'd recommend in this instance subclassing HFLM and overriding the _create_model() method to include the logic you have there for loading a REFT model! That would be the simplest.
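
An untested sketch of what I mean is below. It assumes HFLM._create_model() stores the loaded transformers model on self._model (true for recent versions, but worth double-checking against the one you have installed); "reft" and reft_path are made-up names for illustration, and the pyreft load call is taken from the tutorial you linked:

```python
import pyreft

from lm_eval.__main__ import cli_evaluate
from lm_eval.api.registry import register_model
from lm_eval.models.huggingface import HFLM


@register_model("reft")
class ReftHFLM(HFLM):
    def __init__(self, reft_path: str, **kwargs):
        # stash the path before HFLM.__init__ triggers _create_model()
        self._reft_path = reft_path
        super().__init__(**kwargs)

    def _create_model(self, *args, **kwargs):
        # build the base HF model exactly as HFLM would...
        super()._create_model(*args, **kwargs)
        # ...then wrap it with the ReFT interventions
        reft_model = pyreft.ReftModel.load(
            self._reft_path, self._model, from_huggingface_hub=True
        )
        reft_model.set_device(str(self.device))
        self._model = reft_model


if __name__ == "__main__":
    cli_evaluate()
```

You could then run it like the Mamba script, e.g. something along the lines of `python this_script.py --model reft --model_args pretrained=meta-llama/Meta-Llama-3-8B,reft_path=<your_reft_checkpoint> --tasks lambada_openai` (exact flags depending on your version).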

crux82 commented 2 months ago

Great! I assume I also need to override the _model_generate() method. Or not?

LSinev commented 2 months ago

For cases that can be solved with a subclass of the LM class, the ability to load such subclasses the same way included tasks are loaded might be a solution. But this functionality is still awaiting PRs: https://github.com/EleutherAI/lm-evaluation-harness/issues/1457

crux82 commented 2 months ago

Hi @LSinev! I think the solution suggested by @haileyschoelkopf can be kept fairly "easy".

I think I need to reimplement __init__ and the _model_call() method.
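
In case it helps others, here is a rough, untested sketch of what the generation side (my _model_generate() question above) might look like inside such a subclass, adapting the reft_model.generate() call from the tutorial. The dict-shaped first argument and the max_new_tokens arithmetic are my assumptions, stop-sequence handling is omitted, and the method's exact signature should be checked against the installed lm-eval version:

```python
def _model_generate(self, context, max_length, stop, **generation_kwargs):
    # intervene on the last prompt position, as in the tutorial snippet
    base_unit_location = context.shape[-1] - 1
    # stop-sequence handling is omitted here for brevity
    _, reft_response = self.model.generate(
        {"input_ids": context},
        unit_locations={"sources->base": (None, [[[base_unit_location]]])},
        intervene_on_prompt=True,
        # max_length is the total budget, so subtract the prompt length
        max_new_tokens=max_length - context.shape[-1],
        eos_token_id=self.tokenizer.eos_token_id,
        **generation_kwargs,
    )
    return reft_response
```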