Hi, the demo code is for `huggyllama/llama-7b`. The attention pattern of LLaMA-2 differs from LLaMA-7B, so its head_config must be profiled specifically for it to be effective. Right now, `config/head_config` includes the configs of steered heads for GPT-J and LLaMA-7B. The example code for LLaMA-7B has also been reproduced (please see this). We will release the attention heads for LLaMA-2 in the future.
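For reference, a minimal sketch of picking the head config that matches the loaded model; the file paths and names here are illustrative assumptions, not the repo's exact layout:

```python
import json

# Head configs are profiled per model, so the file must match the model you load.
# Paths below are assumptions; configs currently exist for GPT-J and LLaMA-7B only.
HEAD_CONFIGS = {
    "huggyllama/llama-7b": "config/head_config/llama-7b.json",
    "EleutherAI/gpt-j-6b": "config/head_config/gptj.json",
}

def load_head_config(model_name: str) -> dict:
    # Raises KeyError for unprofiled models such as LLaMA-2.
    with open(HEAD_CONFIGS[model_name]) as f:
        return json.load(f)
```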
I see the same issue with llama-7b:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "/alg_vepfs/public/models/llama-7b"  # local copy of llama-7b
model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True,
                                             device_map="auto", torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
```
and got a result that differs from the demo output. Maybe a seed is needed to reproduce the demo experiment?
Greedy search is applied in the demo code, so there should be no randomness. Can you confirm that the local model files are exactly the same as the open-sourced `huggyllama/llama-7b`, i.e., the same forward pass and weights, without any fine-tuning? Since we conducted the profiling on `huggyllama/llama-7b`, the demo `head_config` should be applied to that same model. For example, the quickstart code has been reproduced by others with `huggyllama/llama-7b` downloaded from Hugging Face: https://github.com/QingruZhang/PASTA/issues/4.
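One way to rule out both sampling randomness and a weight mismatch is to load the reference checkpoint directly from the Hub and decode greedily. A minimal sketch (the prompt here is a placeholder, not the demo prompt):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ref = "huggyllama/llama-7b"  # the exact checkpoint the head_config was profiled on
model = AutoModelForCausalLM.from_pretrained(ref, device_map="auto", torch_dtype=torch.float32)
tokenizer = AutoTokenizer.from_pretrained(ref)

inputs = tokenizer("Answer in JSON format:", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)  # greedy: deterministic, no seed needed
print(tokenizer.decode(out[0], skip_special_tokens=True))
```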
I tried with that code. The result is the same: in both cases the output is not in JSON format.