getao / icae

The repo for In-context Autoencoder

How to use the model checkpoint? #1

Closed trestad closed 3 weeks ago

trestad commented 11 months ago

Thanks for open-sourcing this!

I am wondering what model.z01, .z02, and .zip contain. These files seem too small to hold the full model parameters. Are they LoRA adapters? If so, why is loading them related to replacing the zeroed-out llama weights with the original ones? Could you share an inference script? Thank you very much!

getao commented 11 months ago

You can load it using torch.load('path'). The llama weights are replaced with 0, and you can restore them from your own llama checkpoint.

This checkpoint format is for compatibility with our model class implementation, which you can check in the repo.
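
As a rough sketch (the path below is just a placeholder for wherever you saved the checkpoint, and this assumes the frozen llama weights are stored as the scalar 0 rather than as zero tensors), you can inspect which entries are trained tensors and which are placeholders:

import torch

# Path is hypothetical -- point it at your local copy of the released checkpoint.
state_dict = torch.load("path/to/icae_checkpoint.pt", map_location="cpu")

for name, value in state_dict.items():
    if isinstance(value, torch.Tensor):
        print(name, tuple(value.shape))     # trained weight (e.g. LoRA / memory tokens), kept as a tensor
    else:
        print(name, "-> zero placeholder")  # frozen llama weight stored as 0; restore it from your llama checkpoint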

The inference script will be uploaded soon.

getao commented 11 months ago

The inference script will be uploaded soon.

Uploaded.

MelodyVAR commented 11 months ago

Thank you for sharing. But I wonder how to restore it with the llama checkpoint. Are the files currently in the model folder LoRA weights?

YoojuShin commented 11 months ago

@MelodyVAR Yes, but the checkpoint also contains zero placeholders for the llama weights. You should merge the LoRA weights in the model folder with the llama2-7b-chat model weights and build a new state_dict, for example:

from collections import OrderedDict
import torch

# LlamaICAE, model_args, training_args and lora_config come from the repo's model/training code.
model = LlamaICAE(model_args, training_args, lora_config).to("cuda")  # model initialized from llama2-7b-chat
llama_state_dict = model.state_dict()

# In the released checkpoint, trained weights are tensors and frozen llama weights are stored as the scalar 0.
state_dict = torch.load("/icae/model/llama-2-7b-chat-finetuned-icae_zeroweight_llama2.pt")  # change the path for your model

new_state_dict = OrderedDict()
for layer_name, weight in state_dict.items():
    if isinstance(weight, torch.Tensor) or weight != 0.0:
        new_state_dict[layer_name] = weight  # keep the trained (LoRA / memory-token) weights
    else:
        new_state_dict[layer_name] = llama_state_dict[layer_name]  # restore the zeroed llama weight
model.load_state_dict(new_state_dict)

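A rough sanity check after loading (this assumes the llama weight matrices should never be entirely zero, so anything flagged here would suggest a placeholder was not restored):

# Rough check that no zero placeholder survived the merge.
leftover = [name for name, p in model.state_dict().items()
            if p.is_floating_point() and not p.any()]
print(f"{len(leftover)} suspicious all-zero parameters")  # expect 0
model.eval()
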
WJMacro commented 8 months ago

Hi, I'm also trying to use the pre-trained checkpoint. I cloned the repo but failed to unzip the model file; Linux reports model.zip as a zip bomb. Are there any instructions on how to unzip the files correctly and run the pre-trained model? Thank you very much.

getao commented 3 weeks ago

The repo has been updated. You can now easily access the v2 model.