Questions about reproducing

xmttttt commented 1 year ago

Thank you for your contributions. However, I encountered some difficulties in reproducing the results of your paper, and I wonder if you can provide some help.

First, I am confused by the weights you provided. Are the weights at https://github.com/SAI990323/TALLRec/tree/main/alpaca-lora-7B and https://drive.google.com/file/d/1teUwLm4BOqhngfCKKXE1tiMhJPf_FvRJ/view the same LoRA weights, obtained through instruction tuning with alpaca-lora (using the self-instruct dataset provided by Alpaca)?

Under this assumption, I performed rec-tuning using the provided LoRA weights. However, whichever sample size I choose (16/32/64/256, as in the paper's settings), the AUC stays around 50% (about 0.47).

The version of peft is 0.3.0 and the version of transformers is 4.28.0. The parameters in finetune_rec.py are:

base_model: str = "../llama-2-7b-hf/",  
train_data_path: str = "./data/movie/train.json",
val_data_path: str = "./data/movie/valid.json",
output_dir: str = "./movie-64",
resume_from_checkpoint: str = "./alpaca-lora-7B",  
sample: int = 64, 
seed: int = 2023, 
# training hyperparams
batch_size: int = 128,
micro_batch_size: int = 32, 
num_epochs: int = 200, 
learning_rate: float = 1e-4, 
cutoff_len: int = 512, 
# lora hyperparams
lora_r: int = 8,
lora_alpha: int = 16,
lora_dropout: float = 0.05,
lora_target_modules: List[str] = [
  "q_proj",
  "v_proj",
],
# llm hyperparams
train_on_inputs: bool = True, 
group_by_length: bool = True, 
# wandb params
wandb_project: str = "",
wandb_run_name: str = "",
wandb_watch: str = "",  
wandb_log_model: str = "",
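
As a side note on the batch settings above, here is a minimal sketch of how they interact. It assumes finetune_rec.py derives gradient accumulation as batch_size // micro_batch_size, the way alpaca-lora's training script does; the script itself is not shown in this thread, and the helper name `describe_batching` is hypothetical.

```python
import math


def describe_batching(batch_size, micro_batch_size, sample):
    """Return (gradient_accumulation_steps, micro_batches_per_epoch).

    Assumption: accumulation is batch_size // micro_batch_size, as in
    alpaca-lora. `sample` is the number of training examples kept.
    """
    accumulation = batch_size // micro_batch_size
    micro_batches = math.ceil(sample / micro_batch_size)
    return accumulation, micro_batches


# Settings quoted in this issue: batch_size=128, micro_batch_size=32, sample=64.
accumulation, micro_batches = describe_batching(128, 32, 64)
print(f"accumulation steps: {accumulation}, micro-batches per epoch: {micro_batches}")
```

With these values an epoch holds fewer micro-batches than one accumulation window, so under this assumption the nominal batch size of 128 is never actually filled at sample = 64.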

After training, I ran evaluate.py with these parameters:

load_8bit: bool = False,
base_model: str = "../llama-2-7b-hf/",
lora_weights: str = "./movie-64",
test_data_path: str = "./data/movie/test.json",
result_json_data: str = "./test/temp.json",
batch_size: int = 16,
share_gradio: bool = False,
sample = 64,
seed = 2023

May I ask if my understanding of the two weights is correct, and if there are any errors in my training or evaluating steps or parameters? Thanks for your assistance!!

SAI990323 commented 1 year ago

For the first question: yes, the two are the same LoRA weights, obtained through instruction tuning. For the second question: it seems that you chose LLaMA2 as your base model, whereas our LoRA weights are based on LLaMA.
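
A quick way to spot this kind of mismatch is to read the base model name that PEFT records in the adapter's adapter_config.json (field `base_model_name_or_path`) and compare it with the model you are loading. The sketch below is only illustrative: the function names are hypothetical, and whether the released ./alpaca-lora-7B directory contains this file is an assumption.

```python
import json
from pathlib import Path


def base_model_from_config(config_text):
    """Extract the recorded base model from adapter_config.json contents."""
    return json.loads(config_text).get("base_model_name_or_path")


def read_adapter_base(adapter_dir):
    """Return the recorded base model, or None if no config file is present."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    if not config_path.is_file():
        return None
    return base_model_from_config(config_path.read_text())
```

For example, `read_adapter_base("./alpaca-lora-7B")` can be printed next to the `base_model` argument passed to finetune_rec.py; if one names a LLaMA checkpoint and the other a LLaMA2 checkpoint, the adapter is being applied to a model it was not trained on.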

xmttttt commented 1 year ago

Thank you very much for your help. I used decapoda's LLaMA model from Hugging Face to redo rec-tuning and evaluation, and performance on the movie dataset with a sample size of 64 improved significantly compared to before (about 0.38, or 0.62), but there is still a gap from the results in the paper. I suspect this is still due to the different LLaMA weights. Could you share which LLaMA weights you used (repository name or link)? Thanks again!
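
One possible reading of "about 0.38, or 0.62" (an assumption, since the comment does not say so) is that an AUC below 0.5 becomes 1 - AUC when the predicted scores are inverted. The sketch below shows that property with a plain pairwise AUC; the labels and scores are toy values for illustration, not results from TALLRec.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data only (not from the paper or this thread).
labels = [1, 0, 1, 0, 0]
scores = [0.2, 0.7, 0.6, 0.4, 0.9]
inverted = [-s for s in scores]

print(auc(labels, scores), auc(labels, inverted))
```

The two printed values always sum to 1, which is why a score well below 0.5 still indicates that the model separates the classes, just with the ranking reversed.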

SAI990323 commented 1 year ago

We obtained the download link from the official Meta source. As far as I know, various problems arise when decapoda/llama-hf-7b is used in place of the official LLaMA. Perhaps you could apply to Meta directly for the model weights.

xmttttt commented 1 year ago

Thanks! Problem Solved!

fine1123 commented 1 year ago

> Thanks! Problem Solved!

Hello, I would like to ask how you downloaded the LLaMA weights that solve the problem. I applied to Meta officially, but got no response. Thank you!

xmttttt commented 1 year ago

> Hello, I would like to ask how you downloaded the LLaMA weights that solve the problem. I applied to Meta officially, but got no response. Thank you!

Sorry, I didn't get the weights either. Actually, I found that the real problem was the version of peft, as mentioned in previous issues. I found the download link for peft==0.3.0.dev0 in this issue, and installing it solved the whole problem. Hope this helps!
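
Since the fix reported here came down to the exact peft build (0.3.0.dev0 rather than the 0.3.0 mentioned at the top of the thread), a small check of the installed version can save a training run. This is a minimal sketch using only the standard library; the helper names are hypothetical.

```python
from importlib import metadata

# Version reported in this thread as working.
EXPECTED_PEFT = "0.3.0.dev0"


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_peft(installed, expected=EXPECTED_PEFT):
    """Describe whether the installed peft version matches the expected one."""
    if installed is None:
        return "peft is not installed"
    if installed == expected:
        return "ok"
    return f"mismatch: installed {installed}, expected {expected}"


print(check_peft(installed_version("peft")))
```

The comparison is a plain string match on purpose, because 0.3.0 and 0.3.0.dev0 are different builds and should not be treated as interchangeable.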

fine1123 commented 1 year ago

> Sorry, I didn't get the weights either. Actually, I found that the real problem was the version of peft, as mentioned in previous issues. I found the download link for peft==0.3.0.dev0 in this issue, and installing it solved the whole problem. Hope this helps!

Thank you very much! I have also solved the problem.

YuZhang10 commented 11 months ago

> Thanks! Problem Solved!

Hi, may I ask which base model you use: decapoda/llama-hf-7b or llama-2-7b? Thanks in advance! I applied through the Meta website, and it seems that Meta only provides LLaMA2 weights now.

fine1123 commented 11 months ago

> Hi, may I ask which base model you use: decapoda/llama-hf-7b or llama-2-7b? Thanks in advance! I applied through the Meta website, and it seems that Meta only provides LLaMA2 weights now.

I use decapoda/llama-hf-7b, downloaded from Hugging Face.

SAI990323 commented 10 months ago

We use the official LLaMA checkpoint (not LLaMA2) from Meta. It seems that decapoda/llama-hf-7b differs from the official LLaMA in some way.

ljy2222 commented 9 months ago

> Thanks! Problem Solved!

Hi, can you reproduce the results under different sample sizes (i.e., 16, 64, and 256)?

I still see a gap in AUC.

fine1123 commented 9 months ago

> Hi, can you reproduce the results under different sample sizes (i.e., 16, 64, and 256)?
>
> I still see a gap in AUC.

I use different sample sizes (16, 64, 256) with num_epochs: 200. My results:

{
  "movie": {
    "movie": {
      "lora_alpaca": {
        "6": {
          "64": 0.6498338790446843,
          "16": 0.536462345916795,
          "256": 0.7050369960195171
        }
      }
    }
  }
}

SAI990323 commented 9 months ago

In our experiments, the smaller the sample size, the greater the variation. The results we reported are averages over three runs with seeds 0, 1, and 2.
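
To compare against the paper on the same footing, the per-seed AUCs can be averaged as described above. The sketch below is a minimal illustration; the function name is hypothetical and the numbers are placeholders, not results from the paper or from this thread.

```python
from statistics import mean, stdev


def summarize(auc_by_seed):
    """Return (mean, sample standard deviation) of AUC over seeds."""
    values = list(auc_by_seed.values())
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


# Placeholder values only: substitute the AUC measured for each seed.
placeholder_auc = {0: 0.60, 1: 0.65, 2: 0.70}

avg, spread = summarize(placeholder_auc)
print(f"mean AUC = {avg:.4f}, std = {spread:.4f}")
```

Reporting the standard deviation alongside the mean makes it easier to tell whether a single-seed result at a small sample size falls within the expected run-to-run variation.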
