Jyonn / Legommenders

A modular recommendation system that allows different components to be selected and combined into a new recommender.
MIT License

OOM during Training Preparation for Llama #6

Closed. ChangyuChen347 closed this issue 4 months ago.

ChangyuChen347 commented 4 months ago

Thank you for your help in https://github.com/Jyonn/ONCE/issues/4; I can now run the code successfully. However, I keep encountering out-of-memory errors during Training Preparation, even on an A100 with 80 GB. I have tried setting both batch_size and max_item_batch_size to 1, but the actual running batch size seems to remain at 512.

My config:

[00:00:00] |Worker| {
    "embed": {
        "name": "llama-token",
        "embeddings": [
            {
                "vocab_name": "llama",
                "vocab_type": "numpy",
                "path": "llama_emb.npy",
                "frozen": true
            }
        ]
    },
    "model": {
        "name": "LLAMA-NRMS.D64.L0.Lora0",
        "meta": {
            "item": "Llama",
            "user": "Attention",
            "predictor": "Dot"
        },
        "config": {
            "use_neg_sampling": true,
            "use_item_content": true,
            "max_item_content_batch_size": 0,
            "same_dim_transform": false,
            "embed_hidden_size": 4096,
            "hidden_size": 64,
            "neg_count": 4,
            "item_config": {
                "llm_dir": "/mnt/data_large/ccy/Llama-2-7b-hf",
                "layer_split": 0,
                "lora": 0,
                "weights_dir": "data/MIND-small-Llama/llama-7b-split"
            },
            "user_config": {
                "num_attention_heads": 8,
                "inputer_config": {
                    "use_cls_token": false,
                    "use_sep_token": false
                }
            }
        }
    },
    "exp": {
        "name": "test_llm_layer_split",
        "dir": "saving/MIND-small-Llama/LLAMA-NRMS.D64.L0.Lora0/llama-token-test_llm_layer_split",
        "log": "saving/MIND-small-Llama/LLAMA-NRMS.D64.L0.Lora0/llama-token-test_llm_layer_split/exp.log",
        "mode": "test_llm_layer_split",
        "store": {
            "layers": [31, 30, 29, 27],
            "dir": "data/MIND-small-Llama/llama-7b-split"
        },
        "load": {
            "save_dir": null,
            "model_only": true,
            "strict": true,
            "wait": false
        },
        "policy": {
            "device": "gpu",
            "batch_size": 1
        }
    },
    "data": {
        "name": "MIND-small-Llama",
        "base_dir": "data/MIND-small",
        "item": {
            "filter_cache": true,
            "depot": "data/MIND-small/news",
            "order": ["title-llama", "cat-llama"],
            "append": ["nid"],
            "lm_col": "title-llama"
        },
        "user": {
            "filter_cache": true,
            "depots": {
                "train": {"path": "data/MIND-small/train"},
                "dev": {"path": "data/MIND-small/dev"},
                "test": {"path": "data/MIND-small/test"}
            },
            "filters": {
                "history": ["x"]
            },
            "union": ["data/MIND-small/user"],
            "candidate_col": "nid",
            "clicks_col": "history",
            "label_col": "click",
            "neg_col": "neg",
            "group_col": "imp",
            "user_col": "uid",
            "index_col": "index"
        }
    },
    "version": "small",
    "llm_ver": "7b",
    "hidden_size": 64,
    "layer": 0,
    "lora": 0,
    "fast_eval": 0,
    "embed_hidden_size": 4096,
    "max_news_batch_size": 1,
    "max_item_batch_size": 1,
    "batch_size": 1,
    "warmup": 0,
    "simple_dev": false,
    "acc_batch": 1,
    "lora_r": 32,
    "lr": 0.0001,
    "item_lr": 1e-05,
    "mind_large_submission": false,
    "epoch_batch": 0,
    "page_size": 512,
    "patience": 2,
    "epoch_start": 0,
    "frozen": true,
    "load_path": null,
    "rand": {},
    "time": {},
    "seed": 2023
}

Jyonn commented 4 months ago

Hi,

You may add max_item_content_batch_size: 512 to the model config (not the exp config). The batch size can also be larger, e.g., 64.
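
For reference, a minimal sketch of where this key would sit, assuming the model-config structure shown in the dump above (config/model/llm/llama-nrms.yaml is the file referenced later in this thread; the value 512 simply follows the suggestion above):

    # Sketch of the model config; only the relevant keys from the dump above are kept.
    meta:
      item: Llama
      user: Attention
      predictor: Dot
    config:
      use_neg_sampling: true
      use_item_content: true
      max_item_content_batch_size: 512   # added per the suggestion above; the dump above had 0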

ChangyuChen347 commented 4 months ago

I set max_item_content_batch_size to 32, but I still hit OOM. When I print the shape of hidden_states, it is torch.Size([512, 33, 4096]), not 32.

name: LLAMA-NRMS.D${model.config.hidden_size}.L${model.config.item_config.layer_split}.Lora${model.config.item_config.lora}
meta:
  item: Llama
  user: Attention
  predictor: Dot
config:
  use_neg_sampling: true
  use_item_content: true
  max_item_content_batch_size: 32

Jyonn commented 4 months ago

Hi,

Sorry, please add --page_size 64 to the command; I will fix this in the documentation as soon as possible. I also found that ColPromptMap in the natural_concat_inputer did not support columns like title-llama. I have now fixed this issue, and you can update the code via git pull. I hope these issues have not affected your research progress.

Thanks!
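
For context, the config dump earlier in this thread shows page_size: 512, which matches the leading dimension of the hidden_states tensor reported above, so page_size (rather than batch_size or max_item_content_batch_size) appears to be what sets the chunk size during Training Preparation. A hedged sketch of the override, assuming the top-level keys in that dump can be set the same way as the corresponding command-line flags:

    # Hypothetical override; key names are taken from the config dump above.
    page_size: 64     # was 512; presumably equivalent to passing --page_size 64
    batch_size: 64    # training batch size, as suggested above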

[00:00:00] |GPU| choose 0 GPU with 81033 / 81920 MB
[00:00:00] |Controller| dataset type:  news
[00:00:00] |Controller| build column map ...
loaded 1368829 samples from data/MIND-small/test
[00:00:01] |CachingDep| load 1 filter caches on 
        UniDep (2.0): data/MIND-small/test

        Sample Size: 1368829
        Id Column: index
        Columns:
                index, vocab index (size 1368829)
                imp, vocab imp (size 36576)
                uid, vocab uid (size 94057)
                nid, vocab nid (size 65238)
                click, vocab click (size 2)

loaded 1372169 samples from data/MIND-small/dev
modify sample_size to 94057
loaded 94057 samples from data/MIND-small/user
[00:00:02] |Depots| Filter history with x in test phase, sample num: 1368829 -> 1328885
loaded 65238 samples from data/MIND-small/news
[00:00:03] |Controller| Selected Item Encoder: LlamaOperator
[00:00:03] |Controller| Selected User Encoder: AdaOperator
[00:00:03] |Controller| Selected Predictor: DotPredictor
[00:00:03] |Controller| Use Negative Sampling: True
[00:00:03] |Controller| Use Item Content: True
[00:00:03] |EmbeddingHub| load pretrained embedding llama of torch.Size([32000, 4096])
[00:00:03] |EmbeddingHub| skip col history
[00:00:03] |EmbeddingHub| create vocab __cat_inputer_special_ids (3, 4096)
[00:00:03] |EmbeddingHub| create vocab __flatten_seq_special_ids (4, 4096)
[00:00:03] |EmbeddingHub| build mapping title-llama -> llama
[00:00:03] |EmbeddingHub| load frozen vocab: llama torch.Size([32000, 4096])
[00:00:03] |EmbeddingHub| keep transform size 4096
[00:00:03] |EmbeddingHub| build mapping cat-llama -> llama
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:08<00:00,  4.44s/it]
Some weights of the model checkpoint at /home/data1/qijiong/llama-7b were not used when initializing LlamaModel: ['lm_head.weight']
- This IS expected if you are initializing LlamaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 65238/65238 [00:04<00:00, 15429.18it/s]
[00:01:01] |Worker| {'index': 0, 'imp': 0, 'uid': 50000, 'nid': 62578, 'click': 0, 'history': [6892, 30443, 14245, 29198, 28698, 16460, 1609, 27048, 31725, 27553, 11722, 23871, 27080, 1570, 32953], 'neg': [62578, 44281, 38865, 31175, 63269, 34829, 62656, 32894, 62684, 36788, 37783, 34900, 30050, 39288, 38490, 62633, 37183, 6443, 63631, 64298, 21738]}
[00:01:01] |Worker| {
    "history": {
        "input_ids": {
            "natural_cat": "tensor([30, 33], dtype=torch.int64)"
        },
        "attention_mask": "tensor([30, 33], dtype=torch.int64)"
    },
    "nid": {
        "input_ids": {
            "natural_cat": "tensor([5, 33], dtype=torch.int64)"
        },
        "attention_mask": "tensor([5, 33], dtype=torch.int64)"
    },
    "click": "int",
    "imp": "int",
    "uid": "int64",
    "__clicks_mask__": "tensor([30], dtype=torch.int64)"
}
 27%|███████████████████████████████████████████████████▎                                                                                                                                        | 17792/65238 [02:13<07:10, 110.31it/s]
ChangyuChen347 commented 4 months ago

Hi, I have successfully run it now. However, my current results are slightly lower than those reported in the paper. Do you have any suggestions regarding the configuration?

python worker.py --data config/data/mind-llama.yaml --embed config/embed/llama-token.yaml --model config/model/llm/llama-nrms.yaml --exp config/exp/tt-llm.yaml --embed_hidden_size 4096 --llm_ver 7b --layer 31 --version small --lr 0.0001 --item_lr 0.00001 --batch_size 64 --acc_batch 1 --epoch_batch -4

[00:15:34] |Worker| [epoch 9] GAUC 0.6651
 25%|█████████████████████████████ | 905/3618 [00:42<02:08, 21.12it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 65238/65238 [00:00<00:00, 88449.45it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 94057/94057 [00:03<00:00, 24319.99it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20769/20769 [00:29<00:00, 699.61it/s]
[00:16:59] |Worker| [epoch 10] GAUC 0.6629
[00:16:59] |Monitor| Early Stop
[00:16:59] |Worker| load model from exp saving/MIND-small-Llama/LLAMA-NRMS.D64.L31.Lora1/llama-token-train_test/epoch_8.bin
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 65238/65238 [00:00<00:00, 84154.68it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 94057/94057 [00:03<00:00, 23998.46it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20764/20764 [00:30<00:00, 686.64it/s]
[00:18:09] |Worker| GAUC: 0.6734
[00:18:09] |Worker| MRR: 0.3412
[00:18:09] |Worker| NDCG@1: 0.1924
[00:18:09] |Worker| NDCG@5: 0.3566
[00:18:09] |Worker| NDCG@10: 0.4188

Jyonn commented 4 months ago

Hi, there are several hyperparameters that can be tuned.

Model-agnostic parameters: for example, the learning rates (lr, item_lr) and batch_size used in your command.

Model-specific parameters: please refer to config/model/llm/llama-nrms.yaml and modify the hyperparameters of the attention module.
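
For illustration, a hedged sketch of the attention-related knobs, with key names taken from the config dump earlier in this thread (the values shown are simply the ones used in that run, not recommendations):

    # Sketch of config/model/llm/llama-nrms.yaml, assuming the structure from the dump above.
    config:
      hidden_size: 64              # size of the user/item representations
      user_config:
        num_attention_heads: 8     # attention heads in the user encoder
        inputer_config:
          use_cls_token: false
          use_sep_token: false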