The size of tensor a (0) must match the size of tensor b (1536) at non-singleton dimension 1
The error occurs only when I set gradient_checkpointing to True; keeping it False avoids the error, but that wastes a lot of memory.
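For context, the gradient_checkpointing switch I am toggling ends up enabling Hugging Face gradient checkpointing on the item/user LLMs (that is what triggers the checkpointing-format deprecation warnings in the log below). A minimal sketch of the standard transformers call, not the exact HLLM code path:

```python
# Sketch only: the standard transformers API for (non-reentrant) gradient
# checkpointing; not the exact HLLM code path. "../item_pretrain" is the
# checkpoint directory from the log below.
from transformers import AutoModelForCausalLM

item_llm = AutoModelForCausalLM.from_pretrained("../item_pretrain")
item_llm.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}
)
# Note: per the warning in the log, gradient_checkpointing_kwargs is silently
# ignored while the old _set_gradient_checkpointing method is still defined
# in the modeling file, so the use_reentrant option may not take effect.
```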
Environment as follows:
cuda==11.7 (limited by machine)
python==3.9
pytorch==2.0.1 (limited by cuda)
deepspeed==0.14.2
transformers==4.41.1
lightning==2.3.0 (lightning==2.4.0 needs torch>=2.1.0,<4.0)
wheel==0.44.0
flash-attn==2.6.3 (set to False, limited by machine)
fbgemm-gpu==0.5.0
sentencepiece==0.2.0
pandas==2.2.3
colorlog==6.9.0
tensorboardX==2.6.2.2
tensorflow_cpu==2.8.0
colorama==0.4.6
torch_geometric==2.5.3
scikit-learn==1.5.2
protobuf==3.20
torchrun --master_port=12345 --node_rank=0 --nproc_per_node=1 --nnodes=1 run.py --config_file overall/LLM_deepspeed.yaml HLLM/HLLM.yaml --MAX_ITEM_LIST_LENGTH 3 --epochs 5 --optim_args.learning_rate 1e-4 --MAX_TEXT_LENGTH 3 --train_batch_size 2
[2024-10-31 18:29:55,303] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
31 Oct 18:30 INFO Update text_path to /data/home/xconnorwang/HLLM/information/Pixel200K.csv
31 Oct 18:30 INFO Loading <class 'REC.data.dataload.Data'> from scratch with self.data_split = None.
31 Oct 18:30 INFO Interaction feature loaded successfully from [../dataset/Pixel200K.csv].
31 Oct 18:30 INFO self.user_num = 200001 self.item_num = 96283
31 Oct 18:30 INFO self.inter_feat['item_id'].isna().any() = False self.inter_feat['user_id'].isna().any() = False
31 Oct 18:30 INFO build Pixel200K dataload
31 Oct 18:30 INFO Use random sample True for mask id
31 Oct 18:30 INFO Text path: /data/home/xconnorwang/HLLM/information/Pixel200K.csv
31 Oct 18:30 INFO Text keys: ['title', 'tag', 'description']
31 Oct 18:30 INFO Item prompt: Compress the following sentence into embedding:
31 Oct 18:30 INFO Text Item num: 96281
31 Oct 18:30 INFO [Training]: train_batch_size = [2]
31 Oct 18:30 INFO [Evaluation]: eval_batch_size = [3]
/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py:560: UserWarning: This DataLoader will create 11 worker processes in total. Our suggested max number of worker in current system is 10, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
len(train_loader) = 408043
31 Oct 18:30 INFO create item llm
31 Oct 18:30 INFO create LLM ../item_pretrain
31 Oct 18:30 INFO hf_config: LlamaConfig {
"_name_or_path": "../item_pretrain",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 0,
"eos_token_id": 0,
"hidden_act": "silu",
"hidden_size": 576,
"initializer_range": 0.02,
"intermediate_size": 1536,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 9,
"num_hidden_layers": 30,
"num_key_value_heads": 3,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.41.1",
"use_cache": true,
"vocab_size": 49152
}
31 Oct 18:30 INFO xxxxx starting loading checkpoint
31 Oct 18:30 INFO Using flash attention False for llama
31 Oct 18:30 INFO Init True for llama
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore gradient_checkpointing_kwargs in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method _set_gradient_checkpointing in your model.
31 Oct 18:30 INFO create user llm
31 Oct 18:30 INFO create LLM ../user_pretrain
31 Oct 18:30 INFO hf_config: LlamaConfig {
"_name_or_path": "../user_pretrain",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 0,
"eos_token_id": 0,
"hidden_act": "silu",
"hidden_size": 576,
"initializer_range": 0.02,
"intermediate_size": 1536,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 9,
"num_hidden_layers": 30,
"num_key_value_heads": 3,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.41.1",
"use_cache": true,
"vocab_size": 49152
}
31 Oct 18:30 INFO xxxxx starting loading checkpoint
31 Oct 18:30 INFO Using flash attention False for llama
31 Oct 18:30 INFO Init True for llama
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore gradient_checkpointing_kwargs in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method _set_gradient_checkpointing in your model.
31 Oct 18:30 INFO nce thres setting to 0.99
31 Oct 18:30 INFO item_emb_tokens torch.Size([1, 1, 576]) True
31 Oct 18:30 INFO logit_scale torch.Size([]) True
31 Oct 18:30 INFO item_llm.model.embed_tokens.weight torch.Size([49152, 576]) True
31 Oct 18:30 INFO item_llm.model.layers.0.self_attn.q_proj.weight torch.Size([576, 576]) True
31 Oct 18:30 INFO item_llm.model.layers.0.self_attn.k_proj.weight torch.Size([192, 576]) True
31 Oct 18:30 INFO item_llm.model.layers.0.self_attn.v_proj.weight torch.Size([192, 576]) True
31 Oct 18:30 INFO item_llm.model.layers.0.self_attn.o_proj.weight torch.Size([576, 576]) True
31 Oct 18:30 INFO item_llm.model.layers.0.mlp.gate_proj.weight torch.Size([1536, 576]) True
31 Oct 18:30 INFO item_llm.model.layers.0.mlp.up_proj.weight torch.Size([1536, 576]) True
31 Oct 18:30 INFO item_llm.model.layers.0.mlp.down_proj.weight torch.Size([576, 1536]) True
31 Oct 18:30 INFO item_llm.model.layers.0.input_layernorm.weight torch.Size([576]) True
31 Oct 18:30 INFO item_llm.model.layers.0.post_attention_layernorm.weight torch.Size([576]) True
31 Oct 18:30 INFO item_llm.model.layers.1.self_attn.q_proj.weight torch.Size([576, 576]) True
31 Oct 18:30 INFO item_llm.model.layers.1.self_attn.k_proj.weight torch.Size([192, 576]) True
31 Oct 18:30 INFO item_llm.model.layers.1.self_attn.v_proj.weight torch.Size([192, 576]) True
...
31 Oct 18:30 INFO user_llm.model.layers.26.self_attn.v_proj.weight torch.Size([192, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.26.self_attn.o_proj.weight torch.Size([576, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.26.mlp.gate_proj.weight torch.Size([1536, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.26.mlp.up_proj.weight torch.Size([1536, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.26.mlp.down_proj.weight torch.Size([576, 1536]) True
31 Oct 18:30 INFO user_llm.model.layers.26.input_layernorm.weight torch.Size([576]) True
31 Oct 18:30 INFO user_llm.model.layers.26.post_attention_layernorm.weight torch.Size([576]) True
31 Oct 18:30 INFO user_llm.model.layers.27.self_attn.q_proj.weight torch.Size([576, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.27.self_attn.k_proj.weight torch.Size([192, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.27.self_attn.v_proj.weight torch.Size([192, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.27.self_attn.o_proj.weight torch.Size([576, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.27.mlp.gate_proj.weight torch.Size([1536, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.27.mlp.up_proj.weight torch.Size([1536, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.27.mlp.down_proj.weight torch.Size([576, 1536]) True
31 Oct 18:30 INFO user_llm.model.layers.27.input_layernorm.weight torch.Size([576]) True
31 Oct 18:30 INFO user_llm.model.layers.27.post_attention_layernorm.weight torch.Size([576]) True
31 Oct 18:30 INFO user_llm.model.layers.28.self_attn.q_proj.weight torch.Size([576, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.28.self_attn.k_proj.weight torch.Size([192, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.28.self_attn.v_proj.weight torch.Size([192, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.28.self_attn.o_proj.weight torch.Size([576, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.28.mlp.gate_proj.weight torch.Size([1536, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.28.mlp.up_proj.weight torch.Size([1536, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.28.mlp.down_proj.weight torch.Size([576, 1536]) True
31 Oct 18:30 INFO user_llm.model.layers.28.input_layernorm.weight torch.Size([576]) True
31 Oct 18:30 INFO user_llm.model.layers.28.post_attention_layernorm.weight torch.Size([576]) True
31 Oct 18:30 INFO user_llm.model.layers.29.self_attn.q_proj.weight torch.Size([576, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.29.self_attn.k_proj.weight torch.Size([192, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.29.self_attn.v_proj.weight torch.Size([192, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.29.self_attn.o_proj.weight torch.Size([576, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.29.mlp.gate_proj.weight torch.Size([1536, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.29.mlp.up_proj.weight torch.Size([1536, 576]) True
31 Oct 18:30 INFO user_llm.model.layers.29.mlp.down_proj.weight torch.Size([576, 1536]) True
31 Oct 18:30 INFO user_llm.model.layers.29.input_layernorm.weight torch.Size([576]) True
31 Oct 18:30 INFO user_llm.model.layers.29.post_attention_layernorm.weight torch.Size([576]) True
31 Oct 18:30 INFO user_llm.model.norm.weight torch.Size([576]) True
31 Oct 18:30 INFO
World_Size = 1
31 Oct 18:30 INFO
General Hyper Parameters:
seed = 2020
state = INFO
use_text = True
reproducibility = True
checkpoint_dir = saved
show_progress = True
log_wandb = False
data_path = ../dataset/
strategy = deepspeed
precision = bf16-mixed
model = HLLM
31 Oct 18:30 INFO Pixel200K
The number of users: 200001
Average actions of users: 19.82828
The number of items: 96283
Average actions of items: 41.187927130720176
The number of inters: 3965656
The sparsity of the dataset: 99.9794063532928%
31 Oct 18:30 INFO HLLM(
(item_llm): LlamaForCausalLM(
(model): LlamaModel(
(embed_tokens): Embedding(49152, 576)
(layers): ModuleList(
(0-29): 30 x LlamaDecoderLayer(
(self_attn): LlamaAttention(
(q_proj): Linear(in_features=576, out_features=576, bias=False)
(k_proj): Linear(in_features=576, out_features=192, bias=False)
(v_proj): Linear(in_features=576, out_features=192, bias=False)
(o_proj): Linear(in_features=576, out_features=576, bias=False)
(rotary_emb): LlamaRotaryEmbedding()
)
(mlp): LlamaMLP(
(gate_proj): Linear(in_features=576, out_features=1536, bias=False)
(up_proj): Linear(in_features=576, out_features=1536, bias=False)
(down_proj): Linear(in_features=1536, out_features=576, bias=False)
(act_fn): SiLUActivation()
)
(input_layernorm): LlamaRMSNorm()
(post_attention_layernorm): LlamaRMSNorm()
)
)
(norm): LlamaRMSNorm()
)
(lm_head): Linear(in_features=576, out_features=49152, bias=False)
)
(user_llm): LlamaForCausalLM(
(model): LlamaModel(
(embed_tokens): Embedding(49152, 576)
(layers): ModuleList(
(0-29): 30 x LlamaDecoderLayer(
(self_attn): LlamaAttention(
(q_proj): Linear(in_features=576, out_features=576, bias=False)
(k_proj): Linear(in_features=576, out_features=192, bias=False)
(v_proj): Linear(in_features=576, out_features=192, bias=False)
(o_proj): Linear(in_features=576, out_features=576, bias=False)
(rotary_emb): LlamaRotaryEmbedding()
)
(mlp): LlamaMLP(
(gate_proj): Linear(in_features=576, out_features=1536, bias=False)
(up_proj): Linear(in_features=576, out_features=1536, bias=False)
(down_proj): Linear(in_features=1536, out_features=576, bias=False)
(act_fn): SiLUActivation()
)
(input_layernorm): LlamaRMSNorm()
(post_attention_layernorm): LlamaRMSNorm()
)
)
(norm): LlamaRMSNorm()
)
(lm_head): Linear(in_features=576, out_features=49152, bias=False)
)
)
Trainable parameters: 269030593.0
31 Oct 18:30 INFO Use consine scheduler with 204021.5 warmup 2040215 total steps
31 Oct 18:30 INFO Use deepspeed strategy
initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/1
Enabling DeepSpeed BF16. Model parameters and inputs will be cast to bfloat16.
31 Oct 18:30 INFO Added key: store_based_barrier_key:2 to store for rank: 0
31 Oct 18:30 INFO Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 1 nodes.
Parameter Offload: Total persistent parameters: 70849 in 124 params
Train [ 0/ 5]: 0%| | 0/408043 [00:00<?, ?it/s]/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py:560: UserWarning: This DataLoader will create 11 worker processes in total. Our suggested max number of worker in current system is 10, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
elem_shape: torch.Size([4]) elem: tensor([35434, 40551, 37030, 59065])
batch_len: 2 batch: [tensor([35434, 40551, 37030, 59065]), tensor([12579, 4861, 4391, 11309])]
/data/home/xconnorwang/HLLM/code/REC/data/dataset/collate_fn.py:43: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = elem.storage()._new_shared(numel)
/data/home/xconnorwang/HLLM/code/REC/data/dataset/collate_fn.py:45: UserWarning: An output with one or more elements was resized since it had shape [8], which does not match the required output shape [2, 4]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:26.)
return torch.stack(batch, 0, out=out)
/data/home/xconnorwang/HLLM/code/REC/data/dataset/collate_fn.py:45: UserWarning: An output with one or more elements was resized since it had shape [6], which does not match the required output shape [2, 3]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:26.)
return torch.stack(batch, 0, out=out)
/data/home/xconnorwang/HLLM/code/REC/data/dataset/collate_fn.py:45: UserWarning: An output with one or more elements was resized since it had shape [48], which does not match the required output shape [2, 4, 6]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:26.)
return torch.stack(batch, 0, out=out)
... (the same collate_fn debug prints, i.e. elem_shape torch.Size([4]) / torch.Size([3]) / torch.Size([4, 6]) with batch_len 2 showing the item-id, mask, and timestamp tensors of each batch, plus the same TypedStorage and resize warnings, repeat here interleaved across the DataLoader worker processes; the last complete batch before the failure is shown below)
elem_shape: torch.Size([4]) elem: tensor([45360, 536, 48045, 26275])
batch_len: 2 batch: [tensor([45360, 536, 48045, 26275]), tensor([20411, 11698, 21909, 15291])]
elem_shape: torch.Size([4]) elem: tensor([79237, 53705, 87849, 26910])
batch_len: 2 batch: [tensor([79237, 53705, 87849, 26910]), tensor([13761, 19517, 61800, 59291])]
elem_shape: torch.Size([3]) elem: tensor([1, 1, 1])
batch_len: 2 batch: [tensor([1, 1, 1]), tensor([1, 1, 1])]
elem_shape: torch.Size([4, 6]) elem: tensor([[2022, 2, 11, 1, 20, 13],
[2022, 2, 11, 1, 23, 9],
[2022, 2, 11, 2, 3, 3],
[2022, 2, 11, 6, 13, 31]])
batch_len: 2 batch: [tensor([[2022, 2, 11, 1, 20, 13],
[2022, 2, 11, 1, 23, 9],
[2022, 2, 11, 2, 3, 3],
[2022, 2, 11, 6, 13, 31]]), tensor([[2020, 7, 18, 15, 24, 36],
[2020, 8, 2, 16, 48, 4],
[2020, 10, 10, 17, 13, 9],
[2020, 11, 24, 6, 57, 41]])]
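The resize warnings above come from torch.stack being handed a flat, pre-allocated out tensor inside collate_fn (the older default_collate pattern). A minimal reproduction of just that warning, independent of the HLLM code:

```python
import torch

# Two worker-side rows of a batch, shape [4] each, as in the log above.
batch = [torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6, 7, 8])]

# Older default_collate style: a flat out tensor of shape [8].
# torch.stack has to resize it to [2, 4], which emits the
# "An output with one or more elements was resized ..." UserWarning.
out_flat = torch.empty(8, dtype=torch.long)
torch.stack(batch, 0, out=out_flat)

# Pre-shaping the out tensor (or passing out=None) avoids the warning.
out_shaped = torch.empty(2, 4, dtype=torch.long)
torch.stack(batch, 0, out=out_shaped)
```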
Traceback (most recent call last):
File "/data/home/xconnorwang/HLLM/code/run.py", line 139, in
run_loop(local_rank=local_rank, config_file=config_file, extra_args=extra_args)
File "/data/home/xconnorwang/HLLM/code/run.py", line 110, in run_loop
best_valid_score, best_valid_result = trainer.fit(
File "/data/home/xconnorwang/HLLM/code/REC/trainer/trainer.py", line 342, in fit
train_loss = self._train_epoch(train_data, epoch_idx, show_progress=show_progress)
File "/data/home/xconnorwang/HLLM/code/REC/trainer/trainer.py", line 198, in _train_epoch
self.lite.backward(losses)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/lightning/fabric/fabric.py", line 446, in backward
self._strategy.backward(tensor, module, *args, **kwargs)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/lightning/fabric/strategies/strategy.py", line 188, in backward
self.precision.backward(tensor, module, *args, **kwargs)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/lightning/fabric/plugins/precision/deepspeed.py", line 91, in backward
model.backward(tensor, *args, **kwargs)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1976, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 2213, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/autograd/init.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: The size of tensor a (0) must match the size of tensor b (1536) at non-singleton dimension 1
Train [ 0/ 5]: 0%| | 0/408043 [00:05<?, ?it/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1073205) of binary: /usr/local/python3/bin/python3.9
Traceback (most recent call last):
File "/data/home/xconnorwang/.local/bin/torchrun", line 8, in
sys.exit(main())
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
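In case it is useful for diagnosis: under ZeRO Stage 3, partitioned parameters report an empty shape (torch.Size([0])) outside a gather context, which would match the "size of tensor a (0)" above, and 1536 matches the intermediate_size of the MLP weights. A rough, hypothetical check (here `model` stands for the module after the DeepSpeed/Fabric setup; how the trainer exposes it is an assumption on my side):

```python
# Sketch only: DeepSpeed ZeRO-3 attaches ds_shape to converted parameters;
# outside a gather context their data tensor is empty (shape torch.Size([0])).
# `model` is assumed to be the DeepSpeed-initialized module.
for name, p in model.named_parameters():
    if hasattr(p, "ds_shape") and p.numel() == 0:
        print(name, "partitioned:", tuple(p.shape), "full:", tuple(p.ds_shape))
```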
The size of tensor a (0) must match the size of tensor b (1536) at non-singleton dimension 1 Error only if i change gradient_checkpointing into True, and that waste lots of memory
environment as follows: cuda==11.7(limited by machine) python==3.9 pytorch==2.0.1(limited by cuda) deepspeed==0.14.2 transformers==4.41.1 lightning==2.3.0 (lightning==2.4.0 need torch<4.0,>=2.1.0) wheel==0.44.0 flash-attn==2.6.3 (False,limited by machine) fbgemm-gpu==0.5.0 sentencepiece==0.2.0 pandas==2.2.3 colorlog==6.9.0 tensorboardX==2.6.2.2 tensorflow_cpu==2.8.0 colorama==0.4.6 torch_geometric==2.5.3 scikit-learn==1.5.2 protobuf==3.20
ERROR LOG:
++ date +%FT%T
31 Oct 18:30 INFO xxxxx starting loading checkpoint
31 Oct 18:30 INFO Using flash attention False for llama
31 Oct 18:30 INFO Init True for llama
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
31 Oct 18:30 INFO create user llm
31 Oct 18:30 INFO create LLM ../user_pretrain
31 Oct 18:30 INFO hf_config: LlamaConfig {
"_name_or_path": "../user_pretrain",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 0,
"eos_token_id": 0,
"hidden_act": "silu",
"hidden_size": 576,
"initializer_range": 0.02,
"intermediate_size": 1536,
"max_position_embeddings": 2048,
"mlp_bias": false,
"model_type": "llama",
"num_attention_heads": 9,
"num_hidden_layers": 30,
"num_key_value_heads": 3,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": true,
"torch_dtype": "bfloat16",
"transformers_version": "4.41.1",
"use_cache": true,
"vocab_size": 49152
}
31 Oct 18:30 INFO xxxxx starting loading checkpoint
31 Oct 18:30 INFO Using flash attention False for llama
31 Oct 18:30 INFO Init True for llama
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it). Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing`
in your model. 31 Oct 18:30 INFO nce thres setting to 0.99 31 Oct 18:30 INFO item_emb_tokens torch.Size([1, 1, 576]) True 31 Oct 18:30 INFO logit_scale torch.Size([]) True 31 Oct 18:30 INFO item_llm.model.embed_tokens.weight torch.Size([49152, 576]) True 31 Oct 18:30 INFO item_llm.model.layers.0.self_attn.q_proj.weight torch.Size([576, 576]) True 31 Oct 18:30 INFO item_llm.model.layers.0.self_attn.k_proj.weight torch.Size([192, 576]) True 31 Oct 18:30 INFO item_llm.model.layers.0.self_attn.v_proj.weight torch.Size([192, 576]) True 31 Oct 18:30 INFO item_llm.model.layers.0.self_attn.o_proj.weight torch.Size([576, 576]) True 31 Oct 18:30 INFO item_llm.model.layers.0.mlp.gate_proj.weight torch.Size([1536, 576]) True 31 Oct 18:30 INFO item_llm.model.layers.0.mlp.up_proj.weight torch.Size([1536, 576]) True 31 Oct 18:30 INFO item_llm.model.layers.0.mlp.down_proj.weight torch.Size([576, 1536]) True 31 Oct 18:30 INFO item_llm.model.layers.0.input_layernorm.weight torch.Size([576]) True 31 Oct 18:30 INFO item_llm.model.layers.0.post_attention_layernorm.weight torch.Size([576]) True 31 Oct 18:30 INFO item_llm.model.layers.1.self_attn.q_proj.weight torch.Size([576, 576]) True 31 Oct 18:30 INFO item_llm.model.layers.1.self_attn.k_proj.weight torch.Size([192, 576]) True 31 Oct 18:30 INFO item_llm.model.layers.1.self_attn.v_proj.weight torch.Size([192, 576]) True ... 31 Oct 18:30 INFO user_llm.model.layers.26.self_attn.v_proj.weight torch.Size([192, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.26.self_attn.o_proj.weight torch.Size([576, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.26.mlp.gate_proj.weight torch.Size([1536, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.26.mlp.up_proj.weight torch.Size([1536, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.26.mlp.down_proj.weight torch.Size([576, 1536]) True 31 Oct 18:30 INFO user_llm.model.layers.26.input_layernorm.weight torch.Size([576]) True 31 Oct 18:30 INFO user_llm.model.layers.26.post_attention_layernorm.weight torch.Size([576]) True 31 Oct 18:30 INFO user_llm.model.layers.27.self_attn.q_proj.weight torch.Size([576, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.27.self_attn.k_proj.weight torch.Size([192, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.27.self_attn.v_proj.weight torch.Size([192, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.27.self_attn.o_proj.weight torch.Size([576, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.27.mlp.gate_proj.weight torch.Size([1536, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.27.mlp.up_proj.weight torch.Size([1536, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.27.mlp.down_proj.weight torch.Size([576, 1536]) True 31 Oct 18:30 INFO user_llm.model.layers.27.input_layernorm.weight torch.Size([576]) True 31 Oct 18:30 INFO user_llm.model.layers.27.post_attention_layernorm.weight torch.Size([576]) True 31 Oct 18:30 INFO user_llm.model.layers.28.self_attn.q_proj.weight torch.Size([576, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.28.self_attn.k_proj.weight torch.Size([192, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.28.self_attn.v_proj.weight torch.Size([192, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.28.self_attn.o_proj.weight torch.Size([576, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.28.mlp.gate_proj.weight torch.Size([1536, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.28.mlp.up_proj.weight torch.Size([1536, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.28.mlp.down_proj.weight torch.Size([576, 1536]) True 31 Oct 18:30 INFO 
user_llm.model.layers.28.input_layernorm.weight torch.Size([576]) True 31 Oct 18:30 INFO user_llm.model.layers.28.post_attention_layernorm.weight torch.Size([576]) True 31 Oct 18:30 INFO user_llm.model.layers.29.self_attn.q_proj.weight torch.Size([576, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.29.self_attn.k_proj.weight torch.Size([192, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.29.self_attn.v_proj.weight torch.Size([192, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.29.self_attn.o_proj.weight torch.Size([576, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.29.mlp.gate_proj.weight torch.Size([1536, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.29.mlp.up_proj.weight torch.Size([1536, 576]) True 31 Oct 18:30 INFO user_llm.model.layers.29.mlp.down_proj.weight torch.Size([576, 1536]) True 31 Oct 18:30 INFO user_llm.model.layers.29.input_layernorm.weight torch.Size([576]) True 31 Oct 18:30 INFO user_llm.model.layers.29.post_attention_layernorm.weight torch.Size([576]) True 31 Oct 18:30 INFO user_llm.model.norm.weight torch.Size([576]) True 31 Oct 18:30 INFOWorld_Size = 1
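For reference, the deprecation warning above refers to the new-style Transformers checkpointing API. A minimal sketch of enabling it on a loaded model (illustrative only, under the assumption that the item/user LLMs are plain LlamaForCausalLM checkpoints loaded from the *_pretrain directories; this is not the actual HLLM wrapper code):

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: with transformers 4.41 the new checkpointing format is enabled via a
# method call on the model, instead of defining _set_gradient_checkpointing in
# the modeling file as the old format did.
item_llm = AutoModelForCausalLM.from_pretrained("../item_pretrain", torch_dtype=torch.bfloat16)
item_llm.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}  # forwarded to torch.utils.checkpoint
)
```

Whether the non-reentrant checkpointing variant also avoids the size-0 tensor error under ZeRO-3 is not verified here.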
31 Oct 18:30 INFO
General Hyper Parameters:
seed = 2020
state = INFO
use_text = True
reproducibility = True
checkpoint_dir = saved
show_progress = True
log_wandb = False
data_path = ../dataset/
strategy = deepspeed
precision = bf16-mixed
model = HLLM

Training Hyper Parameters:
epochs = 5
train_batch_size = 2
optim_args = {'learning_rate': 0.0001, 'weight_decay': 0.01}
eval_step = 1
stopping_step = 5

Evaluation Hyper Parameters:
eval_batch_size = 3
topk = [5, 10, 50, 200]
metrics = ['Recall', 'NDCG']
valid_metric = NDCG@200
metric_decimal_place = 7
eval_type = EvaluatorType.RANKING
valid_metric_bigger = True

Dataset Hyper Parameters:
MAX_ITEM_LIST_LENGTH = 3
MAX_TEXT_LENGTH = 3
text_keys = ['title', 'tag', 'description']
item_prompt = Compress the following sentence into embedding:

Other Hyper Parameters:
wandb_project = REC
text_path = /data/home/xconnorwang/HLLM/information/Pixel200K.csv
item_emb_token_n = 1
loss = nce
scheduler_args = {'type': 'cosine', 'warmup': 0.1}
stage = 3
gradient_checkpointing = True
zero3_init_flag = False
item_pretrain_dir = ../item_pretrain
item_llm_init = True
user_pretrain_dir = ../user_pretrain
user_llm_init = True
use_ft_flash_attn = False
MODEL_INPUT_TYPE = InputType.SEQ
device = cuda:0
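The strategy/precision entries above map roughly onto the following Lightning Fabric setup (a sketch of the equivalent standalone configuration; the real wiring lives in the HLLM trainer and is not reproduced here):

```python
import lightning as L

# Approximate standalone equivalent of strategy = deepspeed, stage = 3,
# precision = bf16-mixed with a single process (--nproc_per_node=1).
fabric = L.Fabric(
    accelerator="cuda",
    devices=1,
    strategy="deepspeed_stage_3",
    precision="bf16-mixed",
)
fabric.launch()
```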
31 Oct 18:30 INFO Pixel200K The number of users: 200001 Average actions of users: 19.82828 The number of items: 96283 Average actions of items: 41.187927130720176 The number of inters: 3965656 The sparsity of the dataset: 99.9794063532928% 31 Oct 18:30 INFO HLLM( (item_llm): LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(49152, 576) (layers): ModuleList( (0-29): 30 x LlamaDecoderLayer( (self_attn): LlamaAttention( (q_proj): Linear(in_features=576, out_features=576, bias=False) (k_proj): Linear(in_features=576, out_features=192, bias=False) (v_proj): Linear(in_features=576, out_features=192, bias=False) (o_proj): Linear(in_features=576, out_features=576, bias=False) (rotary_emb): LlamaRotaryEmbedding() ) (mlp): LlamaMLP( (gate_proj): Linear(in_features=576, out_features=1536, bias=False) (up_proj): Linear(in_features=576, out_features=1536, bias=False) (down_proj): Linear(in_features=1536, out_features=576, bias=False) (act_fn): SiLUActivation() ) (input_layernorm): LlamaRMSNorm() (post_attention_layernorm): LlamaRMSNorm() ) ) (norm): LlamaRMSNorm() ) (lm_head): Linear(in_features=576, out_features=49152, bias=False) ) (user_llm): LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(49152, 576) (layers): ModuleList( (0-29): 30 x LlamaDecoderLayer( (self_attn): LlamaAttention( (q_proj): Linear(in_features=576, out_features=576, bias=False) (k_proj): Linear(in_features=576, out_features=192, bias=False) (v_proj): Linear(in_features=576, out_features=192, bias=False) (o_proj): Linear(in_features=576, out_features=576, bias=False) (rotary_emb): LlamaRotaryEmbedding() ) (mlp): LlamaMLP( (gate_proj): Linear(in_features=576, out_features=1536, bias=False) (up_proj): Linear(in_features=576, out_features=1536, bias=False) (down_proj): Linear(in_features=1536, out_features=576, bias=False) (act_fn): SiLUActivation() ) (input_layernorm): LlamaRMSNorm() (post_attention_layernorm): LlamaRMSNorm() ) ) (norm): LlamaRMSNorm() ) (lm_head): Linear(in_features=576, out_features=49152, bias=False) ) ) Trainable parameters: 269030593.0 31 Oct 18:30 INFO Use consine scheduler with 204021.5 warmup 2040215 total steps 31 Oct 18:30 INFO Use deepspeed strategy initializing deepspeed distributed: GLOBAL_RANK: 0, MEMBER: 1/1 Enabling DeepSpeed BF16. Model parameters and inputs will be cast to
bfloat16.
31 Oct 18:30 INFO Added key: store_based_barrier_key:2 to store for rank: 0
31 Oct 18:30 INFO Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 1 nodes.
Parameter Offload: Total persistent parameters: 70849 in 124 params
Train [ 0/ 5]: 0%| | 0/408043 [00:00<?, ?it/s]
/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/utils/data/dataloader.py:560: UserWarning: This DataLoader will create 11 worker processes in total. Our suggested max number of worker in current system is 10, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
elem_shape: torch.Size([4]) elem: tensor([35434, 40551, 37030, 59065]) batch_len: 2 batch: [tensor([35434, 40551, 37030, 59065]), tensor([12579, 4861, 4391, 11309])]
/data/home/xconnorwang/HLLM/code/REC/data/dataset/collate_fn.py:43: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = elem.storage()._new_shared(numel)
/data/home/xconnorwang/HLLM/code/REC/data/dataset/collate_fn.py:45: UserWarning: An output with one or more elements was resized since it had shape [8], which does not match the required output shape [2, 4]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:26.)
return torch.stack(batch, 0, out=out)
/data/home/xconnorwang/HLLM/code/REC/data/dataset/collate_fn.py:45: UserWarning: An output with one or more elements was resized since it had shape [6], which does not match the required output shape [2, 3]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:26.)
return torch.stack(batch, 0, out=out)
/data/home/xconnorwang/HLLM/code/REC/data/dataset/collate_fn.py:45: UserWarning: An output with one or more elements was resized since it had shape [48], which does not match the required output shape [2, 4, 6]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:26.)
return torch.stack(batch, 0, out=out)
... (the same collate_fn.py warnings and elem_shape / batch debug prints — item-id tensors of shape [4], mask tensors of shape [3], timestamp tensors of shape [4, 6], each stacked with batch_len 2 — repeat for every DataLoader worker, heavily interleaved) ...
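The resize warnings above come from torch.stack being handed an out tensor whose shape no longer matches the stacked result (e.g. [8] vs. [2, 4]). A minimal collate sketch that avoids them, assuming equally shaped per-sample tensors as in the printed shapes (this is not the HLLM collate_fn.py itself):

```python
import torch

def stack_collate(batch):
    """Stack equally shaped per-sample tensors into one batch tensor.

    Letting torch.stack allocate its own output avoids the deprecated resize
    of a mismatched `out` tensor; if an `out` tensor must be reused (as in the
    shared-memory path shown in the warnings), the warning itself suggests
    emptying it first with out.resize_(0).
    """
    return torch.stack(batch, 0)

# Example with the item-id tensors from the log above: the result has shape (2, 4).
batch = [torch.tensor([35434, 40551, 37030, 59065]),
         torch.tensor([12579, 4861, 4391, 11309])]
print(stack_collate(batch).shape)  # torch.Size([2, 4])
```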
Traceback (most recent call last):
File "/data/home/xconnorwang/HLLM/code/run.py", line 139, in <module>
run_loop(local_rank=local_rank, config_file=config_file, extra_args=extra_args)
File "/data/home/xconnorwang/HLLM/code/run.py", line 110, in run_loop
best_valid_score, best_valid_result = trainer.fit(
File "/data/home/xconnorwang/HLLM/code/REC/trainer/trainer.py", line 342, in fit
train_loss = self._train_epoch(train_data, epoch_idx, show_progress=show_progress)
File "/data/home/xconnorwang/HLLM/code/REC/trainer/trainer.py", line 198, in _train_epoch
self.lite.backward(losses)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/lightning/fabric/fabric.py", line 446, in backward
self._strategy.backward(tensor, module, *args, **kwargs)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/lightning/fabric/strategies/strategy.py", line 188, in backward
self.precision.backward(tensor, module, *args, **kwargs)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/lightning/fabric/plugins/precision/deepspeed.py", line 91, in backward
model.backward(tensor, *args, **kwargs)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1976, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 2213, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/autograd/init.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: The size of tensor a (0) must match the size of tensor b (1536) at non-singleton dimension 1
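The call chain in this traceback is the standard Fabric-on-DeepSpeed backward path (fabric.backward → DeepSpeed precision plugin → DeepSpeedEngine.backward → loss_scaler.backward → torch.autograd.backward). A condensed, self-contained sketch of that path with a stand-in model (not the HLLM trainer) looks like:

```python
import torch
import lightning as L

fabric = L.Fabric(accelerator="cuda", devices=1, strategy="deepspeed_stage_3", precision="bf16-mixed")
fabric.launch()

model = torch.nn.Linear(576, 1536)                   # stand-in module, not HLLM
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model, optimizer = fabric.setup(model, optimizer)    # DeepSpeed needs model and optimizer set up together

x = torch.randn(2, 576, device=fabric.device)
loss = model(x).float().mean()
fabric.backward(loss)                                # same call path where the RuntimeError above is raised
optimizer.step()
optimizer.zero_grad()
```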
... (further interleaved elem_shape / batch debug prints and collate_fn.py warnings from the DataLoader workers) ...
Train [ 0/ 5]: 0%| | 0/408043 [00:05<?, ?it/s]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1073205) of binary: /usr/local/python3/bin/python3.9
Traceback (most recent call last):
File "/data/home/xconnorwang/.local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/data/home/xconnorwang/.local/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
run.py FAILED
Failures: