KimMeen / Time-LLM

[ICLR 2024] Official implementation of " 🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
https://arxiv.org/abs/2310.01728
Apache License 2.0

RuntimeError: expected scalar type Float but found BFloat16 #3

Closed (gsamaras closed this issue 5 months ago)

gsamaras commented 5 months ago

I am trying to run the ETTm1 example, but despite a plethora of efforts, I keep getting:

[2024-02-07 17:07:11,875] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-02-07 17:07:12,281] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.13.1, git-hash=unknown, git-branch=unknown
[2024-02-07 17:07:12,282] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-02-07 17:07:12,282] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2024-02-07 17:07:12,293] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=172.19.2.2, master_port=29500
[2024-02-07 17:07:12,293] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-02-07 17:07:13,600] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-02-07 17:07:13,601] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-02-07 17:07:13,601] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-02-07 17:07:13,602] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = Adam
[2024-02-07 17:07:13,602] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=Adam type=<class 'torch.optim.adam.Adam'>
[2024-02-07 17:07:13,603] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2024-02-07 17:07:13,603] [INFO] [stage_1_and_2.py:143:__init__] Reduce bucket size 200000000
[2024-02-07 17:07:13,603] [INFO] [stage_1_and_2.py:144:__init__] Allgather bucket size 200000000
[2024-02-07 17:07:13,603] [INFO] [stage_1_and_2.py:145:__init__] CPU Offload: False
[2024-02-07 17:07:13,603] [INFO] [stage_1_and_2.py:146:__init__] Round robin gradient partitioning: False
[2024-02-07 17:07:13,759] [INFO] [utils.py:791:see_memory_usage] Before initializing optimizer states
[2024-02-07 17:07:13,760] [INFO] [utils.py:792:see_memory_usage] MA 0.0 GB         Max_MA 0.0 GB         CA 0.0 GB         Max_CA 0 GB 
[2024-02-07 17:07:13,761] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory:  used = 2.27 GB, percent = 7.2%
[2024-02-07 17:07:13,980] [INFO] [utils.py:791:see_memory_usage] After initializing optimizer states
[2024-02-07 17:07:13,981] [INFO] [utils.py:792:see_memory_usage] MA 0.0 GB         Max_MA 0.0 GB         CA 0.0 GB         Max_CA 0 GB 
[2024-02-07 17:07:13,981] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory:  used = 2.32 GB, percent = 7.4%
[2024-02-07 17:07:13,981] [INFO] [stage_1_and_2.py:533:__init__] optimizer state initialized
[2024-02-07 17:07:14,103] [INFO] [utils.py:791:see_memory_usage] After initializing ZeRO optimizer
[2024-02-07 17:07:14,104] [INFO] [utils.py:792:see_memory_usage] MA 0.0 GB         Max_MA 0.0 GB         CA 0.0 GB         Max_CA 0 GB 
[2024-02-07 17:07:14,105] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory:  used = 2.32 GB, percent = 7.4%
[2024-02-07 17:07:14,107] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = Adam
[2024-02-07 17:07:14,107] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-02-07 17:07:14,107] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2024-02-07 17:07:14,107] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[3.9999999999999996e-05], mom=[(0.95, 0.999)]
[2024-02-07 17:07:14,108] [INFO] [config.py:984:print] DeepSpeedEngine configuration:
[2024-02-07 17:07:14,108] [INFO] [config.py:988:print]   activation_checkpointing_config  {
    "partition_activations": false, 
    "contiguous_memory_optimization": false, 
    "cpu_checkpointing": false, 
    "number_checkpoints": null, 
    "synchronize_checkpoint_boundary": false, 
    "profile": false
}
[2024-02-07 17:07:14,108] [INFO] [config.py:988:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-02-07 17:07:14,108] [INFO] [config.py:988:print]   amp_enabled .................. False
[2024-02-07 17:07:14,108] [INFO] [config.py:988:print]   amp_params ................... False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   autotuning_config ............ {
    "enabled": false, 
    "start_step": null, 
    "end_step": null, 
    "metric_path": null, 
    "arg_mappings": null, 
    "metric": "throughput", 
    "model_info": null, 
    "results_dir": "autotuning_results", 
    "exps_dir": "autotuning_exps", 
    "overwrite": true, 
    "fast": true, 
    "start_profile_step": 3, 
    "end_profile_step": 5, 
    "tuner_type": "gridsearch", 
    "tuner_early_stopping": 5, 
    "tuner_num_trials": 50, 
    "model_info_path": null, 
    "mp_size": 1, 
    "max_train_batch_size": null, 
    "min_train_batch_size": 1, 
    "max_train_micro_batch_size_per_gpu": 1.024000e+03, 
    "min_train_micro_batch_size_per_gpu": 1, 
    "num_tuning_micro_batch_sizes": 3
}
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   bfloat16_enabled ............. True
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   checkpoint_parallel_write_pipeline  False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   checkpoint_tag_validation_enabled  True
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   checkpoint_tag_validation_fail  False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7985856dae60>
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   communication_data_type ...... None
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   curriculum_enabled_legacy .... False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   curriculum_params_legacy ..... False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   data_efficiency_enabled ...... False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   dataloader_drop_last ......... False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   disable_allgather ............ False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   dump_state ................... False
[2024-02-07 17:07:14,109] [INFO] [config.py:988:print]   dynamic_loss_scale_args ...... None
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_enabled ........... False
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_gas_boundary_resolution  1
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_layer_num ......... 0
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_max_iter .......... 100
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_stability ......... 1e-06
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_tol ............... 0.01
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   eigenvalue_verbose ........... False
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   elasticity_enabled ........... False
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   flops_profiler_config ........ {
    "enabled": false, 
    "recompute_fwd_factor": 0.0, 
    "profile_step": 1, 
    "module_depth": -1, 
    "top_modules": 1, 
    "detailed": true, 
    "output_file": null
}
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   fp16_auto_cast ............... None
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   fp16_enabled ................. False
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   fp16_master_weights_and_gradients  False
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   global_rank .................. 0
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   grad_accum_dtype ............. None
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   gradient_accumulation_steps .. 1
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   gradient_clipping ............ 0.0
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   gradient_predivide_factor .... 1.0
[2024-02-07 17:07:14,110] [INFO] [config.py:988:print]   graph_harvesting ............. False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   initial_dynamic_scale ........ 1
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   load_universal_checkpoint .... False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   loss_scale ................... 1.0
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   memory_breakdown ............. False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   mics_hierarchial_params_gather  False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   mics_shard_size .............. -1
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   nebula_config ................ {
    "enabled": false, 
    "persistent_storage_path": null, 
    "persistent_time_interval": 100, 
    "num_of_version_in_retention": 2, 
    "enable_nebula_load": true, 
    "load_path": null
}
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   optimizer_legacy_fusion ...... False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   optimizer_name ............... None
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   optimizer_params ............. None
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   pld_enabled .................. False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   pld_params ................... False
[2024-02-07 17:07:14,111] [INFO] [config.py:988:print]   prescale_gradients ........... False
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   scheduler_name ............... None
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   scheduler_params ............. None
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   seq_parallel_communication_data_type  torch.float32
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   sparse_attention ............. None
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   sparse_gradients_enabled ..... False
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   steps_per_print .............. inf
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   train_batch_size ............. 24
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   train_micro_batch_size_per_gpu  24
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   use_data_before_expert_parallel_  False
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   use_node_local_storage ....... False
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   wall_clock_breakdown ......... False
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   weight_quantization_config ... None
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   world_size ................... 1
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   zero_allow_untested_optimizer  True
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=200000000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=200000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   zero_enabled ................. True
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   zero_force_ds_cpu_optimizer .. True
[2024-02-07 17:07:14,112] [INFO] [config.py:988:print]   zero_optimization_stage ...... 2
[2024-02-07 17:07:14,113] [INFO] [config.py:974:print_user_config]   json = {
    "bf16": {
        "enabled": true, 
        "auto_cast": true
    }, 
    "zero_optimization": {
        "stage": 2, 
        "allgather_partitions": true, 
        "allgather_bucket_size": 2.000000e+08, 
        "overlap_comm": true, 
        "reduce_scatter": true, 
        "reduce_bucket_size": 2.000000e+08, 
        "contiguous_gradients": true, 
        "sub_group_size": 1.000000e+09
    }, 
    "gradient_accumulation_steps": 1, 
    "train_batch_size": 24, 
    "train_micro_batch_size_per_gpu": 24, 
    "steps_per_print": inf, 
    "wall_clock_breakdown": false, 
    "fp16": {
        "enabled": false
    }, 
    "zero_allow_untested_optimizer": true
}
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/kaggle/working/Time-LLM/run_main.py", line 208, in <module>
    outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1842, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/kaggle/working/Time-LLM/models/Autoformer.py", line 146, in forward
    dec_out = self.forecast(x_enc, x_mark_enc, x_dec, x_mark_dec)
  File "/kaggle/working/Time-LLM/models/Autoformer.py", line 102, in forecast
    enc_out = self.enc_embedding(x_enc, x_mark_enc)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/kaggle/working/Time-LLM/layers/Embed.py", line 145, in forward
    x = self.value_embedding(x) + self.temporal_embedding(x_mark)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/kaggle/working/Time-LLM/layers/Embed.py", line 42, in forward
    x = self.tokenConv(x.permute(0, 2, 1)).transpose(1, 2)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 303, in _conv_forward
    return F.conv1d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
RuntimeError: expected scalar type Float but found BFloat16
KimMeen commented 5 months ago

@gsamaras I suspect this was caused by the --mixed_precision bf16 setting and the related configurations. The current implementation natively targets Ampere cards (or newer) that support such bf16 tensor operations well.
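As a minimal workaround sketch (not code from the repo, just an illustration of the mismatch): when DeepSpeed has cast the model to bfloat16 but the data loader still yields float32 tensors, aligning the batch dtype with the model's parameter dtype before the forward call avoids this particular error. Variable names follow run_main.py; adapt as needed.

import torch

# Hypothetical workaround: cast the inputs to the model's parameter dtype
# (torch.bfloat16 when bf16 is enabled) before the forward pass.
model_dtype = next(model.parameters()).dtype

batch_x = batch_x.to(model_dtype)
batch_x_mark = batch_x_mark.to(model_dtype)
dec_inp = dec_inp.to(model_dtype)
batch_y_mark = batch_y_mark.to(model_dtype)

outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)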

gsamaras commented 5 months ago

If I try to execute with accelerate launch, I get the same problem as in https://github.com/KimMeen/Time-LLM/issues/1#issuecomment-1925634459.

If I try to execute with python run_main.py, I get the RuntimeError: expected scalar type Float but found BFloat16 error. The --mixed_precision parameter is only used with accelerate launch; when executing with python directly it's not recognized.

Do you have an online notebook I could use where you have a working instance of your model?

KimMeen commented 5 months ago

@gsamaras Please use accelerate launch and make sure you have correctly configured num_processes as I mentioned in #1. You may also refer to https://github.com/paperswithcode/galai/issues/3 for the RuntimeError: CUDA error: invalid device ordinal error. The default configuration and scripts should be ready to use on an instance with 8*A100.

gsamaras commented 5 months ago

I changed num_processes to the number of GPUs. Even if I try to run with fp16 (while changing it in the DeepSpeed config too), I get:

RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half
RuntimeError: mat1 and mat2 must have the same dtype, but got Float and Half

Do you know where I can find a free online instance with an A100 for a basic demo of your code, @KimMeen? Kaggle? Colab maybe?

KimMeen commented 5 months ago

@gsamaras The error you encountered may be caused by this: https://github.com/KimMeen/Time-LLM/blob/39d7d77c02d9c1c7c440d35faffeecb35fca7d8c/models/TimeLLM.py#L144

Slight modifications, like the one mentioned above, are needed if you are not using Ampere cards. You may refer to this for information on GPU instances.

aliper96 commented 5 months ago

for ii in range(args.itr):

# setting record of experiments

setting = '{}_{}_{}_{}_ft{}_sl{}_ll{}_pl{}_dm{}_nh{}_el{}_dl{}_df{}_fc{}_eb{}_{}_{}'.format(
    args.task_name,
    args.model_id,
    args.model,
    args.data,
    args.features,
    args.seq_len,
    args.label_len,
    args.pred_len,
    args.d_model,
    args.n_heads,
    args.e_layers,
    args.d_layers,
    args.d_ff,
    args.factor,
    args.embed,
    args.des, ii)

train_data, train_loader = data_provider(args, 'train')
vali_data, vali_loader = data_provider(args, 'val')
test_data, test_loader = data_provider(args, 'test')

if args.model == 'Autoformer':
    model = Autoformer.Model(args).float()
elif args.model == 'DLinear':
    model = DLinear.Model(args).float()
else:
    model = TimeLLM.Model(args).float()

model = model.to(torch.bfloat16)

path = os.path.join(args.checkpoints,
                    setting + '-' + args.model_comment)  # unique checkpoint saving path
args.content = load_content(args)
if not os.path.exists(path) and accelerator.is_local_main_process:
    os.makedirs(path)

time_now = time.time()

train_steps = len(train_loader)
early_stopping = EarlyStopping(accelerator=accelerator, patience=args.patience)

trained_parameters = []
for p in model.parameters():
    if p.requires_grad is True:
        trained_parameters.append(p)

model_optim = optim.Adam(trained_parameters, lr=args.learning_rate)

if args.lradj == 'COS':
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(model_optim, T_max=20, eta_min=1e-8)
else:
    scheduler = lr_scheduler.OneCycleLR(optimizer=model_optim,
                                        steps_per_epoch=train_steps,
                                        pct_start=args.pct_start,
                                        epochs=args.train_epochs,
                                        max_lr=args.learning_rate)

criterion = nn.MSELoss()
mae_metric = nn.L1Loss()

train_loader, vali_loader, test_loader, model, model_optim, scheduler = accelerator.prepare(
    train_loader, vali_loader, test_loader, model, model_optim, scheduler)

if args.use_amp:
    scaler = torch.cuda.amp.GradScaler()

for epoch in range(args.train_epochs):
    iter_count = 0
    train_loss = []

    model.train()
    epoch_time = time.time()
    for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in tqdm(enumerate(train_loader)):
        iter_count += 1
        model_optim.zero_grad()

        batch_x = batch_x.float().to(accelerator.device)
        batch_y = batch_y.float().to(accelerator.device)
        batch_x_mark = batch_x_mark.float().to(accelerator.device)
        batch_y_mark = batch_y_mark.float().to(accelerator.device)

        # decoder input
        dec_inp = torch.zeros_like(batch_y[:, -args.pred_len:, :]).float().to(
            accelerator.device)
        dec_inp = torch.cat([batch_y[:, :args.label_len, :], dec_inp], dim=1).float().to(
            accelerator.device)

        # encoder - decoder
        if args.use_amp:
            with torch.cuda.amp.autocast():
                if args.output_attention:
                    outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
                else:
                    outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)

                f_dim = -1 if args.features == 'MS' else 0
                outputs = outputs[:, -args.pred_len:, f_dim:]
                batch_y = batch_y[:, -args.pred_len:, f_dim:].to(accelerator.device)
                loss = criterion(outputs, batch_y)
                train_loss.append(loss.item())
        else:
            if args.output_attention:
                outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)[0]
            else:
                outputs = model(batch_x, batch_x_mark, dec_inp, batch_y_mark)

            f_dim = -1 if args.features == 'MS' else 0
            outputs = outputs[:, -args.pred_len:, f_dim:]
            batch_y = batch_y[:, -args.pred_len:, f_dim:]
            loss = criterion(outputs, batch_y)
            train_loss.append(loss.item())

        if (i + 1) % 100 == 0:
            accelerator.print(
                "\titers: {0}, epoch: {1} | loss: {2:.7f}".format(i + 1, epoch + 1, loss.item()))
            speed = (time.time() - time_now) / iter_count
            left_time = speed * ((args.train_epochs - epoch) * train_steps - i)
            accelerator.print('\tspeed: {:.4f}s/iter; left time: {:.4f}s'.format(speed, left_time))
            iter_count = 0
            time_now = time.time()

        if args.use_amp:
            scaler.scale(loss).backward()
            scaler.step(model_optim)
            scaler.update()
        else:
            accelerator.backward(loss)
            model_optim.step()

        if args.lradj == 'TST':
            adjust_learning_rate(accelerator, model_optim, scheduler, epoch + 1, args, printout=False)
            scheduler.step()

    accelerator.print("Epoch: {} cost time: {}".format(epoch + 1, time.time() - epoch_time))
    train_loss = np.average(train_loss)
    vali_loss, vali_mae_loss = vali(args, accelerator, model, vali_data, vali_loader, criterion, mae_metric)
    test_loss, test_mae_loss = vali(args, accelerator, model, test_data, test_loader, criterion, mae_metric)
    accelerator.print(
        "Epoch: {0} | Train Loss: {1:.7f} Vali Loss: {2:.7f} Test Loss: {3:.7f} MAE Loss: {4:.7f}".format(
            epoch + 1, train_loss, vali_loss, test_loss, test_mae_loss))

    early_stopping(vali_loss, model, path)
    if early_stopping.early_stop:
        accelerator.print("Early stopping")
        break

    if args.lradj != 'TST':
        if args.lradj == 'COS':
            scheduler.step()
            accelerator.print("lr = {:.10f}".format(model_optim.param_groups[0]['lr']))
        else:
            if epoch == 0:
                args.learning_rate = model_optim.param_groups[0]['lr']
                accelerator.print("lr = {:.10f}".format(model_optim.param_groups[0]['lr']))
            adjust_learning_rate(accelerator, model_optim, scheduler, epoch + 1, args, printout=True)

    else:
        accelerator.print('Updating learning rate to {}'.format(scheduler.get_last_lr()[0]))

accelerator.wait_for_everyone()

This worked for me, also on Windows.

aliper96 commented 5 months ago

I also had to remove the DeepSpeed dependency, creating the accelerator like this:

ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])

The only other change to the code above is that I converted the entire model to bfloat16 with this line: model = model.to(torch.bfloat16)
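A self-contained sketch of that setup, assuming a standard accelerate install (the imports are the usual accelerate ones; this is an illustration rather than the exact notebook contents):

import torch
from accelerate import Accelerator, DistributedDataParallelKwargs

# Build the Accelerator without a DeepSpeed plugin; find_unused_parameters=True
# matches the DDP behaviour described above.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])

# ... build the model as in the snippet above, then cast it once to bfloat16:
# model = model.to(torch.bfloat16)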

gsamaras commented 5 months ago

@aliper96 thanks for joining in, can you provide a minimal complete and reproducible example please?

aliper96 commented 5 months ago

alitimellm.zip: here is the notebook.

gsamaras commented 5 months ago

@aliper96 I think I'm close, but I get the following error, something with the paths maybe? Check it live in Kaggle here:

---------------------------------------------------------------------------
HFValidationError                         Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:385, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    383 try:
    384     # Load from URL or cache if already cached
--> 385     resolved_file = hf_hub_download(
    386         path_or_repo_id,
    387         filename,
    388         subfolder=None if len(subfolder) == 0 else subfolder,
    389         repo_type=repo_type,
    390         revision=revision,
    391         cache_dir=cache_dir,
    392         user_agent=user_agent,
    393         force_download=force_download,
    394         proxies=proxies,
    395         resume_download=resume_download,
    396         token=token,
    397         local_files_only=local_files_only,
    398     )
    399 except GatedRepoError as e:

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:110, in validate_hf_hub_args.<locals>._inner_fn(*args, **kwargs)
    109 if arg_name in ["repo_id", "from_id", "to_id"]:
--> 110     validate_repo_id(arg_value)
    112 elif arg_name == "token" and arg_value is not None:

File /opt/conda/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py:158, in validate_repo_id(repo_id)
    157 if repo_id.count("/") > 1:
--> 158     raise HFValidationError(
    159         "Repo id must be in the form 'repo_name' or 'namespace/repo_name':"
    160         f" '{repo_id}'. Use `repo_type` argument if needed."
    161     )
    163 if not REPO_ID_REGEX.match(repo_id):

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/'. Use `repo_type` argument if needed.

The above exception was the direct cause of the following exception:

OSError                                   Traceback (most recent call last)
Cell In[22], line 30
     28     model = DLinear.Model(args).float()
     29 else:
---> 30     model = TimeLLM.Model(args).float()
     32 model = model.to(torch.bfloat16)
     35 path = os.path.join(args.checkpoints,
     36                     setting + '-' + args.model_comment)  # unique checkpoint saving path

File /kaggle/working/Time-LLM/models/TimeLLM.py:44, in Model.__init__(self, configs, patch_len, stride)
     41 self.patch_len = configs.patch_len
     42 self.stride = configs.stride
---> 44 self.llama_config = LlamaConfig.from_pretrained('/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/')
     45 # self.llama_config = LlamaConfig.from_pretrained('huggyllama/llama-7b')
     46 self.llama_config.num_hidden_layers = configs.llm_layers

File /opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py:605, in PretrainedConfig.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, **kwargs)
    601 kwargs["revision"] = revision
    603 cls._set_token_in_kwargs(kwargs, token)
--> 605 config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
    606 if "model_type" in config_dict and hasattr(cls, "model_type") and config_dict["model_type"] != cls.model_type:
    607     logger.warning(
    608         f"You are using a model of type {config_dict['model_type']} to instantiate a model of type "
    609         f"{cls.model_type}. This is not supported for all configurations of models and can yield errors."
    610     )

File /opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py:634, in PretrainedConfig.get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    632 original_kwargs = copy.deepcopy(kwargs)
    633 # Get config dict associated with the base config file
--> 634 config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
    635 if "_commit_hash" in config_dict:
    636     original_kwargs["_commit_hash"] = config_dict["_commit_hash"]

File /opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py:689, in PretrainedConfig._get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    685 configuration_file = kwargs.pop("_configuration_file", CONFIG_NAME)
    687 try:
    688     # Load from local folder or from cache or download from model Hub and cache
--> 689     resolved_config_file = cached_file(
    690         pretrained_model_name_or_path,
    691         configuration_file,
    692         cache_dir=cache_dir,
    693         force_download=force_download,
    694         proxies=proxies,
    695         resume_download=resume_download,
    696         local_files_only=local_files_only,
    697         token=token,
    698         user_agent=user_agent,
    699         revision=revision,
    700         subfolder=subfolder,
    701         _commit_hash=commit_hash,
    702     )
    703     commit_hash = extract_commit_hash(resolved_config_file, commit_hash)
    704 except EnvironmentError:
    705     # Raise any environment error raise by `cached_file`. It will have a helpful error message adapted to
    706     # the original exception.

File /opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:450, in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
    448     raise EnvironmentError(f"There was a specific connection error when trying to load {path_or_repo_id}:\n{err}")
    449 except HFValidationError as e:
--> 450     raise EnvironmentError(
    451         f"Incorrect path_or_model_id: '{path_or_repo_id}'. Please provide either the path to a local folder or the repo_id of a model on the Hub."
    452     ) from e
    453 return resolved_file

OSError: Incorrect path_or_model_id: '/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
KimMeen commented 5 months ago

@aliper96 I think I'm close, but I get the following error, something with the paths maybe? Check it live in Kaggle here:

[...]

OSError: Incorrect path_or_model_id: '/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/'. Please provide either the path to a local folder or the repo_id of a model on the Hub.

@gsamaras Simply using 'huggyllama/llama-7b' instead of '/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/' in TimeLLM.py will solve this issue.
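For reference, a short sketch of what that substitution looks like (the surrounding keyword arguments are illustrative and may not match TimeLLM.py exactly):

from transformers import LlamaConfig, LlamaModel, LlamaTokenizer

# Point the config, backbone, and tokenizer at the Hub repo id instead of a
# local path that only exists on the original machine.
llama_config = LlamaConfig.from_pretrained('huggyllama/llama-7b')
llama_config.num_hidden_layers = 6  # stands in for configs.llm_layers

llama = LlamaModel.from_pretrained('huggyllama/llama-7b', config=llama_config)
tokenizer = LlamaTokenizer.from_pretrained('huggyllama/llama-7b')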

aliper96 commented 5 months ago

@KimMeen I'm a PhD student in generative AI, and thanks for the code... very clean and understandable code!!

gsamaras commented 5 months ago

@KimMeen unfortunately, after changing this, I get this error:

ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or `pip install bitsandbytes`.

although I installed both packages.

KimMeen commented 5 months ago

@gsamaras Remove load_in_8bit=True and give it a try

gsamaras commented 5 months ago

@KimMeen it seems I don't have control over that; it's not in your code. See the full error, please:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[32], line 30
     28     model = DLinear.Model(args).float()
     29 else:
---> 30     model = Model(args).float() #TimeLLM.Model(args).float()
     32 model = model.to(torch.bfloat16)
     35 path = os.path.join(args.checkpoints,
     36                     setting + '-' + args.model_comment)  # unique checkpoint saving path

Cell In[12], line 55, in Model.__init__(self, configs, patch_len, stride)
     53 self.llama_config.output_attentions = True
     54 self.llama_config.output_hidden_states = True
---> 55 self.llama = LlamaModel.from_pretrained(
     56     #"/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/",
     57     'huggyllama/llama-7b',
     58     trust_remote_code=True,
     59     local_files_only=True,
     60     config=self.llama_config,
     61     load_in_4bit=True
     62 )
     64 self.tokenizer = LlamaTokenizer.from_pretrained(
     65     #"/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/tokenizer.model",
     66     'huggyllama/llama-7b',
     67     trust_remote_code=True,
     68     local_files_only=True
     69 )
     71 if self.tokenizer.eos_token:

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:3034, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3032     raise RuntimeError("No GPU found. A GPU is needed for quantization.")
   3033 if not (is_accelerate_available() and is_bitsandbytes_available()):
-> 3034     raise ImportError(
   3035         "Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of"
   3036         " bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or"
   3037         " `pip install bitsandbytes`."
   3038     )
   3040 if torch_dtype is None:
   3041     # We force the `dtype` to be float16, this is a requirement from `bitsandbytes`
   3042     logger.info(
   3043         f"Overriding torch_dtype={torch_dtype} with `torch_dtype=torch.float16` due to "
   3044         "requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. "
   3045         "Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass"
   3046         " torch_dtype=torch.float16 to remove this warning."
   3047     )

ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or `pip install bitsandbytes`.
KimMeen commented 5 months ago

@gsamaras Would removing load_in_4bit=True work on your end? See also https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g/discussions/11

gsamaras commented 5 months ago

No, sorry:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[45], line 30
     28     model = DLinear.Model(args).float()
     29 else:
---> 30     model = Model(args).float() #TimeLLM.Model(args).float()
     32 model = model.to(torch.bfloat16)
     35 path = os.path.join(args.checkpoints,
     36                     setting + '-' + args.model_comment)  # unique checkpoint saving path

Cell In[42], line 54, in Model.__init__(self, configs, patch_len, stride)
     52 self.llama_config.output_attentions = True
     53 self.llama_config.output_hidden_states = True
---> 54 self.llama = LlamaModel.from_pretrained(
     55     #"/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/",
     56     'huggyllama/llama-7b',
     57     trust_remote_code=True,
     58     local_files_only=True,
     59     config=self.llama_config,
     60     load_in_4bit=False
     61 )
     63 self.tokenizer = LlamaTokenizer.from_pretrained(
     64     #"/mnt/alps/modelhub/pretrained_model/LLaMA/7B_hf/tokenizer.model",
     65     'huggyllama/llama-7b',
     66     trust_remote_code=True,
     67     local_files_only=True
     68 )
     70 if self.tokenizer.eos_token:

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:3455, in from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3452         return key.replace("gamma", "weight")
   3453     return key
-> 3455 original_loaded_keys = loaded_keys
   3456 loaded_keys = [_fix_key(key) for key in loaded_keys]
   3458 if len(prefix) > 0:

OSError: huggyllama/llama-7b does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack