Is anyone successfully running `accelerate launch` with FSDP? If so, please share your accelerate config, thanks.
I run into the following error:
RuntimeError: The tensor has a non-zero number of elements, but its data is not allocated yet. Caffe2 uses a lazy allocation, so you will need to call mutable_data() or raw_mutable_data() to actually allocate memory.
I am running eval on 4 GPUs, and the error message is replicated across the GPUs. Typically one batch completes on one of the GPUs before erroring out.
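For reference, here is a stripped-down sketch of the kind of eval loop I am trying to run (the model name, dummy data, and batch size below are placeholders for illustration, not my actual setup). The model goes through `accelerator.prepare()` before any forward pass, which as far as I understand is required with `fsdp_cpu_ram_efficient_loading: true`, since non-zero ranks hold unmaterialized weights until FSDP syncs them during wrapping:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model, not my actual checkpoint

def main():
    accelerator = Accelerator()  # picks up FSDP settings from the accelerate config

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

    # Placeholder eval data; in my real run this is a tokenized eval split.
    texts = ["hello world"] * 32
    enc = tokenizer(texts, return_tensors="pt", padding=True)
    loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"]), batch_size=4)

    # prepare() must wrap the model before any forward pass; with
    # fsdp_cpu_ram_efficient_loading, parameters on non-zero ranks are not
    # allocated until FSDP materializes and syncs them here.
    model, loader = accelerator.prepare(model, loader)
    model.eval()

    with torch.no_grad():
        for input_ids, attention_mask in loader:
            outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=input_ids)
            loss = accelerator.gather(outputs.loss.detach().reshape(1)).mean()
            accelerator.print(f"batch loss: {loss.item():.4f}")

if __name__ == "__main__":
    main()
```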
Accelerate config:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch_policy: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: true
  fsdp_offload_params: false
  fsdp_sharding_strategy: 1
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
```
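I save this config to a file (e.g. `fsdp_config.yaml`, the filename is just an example) and start the job on a single 4-GPU machine with `accelerate launch --config_file fsdp_config.yaml eval.py`.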