Error: AttributeError: 'numpy.ndarray' object has no attribute 'to'
log:
Traceback (most recent call last):
  File "/home/yaning/CodeFolder/FedScaleOrigin/fedscale/cloud/aggregation/aggregator.py", line 899, in <module>
    aggregator.run()
  File "/home/yaning/CodeFolder/FedScaleOrigin/fedscale/cloud/aggregation/aggregator.py", line 370, in run
    self.event_monitor()
  File "/home/yaning/CodeFolder/FedScaleOrigin/fedscale/cloud/aggregation/aggregator.py", line 873, in event_monitor
    self.deserialize_response(data))
  File "/home/yaning/CodeFolder/FedScaleOrigin/fedscale/cloud/aggregation/aggregator.py", line 425, in client_completion_handler
    self.update_weight_aggregation(results)
  File "/home/yaning/CodeFolder/FedScaleOrigin/fedscale/cloud/aggregation/aggregator.py", line 443, in update_weight_aggregation
    self.model_wrapper.set_weights(copy.deepcopy(self.model_weights))
  File "/data/home/yaning/CodeFolder/FedScaleOrigin/fedscale/cloud/internal/torch_model_adapter.py", line 35, in set_weights
    self.optimizer.update_round_gradient(weights, current_grad_weights, self.model)
  File "/data/home/yaning/CodeFolder/FedScaleOrigin/fedscale/cloud/aggregation/optimizers.py", line 37, in update_round_gradient
    last_model = [x.to(device=self.device) for x in last_model]
  File "/data/home/yaning/CodeFolder/FedScaleOrigin/fedscale/cloud/aggregation/optimizers.py", line 37, in <listcomp>
    last_model = [x.to(device=self.device) for x in last_model]
AttributeError: 'numpy.ndarray' object has no attribute 'to'
Analysis:
In fedscale/cloud/internal/torch_model_adapter.py, line 35, in set_weights, weights is a list of np.ndarray. set_weights passes it to optimizer.update_round_gradient in fedscale/cloud/aggregation/optimizers.py. At line 37 of that file, the code last_model = [x.to(device=self.device) for x in last_model] raises the error because x is an np.ndarray, not a torch.Tensor, so it has no to method.
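A possible workaround, sketched here only to illustrate the type mismatch: convert each entry to a torch.Tensor before moving it to the device. The helper name to_device_tensors is hypothetical and not part of FedScale, and this is not necessarily how the project should fix it. torch.as_tensor accepts both np.ndarray and torch.Tensor inputs, so mixed lists are handled too.

```python
import numpy as np
import torch


def to_device_tensors(weights, device):
    """Return the weights as torch.Tensors on `device`.

    Hypothetical helper (not a FedScale API): each entry may be an
    np.ndarray or a torch.Tensor.
    """
    return [torch.as_tensor(w).to(device=device) for w in weights]


# Mimic the failing input: a list that contains NumPy arrays.
last_model = [np.zeros((2, 3), dtype=np.float32), torch.ones(4)]
converted = to_device_tensors(last_model, torch.device("cpu"))
```

Under this assumption, the list comprehension in update_round_gradient would go through a conversion like the one above instead of calling x.to directly on each element.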
# Configuration file of FAR training experiment
# ========== Cluster configuration ==========
# ip address of the parameter server (need 1 GPU process)
ps_ip: 10.128.201.124
# ip address of each worker:# of available gpus process on each gpu in this node
# Note that if we collocate ps and worker on same GPU, then we need to decrease this number of available processes on that GPU by 1
# E.g., master node has 4 available processes, then 1 for the ps, and worker should be set to: worker:3
worker_ips:
# - 10.128.201.129:[5,5] # worker_ip: [(# processes on gpu) for gpu in available_gpus]
- 10.128.201.124:[3,3]
exp_path: $FEDSCALE_HOME/fedscale/cloud
# Entry function of executor and aggregator under $exp_path
executor_entry: execution/executor.py
aggregator_entry: aggregation/aggregator.py
auth:
    ssh_user: "yaning"
    ssh_private_key: ~/.ssh/id_rsa
# cmd to run before we can indeed run FAR (in order)
setup_commands:
- source $HOME/anaconda3/bin/activate fedscaleorigin
- export NCCL_SOCKET_IFNAME='enp94s0f0' # Run "ifconfig" to ensure the right NIC for nccl if you have multiple NICs
# ========== Additional job configuration ==========
# Default parameters are specified in config_parser.py, wherein more description of the parameter can be found
job_conf:
- job_name: google_speech # Generate logs under this folder: log_path/job_name/time_stamp
- use_cuda: True
- log_path: $FEDSCALE_HOME/benchmark # Path of log files
- task: speech
- num_participants: 50 # Number of participants per round, we use K=100 in our paper, large K will be much slower
- data_set: google_speech # Dataset: openImg, google_speech, stackoverflow
- data_dir: $FEDSCALE_HOME/benchmark/dataset/data/google_speech # Path of the dataset
- data_map_file: $FEDSCALE_HOME/benchmark/dataset/data/google_speech/client_data_mapping/train.csv # Allocation of data to each client, turn to iid setting if not provided
- device_conf_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_device_capacity # Path of the client trace
- device_avail_file: $FEDSCALE_HOME/benchmark/dataset/data/device_info/client_behave_trace
- model: resnet34 # Models: e.g., shufflenet_v2_x2_0, mobilenet_v2, resnet34, albert-base-v2
- gradient_policy: fed-yogi # {"fed-yogi", "fed-prox", "fed-avg"}, "fed-avg" by default
- eval_interval: 10 # How many rounds to run a testing on the testing set
- rounds: 1000 # Number of rounds to run this training. We use 1000 in our paper, while it may converge w/ ~400 rounds
- filter_less: 21 # Remove clients w/ less than 21 samples
- num_loaders: 4
- yogi_eta: 3e-3
- yogi_tau: 1e-8
- local_steps: 30
- learning_rate: 0.05
- batch_size: 16
- test_bsz: 20
- sample_mode: oort
- save_checkpoint: False
What happened + What you expected to happen
Error: AttributeError: 'numpy.ndarray' object has no attribute 'to'
log:
In fedscale/cloud/internal/torch_model_adapter.py, line 35, in set_weights, weights is a list of np.ndarray. It calls the function optimizer.update_round_gradient, which is in fedscale/cloud/aggregation/optimizers.py, and at line 37 the code last_model = [x.to(device=self.device) for x in last_model] raises an error because x is an np.ndarray, not a torch.Tensor, so it has no to method.
Versions / Dependencies
fedscale==0.5
python==3.7.16
os:
Reproduction script
Issue Severity
High: It blocks me from completing my task.