burlachenkok / flpytorch

FL_PyTorch: Optimization Research Simulator for Federated Learning
Apache License 2.0

Runtime Error: expected scalar type Long but found Double #12

Closed nawfalCode closed 1 year ago

nawfalCode commented 1 year ago

Hi @burlachenkok, I installed the project successfully but ended up with the runtime error "expected scalar type Long but found Double", which occurred across several different configurations. While trying different datasets I also hit another error: "Using a target size (torch.Size([500])) that is different to the input size (torch.Size([500, 10])) is deprecated. Please ensure they have the same size." I would greatly appreciate any help.
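For reference, both messages can be reproduced in plain PyTorch. This is a minimal sketch, assuming the simulator's "crossentropy" loss maps to `torch.nn.CrossEntropyLoss` (an assumption); the tensor shapes follow the batch size (500) and CIFAR-10's 10 classes from the configuration below.

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(500, 10)  # model output: batch of 500 samples, 10 classes

# 1) "expected scalar type Long but found Double":
#    with targets of shape [500], CrossEntropyLoss treats them as class
#    indices and requires an integer (Long) tensor; Double targets fail.
double_targets = torch.zeros(500, dtype=torch.float64)
caught = False
try:
    loss_fn(logits, double_targets)
except RuntimeError:
    caught = True  # raises "expected scalar type Long but found Double"

# Fix: cast class-index targets to Long before computing the loss.
long_targets = torch.randint(0, 10, (500,))  # dtype is int64 (Long)
loss = loss_fn(logits, long_targets)

# 2) The size-mismatch warning ("target size torch.Size([500]) ... input
#    size torch.Size([500, 10])") is typical of element-wise losses such as
#    nn.MSELoss or nn.BCELoss being fed class indices of shape [500] where
#    per-class targets of shape [500, 10] are expected.
```

Note that running with `--compute-type "fp64"` makes it easy for targets to end up as Double if the whole batch is cast uniformly.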

Here is the log of the project:

Job '{simcounter}_jobid{now}' with algorithm 'FEDAVG' has been submitted

Command line for experiment with job_id=1_job_id_1679617053

```shell
python run.py \
  --rounds "3000" \
  --client-sampling-type "uniform" \
  --num-clients-per-round "10" \
  --global-lr "0.1" \
  --global-optimiser "sgd" \
  --global-weight-decay "0.0" \
  --number-of-local-iters "1" \
  --batch-size "500" \
  --local-lr "0.01" \
  --local-optimiser "sgd" \
  --local-weight-decay "0.0" \
  --dataset "cifar10_fl" \
  --loss "crossentropy" \
  --model "tv_resnet18" \
  --use-pretrained \
  --train-last-layer \
  --metric "top_1_acc" \
  --global-regulizer "none" \
  --global-regulizer-alpha "0.0" \
  --checkpoint-dir "../check_points" \
  --do-not-save-eval-checkpoints \
  --data-path "../data/" \
  --compute-type "fp64" \
  --gpu "-1" \
  --log-gpu-usage \
  --num-workers-train "0" \
  --num-workers-test "0" \
  --deterministic \
  --manual-init-seed "123" \
  --manual-runtime-seed "456" \
  --group-name "" \
  --comment "" \
  --hostname "nfl" \
  --eval-every "100" \
  --eval-async-threads "0" \
  --save-async-threads "0" \
  --threadpool-for-local-opt "0" \
  --run-id "1_job_id_1679617053" \
  --algorithm "fedavg" \
  --algorithm-options "internal_sgd:full-gradient" \
  --logfile "../logs/1_log_1679617053.txt" \
  --client-compressor "ident:5%" \
  --extra-track "full_gradient_norm_train,full_objective_value_train" \
  --allow-use-nv-tensorcores \
  --initialize-shifts-policy "zero" \
  --wandb-key "" \
  --wandb-project-name "fl_pytorch_simulation" \
  --loglevel "debug" \
  --logfilter ".*" \
  --out "1_job_id_1679617053.bin"
```

===========================================================

2023-03-24 11:19:48.269860

===========================================================

Command line for currently selected configuration in GUI

```shell
python run.py \
  --rounds "3000" \
  --client-sampling-type "uniform" \
  --num-clients-per-round "10" \
  --global-lr "0.1" \
  --global-optimiser "SGD" \
  --global-weight-decay "0.0" \
  --number-of-local-iters "1" \
  --batch-size "500" \
  --local-lr "0.01" \
  --local-optimiser "SGD" \
  --local-weight-decay "0.0" \
  --dataset "cifar10_fl" \
  --loss "CROSSENTROPY" \
  --model "tv_resnet18" \
  --use-pretrained \
  --train-last-layer \
  --metric "top_1_acc" \
  --global-regulizer "none" \
  --global-regulizer-alpha "0.0" \
  --checkpoint-dir "../check_points" \
  --do-not-save-eval-checkpoints \
  --data-path "../data/" \
  --compute-type "fp64" \
  --gpu "-1" \
  --log-gpu-usage \
  --num-workers-train "0" \
  --num-workers-test "0" \
  --deterministic \
  --manual-init-seed "123" \
  --manual-runtime-seed "456" \
  --group-name "" \
  --comment "" \
  --hostname "nfl" \
  --eval-every "100" \
  --eval-async-threads "0" \
  --save-async-threads "0" \
  --threadpool-for-local-opt "0" \
  --run-id "{simcounter}_jobid{now}" \
  --algorithm "FEDAVG" \
  --algorithm-options "internal_sgd:full-gradient" \
  --logfile "../logs/{simcounter}log{now}.txt" \
  --client-compressor "ident:5%" \
  --extra-track "full_gradient_norm_train,full_objective_value_train" \
  --allow-use-nv-tensorcores \
  --initialize-shifts-policy "zero" \
  --wandb-key "" \
  --wandb-project-name "fl_pytorch_simulation" \
  --loglevel "DEBUG" \
  --logfilter ".*" \
  --out "current.bin"
```

===========================================================

```
Release unoccupied cache memory from PyTorch... Running the garbage collector... Done.
0.01 MB was removed from Virtual and Resident memory of interpreter.
Current used amount of memory is 38366.88 MBytes

PyTorch version: 1.10.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 13.2.1 (x86_64)
GCC version: Could not collect
Clang version: 14.0.0 (clang-1400.0.29.202)
CMake version: Could not collect
Libc version: N/A

Python version: 3.9.2 (v3.9.2:1a79785e3e, Feb 19 2021, 09:06:10) [Clang 6.0 (clang-600.0.57)] (64-bit runtime)
Python platform: macOS-10.16-x86_64-i386-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.24.2
[pip3] torch==1.10.0
[pip3] torchaudio==0.10.0
[pip3] torchvision==0.11.0
[conda] Could not collect
```

burlachenkok commented 1 year ago

Thanks for reporting. The project was recently updated to PyTorch 1.10.0; I will check what the problem is.

burlachenkok commented 1 year ago

The project has unit tests, but they cover only part of the functionality. P.S. Regression tests that check the project is launchable have not been created yet.

burlachenkok commented 1 year ago

Fixed. @nawfalCode, please reopen the ticket if the problem is still there. Thanks.