google-research / federated

A collection of Google research projects related to Federated Learning and Federated Analytics.
Apache License 2.0

[distributed-dp] running of fl_run: AttributeError: module 'tensorflow_federated.python.program' has no attribute 'TensorBoardReleaseManager' #58

Status: Open (opened by adam-dziedzic 2 years ago)

adam-dziedzic commented 2 years ago

Hi All,

Thank you for publishing the code.

I installed the packages in the same way as recommended here: https://github.com/google-research/federated/issues/57#issuecomment-1062468566

(tff) ady@vws35:~/code/federated/distributed_dp$ pip list
Package                       Version
----------------------------- -------------------
absl-py                       1.0.0
astunparse                    1.6.3
attrs                         21.2.0
cachetools                    3.1.1
certifi                       2021.10.8
charset-normalizer            2.0.12
cloudpickle                   2.0.0
cycler                        0.11.0
decorator                     5.1.1
dill                          0.3.4
dm-tree                       0.1.6
farmhashpy                    0.4.0
flatbuffers                   2.0
fonttools                     4.30.0
gast                          0.5.3
google-auth                   2.6.0
google-auth-oauthlib          0.4.6
google-pasta                  0.2.0
googleapis-common-protos      1.55.0
grpcio                        1.34.1
h5py                          3.6.0
idna                          3.3
importlib-metadata            4.11.3
jax                           0.2.28
jaxlib                        0.1.76
keras                         2.8.0
Keras-Preprocessing           1.1.2
kiwisolver                    1.4.0
libclang                      13.0.0
Markdown                      3.3.6
matplotlib                    3.5.1
mpmath                        1.2.1
numpy                         1.21.5
oauthlib                      3.2.0
opt-einsum                    3.3.0
packaging                     21.3
pandas                        1.4.1
Pillow                        9.0.1
pip                           21.2.4
portpicker                    1.3.9
promise                       2.3
protobuf                      3.19.4
pyasn1                        0.4.8
pyasn1-modules                0.2.8
pyparsing                     3.0.7
python-dateutil               2.8.2
pytz                          2021.3
requests                      2.27.1
requests-oauthlib             1.3.1
rsa                           4.8
scipy                         1.8.0
semantic-version              2.8.5
setuptools                    58.0.4
six                           1.16.0
tensorboard                   2.8.0
tensorboard-data-server       0.6.1
tensorboard-plugin-wit        1.8.1
tensorflow                    2.8.0
tensorflow-addons             0.16.1
tensorflow-datasets           4.5.2
tensorflow-estimator          2.8.0
tensorflow-federated          0.20.0
tensorflow-io-gcs-filesystem  0.24.0
tensorflow-metadata           1.7.0
tensorflow-model-optimization 0.7.1
tensorflow-privacy            0.7.3
tensorflow-probability        0.16.0
termcolor                     1.1.0
tf-estimator-nightly          2.8.0.dev2021122109
tqdm                          4.28.1
typeguard                     2.13.3
typing_extensions             4.1.1
urllib3                       1.26.9
Werkzeug                      2.0.3
wheel                         0.37.1
wrapt                         1.14.0
zipp                          3.7.0
(tff) ady@vws35:~/code/federated/distributed_dp$ python --version
Python 3.9.7

However, I ran into the following problem:

(tff) ady@vws35:~/code/federated/distributed_dp$ bazel run :fl_run --     --task=emnist_character     --server_optimizer=sgd     --server_learning_rate=1     --server_sgd_momentum=0.9     --client_optimizer=sgd     --client_learning_rate=0.03     --client_batch_size=20     --experiment_name=my_emnist_test     --epsilon=10     --l2_norm_clip=0.03     --dp_mechanism=ddgauss     --logtostderr  --total_rounds 2
WARNING: Output base '/h/ady/.cache/bazel/_bazel_ady/39df1af3e8de7748262d01b9bcee607d' is on NFS. This may lead to surprising failures and undetermined behavior.
DEBUG: Rule 'rules_python' indicated that a canonical reproducible form can be obtained by modifying arguments commit = "a0fbf98d4e3a232144df4d0d80b577c7a693b570", shallow_since = "1586444447 +0200" and dropping ["tag"]
DEBUG: Repository rules_python instantiated at:
  /h/ady/code/federated/WORKSPACE:5:15: in <toplevel>
Repository rule git_repository defined at:
  /h/ady/.cache/bazel/_bazel_ady/39df1af3e8de7748262d01b9bcee607d/external/bazel_tools/tools/build_defs/repo/git.bzl:199:33: in <toplevel>
INFO: Analyzed target //distributed_dp:fl_run (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //distributed_dp:fl_run up-to-date:
  bazel-bin/distributed_dp/fl_run
INFO: Elapsed time: 0.396s, Critical Path: 0.02s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/distributed_dp/fl_run '--task=emnist_character' '--server_optimizer=sgd' '--server_learning_rate=1' '--server_sgd_momentum=0.9' '--client_optimizer=sgd' '--client_learning_rate=0.03' '--client_batch_size=20' '--experiment_name=my_emnist_test' '--epsilon=10' '--l2_norm_clip=0.03' '--dp_mechanism=ddgauss'
INFO: Build completed successfully, 1 total action
2022-03-17 16:06:23.299187: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-03-17 16:06:23.299213: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-03-17 16:06:30.611416: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-03-17 16:06:30.612106: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2022-03-17 16:06:30.612540: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2022-03-17 16:06:30.613066: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory
2022-03-17 16:06:30.613540: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcurand.so.10'; dlerror: libcurand.so.10: cannot open shared object file: No such file or directory
2022-03-17 16:06:30.613981: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusolver.so.11'; dlerror: libcusolver.so.11: cannot open shared object file: No such file or directory
2022-03-17 16:06:30.614415: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2022-03-17 16:06:30.614841: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2022-03-17 16:06:30.614853: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1850] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
I0317 16:06:30.714181 140367148568768 sql_client_data.py:127] Loaded 3400 client ids from SQL database.
2022-03-17 16:06:30.717065: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
I0317 16:06:30.906410 140367148568768 sql_client_data.py:127] Loaded 3400 client ids from SQL database.
I0317 16:06:31.114517 140367148568768 keras_utils.py:365] Adding default num_examples metric to model
I0317 16:06:31.114643 140367148568768 keras_utils.py:368] Adding default num_batches metric to model
I0317 16:06:31.190037 140367148568768 keras_utils.py:365] Adding default num_examples metric to model
I0317 16:06:31.190158 140367148568768 keras_utils.py:368] Adding default num_batches metric to model
I0317 16:06:31.195640 140367148568768 fl_utils.py:71] Shared DP Parameters:
I0317 16:06:31.195845 140367148568768 fl_utils.py:72] {'clip': 0.03,
 'delta': 0.0002941176470588235,
 'dim': 1018174,
 'epsilon': 10.0,
 'mechanism': 'ddgauss',
 'num_clients': 3400,
 'num_clients_per_round': 100,
 'num_rounds': 2,
 'sampling_rate': 0.029411764705882353}
I0317 16:09:09.068672 140367148568768 fl_utils.py:151] ddgauss parameters:
I0317 16:09:09.068935 140367148568768 fl_utils.py:152] {'beta': 0.6065306597126334,
 'bits': 16,
 'dim': 1018174,
 'gamma': 0.0001248800740568264,
 'inflated_l2': 0.0707097967941405,
 'k_stddevs': 4,
 'local_stddev': 0.002430822051759469,
 'mechanism': 'ddgauss',
 'noise_mult_clip': 0.8102740172531564,
 'noise_mult_inflated': 0.3437744360709157,
 'padded_dim': 1048576.0,
 'scale': 8007.682631137392}
I0317 16:09:09.069010 140367148568768 ddpquery_utils.py:44] Conditional rounding set to True (beta = 0.606531)
I0317 16:09:09.157962 140367148568768 keras_utils.py:365] Adding default num_examples metric to model
I0317 16:09:09.158077 140367148568768 keras_utils.py:368] Adding default num_batches metric to model
I0317 16:09:10.512192 140367148568768 keras_utils.py:365] Adding default num_examples metric to model
I0317 16:09:10.512315 140367148568768 keras_utils.py:368] Adding default num_batches metric to model
I0317 16:09:12.278361 140367148568768 keras_utils.py:365] Adding default num_examples metric to model
I0317 16:09:12.278485 140367148568768 keras_utils.py:368] Adding default num_batches metric to model
I0317 16:09:14.021716 140367148568768 keras_utils.py:365] Adding default num_examples metric to model
I0317 16:09:14.021839 140367148568768 keras_utils.py:368] Adding default num_batches metric to model
I0317 16:09:14.286071 140367148568768 keras_utils.py:365] Adding default num_examples metric to model
I0317 16:09:14.286196 140367148568768 keras_utils.py:368] Adding default num_batches metric to model
Traceback (most recent call last):
  File "/h/ady/.cache/bazel/_bazel_ady/39df1af3e8de7748262d01b9bcee607d/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 290, in <module>
    app.run(main)
  File "/h/ady/.conda/envs/tff/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/h/ady/.conda/envs/tff/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/h/ady/.cache/bazel/_bazel_ady/39df1af3e8de7748262d01b9bcee607d/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/distributed_dp/fl_run.py", line 272, in main
    program_state_manager, metrics_managers = training_utils.create_managers(
  File "/h/ady/.cache/bazel/_bazel_ady/39df1af3e8de7748262d01b9bcee607d/execroot/org_federated_research/bazel-out/k8-opt/bin/distributed_dp/fl_run.runfiles/org_federated_research/utils/training_utils.py", line 65, in create_managers
    tensorboard_release_manager = tff.program.TensorBoardReleaseManager(
AttributeError: module 'tensorflow_federated.python.program' has no attribute 'TensorBoardReleaseManager'
(tff) ady@vws35:~/code/federated/distributed_dp$

kenziyuliu commented 2 years ago

Hi @adam-dziedzic,

Thanks a lot for your interest! I just tried locally using the copy of the repo I cloned when writing https://github.com/google-research/federated/issues/57#issuecomment-1062468566 and it worked, but when I re-cloned the repo, it produced the error you shared.

It seems that this commit (https://github.com/google-research/federated/commit/f80ba310e6dbb7bd5ea524330c660595e36363b9) introduced the symbol change. As a workaround, you could either check out an earlier commit (e.g. 92e87a3) or manually change the symbol to TensorboardReleaseManager (lowercase b).

This looks like a TFF change so I'll let others comment on whether a different TFF version should be used.

zcharles8 commented 2 years ago

@adam-dziedzic @kenziyuliu This is caused by https://github.com/tensorflow/federated/commit/7fa77074dd0f799f0a2fdcea07b131041283b612. I believe that TFF is moving to a more regular release cadence instead of relying on a nightly package. The code in federated_research has been changed internally to match that, but it will not work against the released TFF package until a new TFF release is published.
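To see which side of the rename a given environment is on, something like the following might help. It is only a diagnostic sketch: the helper names installed_version and exported_spellings are made up for illustration and are not part of TFF or this repo, and it assumes the package is installed under the distribution name tensorflow-federated, as in the pip list above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def exported_spellings(module, candidates):
    """Return the candidate attribute names that the module actually exposes."""
    return [name for name in candidates if hasattr(module, name)]


if __name__ == "__main__":
    print("tensorflow-federated:", installed_version("tensorflow-federated"))
    try:
        import tensorflow_federated as tff
    except ImportError:
        print("tensorflow_federated is not importable in this environment")
    else:
        # The two spellings discussed in this thread.
        names = ["TensorBoardReleaseManager", "TensorboardReleaseManager"]
        print("available in tff.program:", exported_spellings(tff.program, names))
```

With the environment from the original report (tensorflow-federated 0.20.0), the traceback implies that the capital-B spelling would not be listed.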

The error above should be fixed simply by manually changing tff.program.TensorBoardReleaseManager to tff.program.TensorboardReleaseManager in utils/training_utils.py (the call shown in the traceback). We are currently evaluating how to keep this repo useful to external users in the face of the TFF changes, so feedback is welcome.
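If you would rather not hard-code one spelling, a small lookup that tries both names is another option. This is a rough sketch, not the maintainers' fix: first_available is a hypothetical helper, and a stand-in object is used here in place of tff.program so that the snippet runs without TFF installed.

```python
import types


def first_available(module, *names):
    """Return the first attribute of `module` that exists among `names`."""
    for name in names:
        if hasattr(module, name):
            return getattr(module, name)
    raise AttributeError(
        f"{module!r} has none of the attributes: {', '.join(names)}")


# Stand-in for tff.program on a release that only has the lowercase-b spelling
# (an assumption for illustration; the real module is not imported here).
fake_program = types.SimpleNamespace(TensorboardReleaseManager=object)

manager_cls = first_available(
    fake_program, "TensorBoardReleaseManager", "TensorboardReleaseManager")
```

In utils/training_utils.py this would presumably become first_available(tff.program, "TensorBoardReleaseManager", "TensorboardReleaseManager"), followed by the existing constructor call. Note that this only papers over the rename; if other symbols or constructor arguments also differ between TFF versions, they would need separate handling.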