google-parfait / federated-compute

Libraries for executing federated programs and computations.
Apache License 2.0

Demo test failed due to `OSError: read-only filesystem` #7

Closed: yuangu002 closed this issue 1 year ago

yuangu002 commented 1 year ago

I followed the GETTING_STARTED guide and got a successful build.

But the demo test (`bazelisk test //fcp/demo:federated_program_test --config=clang`) failed locally. The test log is pasted below:

Executing tests from //fcp/demo:federated_program_test
-----------------------------------------------------------------------------
2023-04-07 17:14:13.652917: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-04-07 17:14:13.690128: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-04-07 17:14:13.690429: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-07 17:14:14.381273: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:tensorflow:From /usr/local/google/home/ryangu/venv/lib/python3.10/site-packages/tensorflow/python/util/deprecation.py:576: calling function (from tensorflow.python.eager.polymorphic_function.polymorphic_function) with experimental_relax_shapes is deprecated and will be removed in a future version.
Instructions for updating:
experimental_relax_shapes is deprecated, use reduce_retracing instead
WARNING:tensorflow:From /usr/local/google/home/ryangu/venv/lib/python3.10/site-packages/tensorflow/python/util/deprecation.py:576: calling function (from tensorflow.python.eager.polymorphic_function.polymorphic_function) with experimental_relax_shapes is deprecated and will be removed in a future version.
Instructions for updating:
experimental_relax_shapes is deprecated, use reduce_retracing instead
Traceback (most recent call last):
  File "/usr/local/google/home/ryangu/.cache/bazel/_bazel_ryangu/6e3f6b9903df237029824c625696458e/sandbox/linux-sandbox/12836/execroot/com_google_fcp/bazel-out/k8-opt/bin/fcp/demo/federated_program_test.runfiles/com_google_fcp/fcp/demo/federated_program_test.py", line 24, in <module>
    import tensorflow_federated as tff
  File "/usr/local/google/home/ryangu/venv/lib/python3.10/site-packages/tensorflow_federated/__init__.py", line 78, in <module>
    backends.native.set_sync_local_cpp_execution_context()
  File "/usr/local/google/home/ryangu/venv/lib/python3.10/site-packages/tensorflow_federated/python/core/backends/native/execution_contexts.py", line 164, in set_sync_local_cpp_execution_context
    context = create_sync_local_cpp_execution_context(
  File "/usr/local/google/home/ryangu/venv/lib/python3.10/site-packages/tensorflow_federated/python/core/backends/native/execution_contexts.py", line 148, in create_sync_local_cpp_execution_context
    factory = executor_factory.local_cpp_executor_factory(
  File "/usr/local/google/home/ryangu/venv/lib/python3.10/site-packages/tensorflow_federated/python/core/impl/executor_stacks/executor_factory.py", line 103, in local_cpp_executor_factory
    _decompress_file(compressed_path, binary_path)
  File "/usr/local/google/home/ryangu/venv/lib/python3.10/site-packages/tensorflow_federated/python/core/impl/executor_stacks/executor_factory.py", line 55, in _decompress_file
    with open(output_path, 'wb') as binary_file:
OSError: [Errno 30] Read-only file system: '/usr/local/google/home/ryangu/venv/lib/python3.10/site-packages/tensorflow_federated/python/core/impl/executor_stacks/../../../../data/worker_binary'

I only used `sudo` for the two `apt install` commands, as instructed: `sudo apt install -y git gcc python3 python3-dev python3-venv` and `sudo apt install -y clang lld libc++-dev libc++abi-dev`.

timonvo commented 1 year ago

It's unclear to me why you would hit a permission issue: the path being written to appears to be `/usr/local/google/home/ryangu/venv/lib/python3.10/site-packages/tensorflow_federated/data`, which is inside your home directory and so should presumably be writable by you.

Could you inspect the permissions on each directory in that path to see whether any of them is misconfigured in a way that prevents you from writing to it? For example, you could run `namei -l /usr/local/google/home/ryangu/venv/lib/python3.10/site-packages/tensorflow_federated/data/` to print the permissions of every directory in that path.
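If `namei` is not available, a rough Python equivalent can produce the same per-directory view. This is a minimal sketch that uses only the standard library. The helper names `list_path_permissions` and `_name_or_id` are hypothetical and not part of any tool mentioned in this thread. Unlike `namei -l`, it uses `os.stat`, which follows symlinks.

```python
import grp
import os
import pwd
import stat
from pathlib import PurePath


def _name_or_id(lookup, ident: int) -> str:
    """Map a uid/gid to its name, falling back to the number if it is unknown."""
    try:
        return lookup(ident)[0]  # pw_name / gr_name is field 0
    except KeyError:
        return str(ident)


def list_path_permissions(path: str) -> list[str]:
    """Return 'mode owner group name' for every component of an absolute path.

    Roughly mirrors the per-directory lines of `namei -l`. os.stat follows
    symlinks, so a symlinked component shows the mode of its target.
    """
    parts = PurePath(path).parts  # e.g. ('/', 'usr', 'local', ...)
    lines = []
    current = PurePath(parts[0])
    for i, part in enumerate(parts):
        if i > 0:
            current = current / part
        st = os.stat(current)
        owner = _name_or_id(pwd.getpwuid, st.st_uid)
        group = _name_or_id(grp.getgrgid, st.st_gid)
        lines.append(f"{stat.filemode(st.st_mode)} {owner} {group} {part}")
    return lines
```

For example, `print("\n".join(list_path_permissions(".../tensorflow_federated/data")))`, using the full venv path, prints one line per directory from `/` down.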

yuangu002 commented 1 year ago

Thanks @timonvo, here is the output:

ryangu@ryangu:~$ namei -l /usr/local/google/home/ryangu/Desktop/venv/lib/python3.10/site-packages/tensorflow_federated/data/
f: /usr/local/google/home/ryangu/Desktop/venv/lib/python3.10/site-packages/tensorflow_federated/data/
drwxr-xr-x root   root         /
drwxr-xr-x root   root         usr
drwxr-xr-x root   root         local
drwxrwxrwt root   root         google
drwxr-xr-x root   root         home
drwx------ ryangu primarygroup ryangu
drwxr-xr-x ryangu primarygroup Desktop
drwxr-x--- ryangu primarygroup venv
drwxr-x--- ryangu primarygroup lib
drwxr-x--- ryangu primarygroup python3.10
drwxr-x--- ryangu primarygroup site-packages
drwxr-x--- ryangu primarygroup tensorflow_federated
drwxr-x--- ryangu primarygroup data

I think I see what the problem is: my user (ryangu) only has group permissions instead of owner permissions. I am not sure why my cloudtop doesn't have root access, though. Happy to discuss this with you internally.
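In `namei -l` output, the first triplet after the file-type character gives the owner's permissions and the second gives the group's. So `drwxr-x---` grants the owner `rwx` and the group `r-x`. This is a minimal Python sketch that reports ownership and writability for a single directory directly, so you don't have to read them off the listing by eye. It uses only the standard library, and `describe_access` is a hypothetical helper.

```python
import os
import stat


def describe_access(path: str) -> dict:
    """Summarize who owns `path` and who may write to it."""
    st = os.stat(path)
    return {
        "mode": stat.filemode(st.st_mode),              # e.g. 'drwxr-x---'
        "owned_by_me": st.st_uid == os.geteuid(),        # current user is the owner?
        "owner_can_write": bool(st.st_mode & stat.S_IWUSR),
        "group_can_write": bool(st.st_mode & stat.S_IWGRP),
        # access(2) check; also False if the path is on a read-only mount
        "i_can_write": os.access(path, os.W_OK),
    }
```

Calling it on each directory in the listing shows whether the current user owns that directory and whether `access(2)` reports it as writable.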

yuangu002 commented 1 year ago

I moved the project root directory from ~/Desktop to ~/ and the error went away. The error indicates that the directory was not writable, but I am not sure exactly what the root cause was in my case.
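On Linux, `[Errno 30]` is `EROFS` ("Read-only file system"), which means the write was rejected because the target is on a read-only mount. A plain ownership or mode-bit problem on a writable filesystem usually surfaces as `[Errno 13] Permission denied` (`EACCES`) instead. This is a minimal sketch that attempts a throwaway write and reports which case applies. It uses only the Python standard library, and `probe_write` is a hypothetical helper.

```python
import errno
import os
import tempfile


def probe_write(directory: str) -> str:
    """Create and delete a temp file in `directory`; classify any failure by errno."""
    try:
        fd, tmp_path = tempfile.mkstemp(dir=directory)
    except OSError as e:
        if e.errno == errno.EROFS:  # [Errno 30] on Linux: read-only mount
            return "read-only file system (EROFS)"
        if e.errno in (errno.EACCES, errno.EPERM):  # mode bits or ownership
            return "permission denied"
        return "other error: " + errno.errorcode.get(e.errno, str(e.errno))
    os.close(fd)
    os.remove(tmp_path)  # leave the directory as it was
    return "writable"
```

The traceback shows the test running under Bazel's `linux-sandbox`. A probe run from an ordinary shell may therefore not see the same mounts as the sandboxed test did.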