Open lcxsnow opened 1 year ago
I installed tensorflow 2.5.0 instead of 2.5.0rc0. There are some errors when I run "bash train.sh exp_name &> train.txt". Which Python version and typing-extensions version should I use?
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. optax 0.1.4 requires typing-extensions>=3.10.0, but you have typing-extensions 3.7.4.3 which is incompatible. chex 0.1.6 requires typing-extensions>=4.2.0; python_version < "3.11", but you have typing-extensions 3.7.4.3 which is incompatible.
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/xmcgan/main.py", line 71, in
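On the typing-extensions conflict that pip reports above: the installed 3.7.4.3 is below both minimums because versions compare numerically, component by component (3.7 is older than 3.10). A minimal sketch of that comparison follows. The helper names are made up for illustration, and the parser is deliberately naive, so treat it as a quick check rather than a replacement for pip's own resolver.

```python
import re
from importlib import metadata  # in the standard library from Python 3.8


def parse_version(text):
    """Collect every run of digits as an int tuple.

    Naive on purpose: only meaningful for plain dotted releases such as
    '3.7.4.3'; pre-release suffixes are not handled properly.
    """
    return tuple(int(number) for number in re.findall(r"\d+", text))


def meets_minimum(installed, minimum):
    """Return True when the installed release is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Minimums copied from the pip resolver message in this thread.
MINIMUMS = {"optax 0.1.4": "3.10.0", "chex 0.1.6": "4.2.0"}


def report(installed):
    """Map each requiring package to whether `installed` satisfies it."""
    return {name: meets_minimum(installed, minimum)
            for name, minimum in MINIMUMS.items()}


if __name__ == "__main__":
    try:
        current = metadata.version("typing-extensions")
    except metadata.PackageNotFoundError:
        current = None
    if current is None:
        print("typing-extensions is not installed")
    else:
        print(current, report(current))
```

Calling report("3.7.4.3") marks both requirements as unmet, which matches the two lines in the pip message.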
I tried jax versions 0.4.3, 0.3.4 and 0.3.5. None of them worked.
ERROR: No matching distribution found for tensorflow==2.5.0rc0
Indeed, there is no version 2.5.0rc0 at https://pypi.org/project/tensorflow/#history.
AttributeError: module 'jax' has no attribute 'api'
Maybe try jaxlib:
woctezuma
Thanks for your reply. I fixed the problem by installing the CUDA version of jax together with tensorflow 2.5.0. It is worth confirming that "jax.devices()" actually lists the GPU devices.
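For anyone who wants to run the same check, here is a small sketch of the jax.devices() confirmation. The function name is made up for illustration, and the import is done lazily so the snippet still runs where jax is missing.

```python
def describe_jax_devices():
    """Return a one-line summary of the devices JAX can see."""
    try:
        import jax  # third-party; imported lazily so the sketch degrades gracefully
    except ImportError:
        return "jax is not installed in this environment"
    devices = jax.devices()
    # Each device object exposes a `platform` string such as "cpu".
    platforms = sorted({device.platform for device in devices})
    return f"{len(devices)} device(s) on platform(s): {', '.join(platforms)}"


print(describe_jax_devices())
```

If the summary only mentions cpu, the CPU-only build of jax is probably the one being imported.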
But now I get the new error below.
I0214 00:19:29.416610 140332804085568 main.py:52] JAX host: 0 / 1
I0214 00:19:29.416661 140332804085568 main.py:53] JAX devices: [StreamExecutorGpuDevice(id=0, process_index=0, slice_index=0), StreamExecutorGpuDevice(id=1, process_index=0, slice_index=0)]
I0214 00:19:29.416717 140332804085568 local.py:45] Setting task status: host_id: 0, host_count: 1
I0214 00:19:29.416844 140332804085568 local.py:50] Created artifact workdir of type ArtifactType.DIRECTORY and value data/exp/exp_name.
2023-02-14 00:19:29.882610: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:417] Loaded runtime CuDNN library: 8.2.1 but source was compiled with: 8.6.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2023-02-14 00:19:29.883696: E external/org_tensorflow/tensorflow/compiler/xla/status_macros.cc:57] INTERNAL: RET_CHECK failure (external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:626) dnn != nullptr
Begin stack trace
Here is my nvidia-smi display: NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4
and nvcc -V: nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2020 NVIDIA Corporation Built on Thu_Jun_11_22:26:38_PDT_2020 Cuda compilation tools, release 11.0, V11.0.194 Build cuda_11.0_bu.TC445_37.28540450_0
Loaded runtime CuDNN library: 8.2.1 but source was compiled with: 8.6.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
It seems to be a version mismatch.
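The rule quoted in that log message (matching major version, equal or higher minor version) can be written out as a short check. This is only a sketch of the rule as the message states it, with a made-up function name. It does not query the cuDNN library itself.

```python
def cudnn_runtime_ok(loaded, compiled):
    """Apply the rule from the log: same major, and runtime minor >= compiled minor."""
    loaded_major, loaded_minor = (int(part) for part in loaded.split(".")[:2])
    compiled_major, compiled_minor = (int(part) for part in compiled.split(".")[:2])
    return loaded_major == compiled_major and loaded_minor >= compiled_minor


# The pair reported in the log: runtime 8.2.1, compiled against 8.6.0.
print(cudnn_runtime_ok("8.2.1", "8.6.0"))  # False
```

So a runtime of 8.6 or any later 8.x release would satisfy the check, while 8.2.1 does not.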
I upgraded CUDA to 11.4 and cuDNN to 8.6, and that fixed it. But a new error has come up.
/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/flax/optim/base.py:49: DeprecationWarning: Use optax instead of flax.optim. Refer to the update guide https://flax.readthedocs.io/en/latest/howtos/optax_update_guide.html for detailed instructions.
warnings.warn(
/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/flax/optim/base.py:49: DeprecationWarning: Use optax instead of flax.optim. Refer to the update guide https://flax.readthedocs.io/en/latest/howtos/optax_update_guide.html for detailed instructions.
warnings.warn(
I0214 10:54:33.493715 139689696290624 utils.py:31] Checkpoint.restore_or_initialize() ...
I0214 10:54:33.493807 139689696290624 checkpoint.py:301] No checkpoint specified. Restore the latest checkpoint.
I0214 10:54:33.493842 139689696290624 utils.py:31] MultihostCheckpoint.get_latest_checkpoint_to_restore_from() ...
I0214 10:54:33.494217 139689696290624 checkpoint.py:430] Checked checkpoint base_directories: ['data/exp/checkpoints-0'] - common_numbers=set() - exclusive_numbers=set()
I0214 10:54:33.494263 139689696290624 utils.py:41] MultihostCheckpoint.get_latest_checkpoint_to_restore_from() finished after 0.00s.
I0214 10:54:33.494293 139689696290624 checkpoint.py:304] Checkpoint None does not exist.
I0214 10:54:33.494324 139689696290624 utils.py:31] Checkpoint.save() ...
E0214 10:54:33.497232 139689696290624 utils.py:38] Checkpoint.save() FAILED after 0.00s with TypeError.
Traceback (most recent call last):
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/clu/internal/utils.py", line 33, in log_activity
yield
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/clu/internal/utils.py", line 51, in decorator
return wrapped(*args, **kwargs)
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/clu/checkpoint.py", line 265, in save
f.write(flax.serialization.to_bytes(state))
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/flax/serialization.py", line 383, in to_bytes
return msgpack_serialize(state_dict, in_place=True)
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/flax/serialization.py", line 334, in msgpack_serialize
return msgpack.packb(pytree, default=_msgpack_ext_pack, strict_types=True)
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/msgpack/__init__.py", line 35, in packb
return Packer(**kwargs).pack(o)
File "msgpack/_packer.pyx", line 292, in msgpack._cmsgpack.Packer.pack
File "msgpack/_packer.pyx", line 298, in msgpack._cmsgpack.Packer.pack
File "msgpack/_packer.pyx", line 295, in msgpack._cmsgpack.Packer.pack
File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
[Previous line repeated 1 more time]
File "msgpack/_packer.pyx", line 289, in msgpack._cmsgpack.Packer._pack
TypeError: can not serialize 'Array' object
E0214 10:54:33.497600 139689696290624 utils.py:38] Checkpoint.restore_or_initialize() FAILED after 0.00s with TypeError.
Traceback (most recent call last):
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/clu/internal/utils.py", line 33, in log_activity
yield
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/clu/internal/utils.py", line 51, in decorator
return wrapped(*args, **kwargs)
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/clu/checkpoint.py", line 305, in restore_or_initialize
self.save(state)
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/clu/internal/utils.py", line 51, in decorator
return wrapped(*args, **kwargs)
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/clu/checkpoint.py", line 265, in save
f.write(flax.serialization.to_bytes(state))
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/flax/serialization.py", line 383, in to_bytes
return msgpack_serialize(state_dict, in_place=True)
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/flax/serialization.py", line 334, in msgpack_serialize
return msgpack.packb(pytree, default=_msgpack_ext_pack, strict_types=True)
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/venv/lib/python3.8/site-packages/msgpack/__init__.py", line 35, in packb
return Packer(**kwargs).pack(o)
File "msgpack/_packer.pyx", line 292, in msgpack._cmsgpack.Packer.pack
File "msgpack/_packer.pyx", line 298, in msgpack._cmsgpack.Packer.pack
File "msgpack/_packer.pyx", line 295, in msgpack._cmsgpack.Packer.pack
File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
[Previous line repeated 1 more time]
File "msgpack/_packer.pyx", line 289, in msgpack._cmsgpack.Packer._pack
TypeError: can not serialize 'Array' object
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/anaconda3/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/m11113013/ProjectCode/practice model/xmcgan_image_generation-main/xmcgan/main.py", line 71, in
I tried to update flax from 0.5.1 to 0.6.1, but then the following message appeared: AttributeError: module 'flax' has no attribute 'optim'
The only way I found to fix this is to downgrade flax back to 0.5.1.
This confuses me.
When I tried to install tensorflow==2.5.0rc0, the following message appeared:
ERROR: Could not find a version that satisfies the requirement tensorflow==2.5.0rc0 (from versions: 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0rc0, 2.6.0rc1, 2.6.0rc2, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.7.1, 2.7.2, 2.7.3, 2.7.4, 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.8.4, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1, 2.9.2, 2.9.3, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3, 2.10.0, 2.10.1, 2.11.0rc0, 2.11.0rc1, 2.11.0rc2, 2.11.0)
ERROR: No matching distribution found for tensorflow==2.5.0rc0
What should I do?
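Since 2.5.0rc0 is not in the list pip prints, one option is to choose among the final releases of the same series. A rough sketch of that filtering is below. The list is only a subset copied from the error message and the helper name is made up. Apart from 2.5.0, which was installed earlier in this thread, none of these releases has been confirmed to work with this project.

```python
# A subset of the versions listed in the pip error (not the full list).
AVAILABLE = ["2.4.4", "2.5.0", "2.5.1", "2.5.2", "2.5.3", "2.6.0rc0", "2.6.0"]


def final_releases(versions, series):
    """Return versions in `series` (e.g. '2.5') that are plain X.Y.Z releases."""
    prefix = series + "."
    return [
        version
        for version in versions
        # Dropping the dots must leave only digits, which excludes rc builds.
        if version.startswith(prefix) and version.replace(".", "").isdigit()
    ]


print(final_releases(AVAILABLE, "2.5"))  # ['2.5.0', '2.5.1', '2.5.2', '2.5.3']
```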