google-research / jax3d

Apache License 2.0

How to train? #195

Open · AdvancedHe opened this issue 8 months ago

AdvancedHe commented 8 months ago

Hello, when I run stage1.py, the following output appears and the process aborts:

    [GpuDevice(id=0, process_index=0), GpuDevice(id=1, process_index=0), GpuDevice(id=2, process_index=0), GpuDevice(id=3, process_index=0), GpuDevice(id=4, process_index=0), GpuDevice(id=5, process_index=0), GpuDevice(id=6, process_index=0), GpuDevice(id=7, process_index=0)]
    2023-12-26 10:10:17.684365: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
    2023-12-26 10:10:17.684429: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
    2023-12-26 10:10:17.684452: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   ./cuda_sdk_lib
    2023-12-26 10:10:17.684468: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   /usr/local/cuda-11.1
    2023-12-26 10:10:17.684483: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77]   .
    2023-12-26 10:10:17.684501: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
    2023-12-26 10:10:17.704204: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
    2023-12-26 10:10:17.704247: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:56] Couldn't invoke ptxas --version
    2023-12-26 10:10:17.704902: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
    2023-12-26 10:10:17.704980: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:472] ptxas returned an error during compilation of ptx to sass: 'Internal: Failed to launch ptxas' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
    Aborted (core dumped)

How can I solve this problem?
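The log itself names one workaround: set XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda so XLA can find nvvm/libdevice. The "Couldn't invoke ptxas" lines also suggest that the ptxas binary is not on PATH. A minimal sketch of both steps, assuming a full CUDA toolkit is installed somewhere on the machine, might look like this. The helper names find_cuda_dir and configure_xla_cuda_dir are hypothetical, and the candidate locations (CUDA_HOME, /usr/local/cuda, /usr/local/cuda-11.1) and the <cuda_dir>/bin/ptxas layout are assumptions to adjust for your system. It would need to run before `import jax` in stage1.py, since XLA reads XLA_FLAGS when the GPU backend initializes.

```python
import os
import shutil


def find_cuda_dir(candidates):
    """Return the first candidate directory that contains nvvm/libdevice, else None."""
    for root in candidates:
        if root and os.path.isdir(os.path.join(root, "nvvm", "libdevice")):
            return root
    return None


def configure_xla_cuda_dir(candidates):
    """Hypothetical helper: point XLA at a CUDA toolkit and expose its ptxas on PATH.

    Call before `import jax`, because XLA reads XLA_FLAGS at backend startup.
    """
    cuda_dir = find_cuda_dir(candidates)
    if cuda_dir is None:
        return None

    # Flag name taken from the XLA warning: --xla_gpu_cuda_data_dir=/path/to/cuda
    flag = f"--xla_gpu_cuda_data_dir={cuda_dir}"
    existing = os.environ.get("XLA_FLAGS", "")
    if flag not in existing.split():
        os.environ["XLA_FLAGS"] = f"{existing} {flag}".strip()

    # "Couldn't invoke ptxas" means XLA could not spawn ptxas from PATH.
    # Assumption: the toolkit ships ptxas as <cuda_dir>/bin/ptxas.
    bin_dir = os.path.join(cuda_dir, "bin")
    path_entries = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if (shutil.which("ptxas") is None
            and os.path.isfile(os.path.join(bin_dir, "ptxas"))
            and bin_dir not in path_entries):
        os.environ["PATH"] = os.pathsep.join([bin_dir] + path_entries)
    return cuda_dir


if __name__ == "__main__":
    # Assumed install locations; replace with wherever the CUDA toolkit lives.
    found = configure_xla_cuda_dir([
        os.environ.get("CUDA_HOME"),
        "/usr/local/cuda",
        "/usr/local/cuda-11.1",
    ])
    print("CUDA dir:", found)
    print("XLA_FLAGS:", os.environ.get("XLA_FLAGS"))
    print("ptxas:", shutil.which("ptxas"))
```

If this prints `CUDA dir: None`, no candidate contains nvvm/libdevice, which would point to an incomplete toolkit install (for example, only the runtime libraries) rather than a path setting.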

AdvancedHe commented 8 months ago

[screenshot]