CVMI-Lab / Point-UV-Diffusion

(ICCV2023) This is the official PyTorch implementation of the ICCV2023 paper: Texture Generation on 3D Meshes with Point-UV Diffusion

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 36919) of binary #15

Closed DoingNow20 closed 6 months ago

DoingNow20 commented 6 months ago

/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects --local_rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions

warnings.warn(
| distributed init (rank 0): env://, gpu 0
[rank: 0] Global seed set to 12345
Using EMA with decay = 0.99950000
Start training for 1000 epochs
Traceback (most recent call last):
  File "train.py", line 77, in <module>
    main()
  File "train.py", line 74, in main
    train(args)
  File "train.py", line 43, in train
    trainer.train(modelmodule=modelmodule, datamodule=datamodule, ckpt_path=None)
  File "/home/wm/AIGC-3D/Point-UV-Diffusion-main/src/trainer/basetrainer.py", line 83, in train
    self.train_one_epoch(data_loader_train, modelmodule)
  File "/home/wm/AIGC-3D/Point-UV-Diffusion-main/src/trainer/basetrainer.py", line 37, in train_one_epoch
    for batch in data_loader:
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 530, in __next__
    data = self._next_data()
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1224, in _next_data
    return self._process_data(data)
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1250, in _process_data
    data.reraise()
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/_utils.py", line 457, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/wm/AIGC-3D/Point-UV-Diffusion-main/src/dataset_utils/coarse_stage/clip_condition_data.py", line 212, in __getitem__
    color, points, normal = get_fps_point_info(file_path)
  File "/home/wm/AIGC-3D/Point-UV-Diffusion-main/src/dataset_utils/coarse_stage/clip_condition_data.py", line 119, in get_fps_point_info
    pointcloud_dict = np.load(file_path)
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/numpy/lib/npyio.py", line 405, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/home/wm/AIGC-3D/DATA/coarse_model/03001627/save_4096/99f02614707ce072e8f8c11a24c52ebb.npz'

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 36919) of binary: /home/wm/Anaconda3/envs/point_uv_diff/bin/python
Traceback (most recent call last):
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/distributed/run.py", line 715, in run
    elastic_launch(
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/wm/Anaconda3/envs/point_uv_diff/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-15_17:54:16
  host      : Ubuntu20.04
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 36919)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

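
The FutureWarning at the top of this log is separate from the crash, but it describes a concrete change: scripts should read the local rank from the LOCAL_RANK environment variable instead of the --local_rank argument. A minimal sketch of that pattern follows. The helper name get_local_rank is hypothetical, and the log does not show how this repository's train.py currently obtains the rank, so treat this as an illustration only.

```python
import argparse
import os


def get_local_rank(argv=None) -> int:
    """Return the local rank, preferring the LOCAL_RANK environment variable.

    torchrun exports LOCAL_RANK; the --local_rank flag is kept here only as
    a fallback for the deprecated torch.distributed.launch entry point.
    The helper name is hypothetical and not taken from the repository.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    # parse_known_args ignores any other flags the training script defines.
    args, _ = parser.parse_known_args(argv)
    return int(os.environ.get("LOCAL_RANK", args.local_rank))
```

With a helper like this, the same script can be started by either launcher: the environment variable wins when it is set, and the command-line flag is used otherwise.
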
XinYu-Andy commented 6 months ago

The error is "FileNotFoundError: [Errno 2] No such file or directory: '/home/wm/AIGC-3D/DATA/coarse_model/03001627/save_4096/99f02614707ce072e8f8c11a24c52ebb.npz'"

Please check that you have downloaded the dataset and placed it under the path the data loader expects.
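
One way to confirm this before launching training is to scan the data directory for the files the loader will request. The sketch below is not part of the repository: the helper name find_missing_npz and the idea of passing in a list of model IDs are assumptions for illustration. Only the directory layout (<data_root>/coarse_model/<category>/save_4096/<model_id>.npz) is taken from the path in the traceback.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing_npz(data_root: str, category: str, model_ids: Iterable[str],
                     subdir: str = "save_4096") -> List[Path]:
    """Return the expected .npz paths that do not exist on disk.

    Layout assumed from the traceback:
    <data_root>/coarse_model/<category>/<subdir>/<model_id>.npz
    """
    base = Path(data_root) / "coarse_model" / category / subdir
    return [base / f"{mid}.npz" for mid in model_ids
            if not (base / f"{mid}.npz").is_file()]


if __name__ == "__main__":
    # Values copied from the error message; replace with your own paths and IDs.
    missing = find_missing_npz(
        data_root="/home/wm/AIGC-3D/DATA",
        category="03001627",
        model_ids=["99f02614707ce072e8f8c11a24c52ebb"],
    )
    for path in missing:
        print(f"missing: {path}")
    print(f"{len(missing)} file(s) missing")
```

If the script reports missing files, the dataset is either not downloaded, only partially extracted, or stored under a different root than the one the training run points to.
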