ShanghaiTech-IMPACT / TeethDreamer

[MICCAI 2024] TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs
MIT License

Error encountered during training #4

Open DHHWILL opened 2 weeks ago

DHHWILL commented 2 weeks ago

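Hello, I followed the steps you provided, and when using the sample data I got the following error: [W C:\cb\pytorch_1000000000000\work\torch\csrc\CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]. However, my machine has a single GPU, and I have not used any parallel training method. Could strategy = 'ddp_find_unused_parameters_false' be the cause? I have not found a good fix. How should I resolve this? I look forward to your reply, thank you!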

Xcf-xcf commented 1 week ago

Could you provide more detailed information, such as the command-line commands and the configuration files?

DHHWILL commented 1 week ago

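Sorry, the information I provided was incomplete. I tested with the images in the /example/ directory. Running python TeethDreamer.py -b configs/TeethDreamer.yaml and so on generated the multi-view image 1832_lower_cond_000_000_000_000.png normally. However, in step 6, when using the NeuS method with python run.py --img E:\mycode\TeethDreamer-main\output\1832_lower_cond_000_000_000_000.png --cpu 4 --dir E:\mycode\TeethDreamer-main\output\out --normal --rembg, I got ZeroDivisionError: division by zero. I then went back and used step 5 to generate the image 0.jpg, but the error was still the same. The configuration files, model files, and so on have not been modified.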

DHHWILL commented 1 week ago

  File "E:\TeethDreamer-main\instant-nsr-pl\systems\neus.py", line 129, in training_step
    train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
ZeroDivisionError: division by zero

Xcf-xcf commented 1 week ago

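Did you use step 5 to manually extract the foreground from the 16 novel-view images? If all the foregrounds are complete, this problem should not occur in principle. It is usually caused by images that contain only background, or by camera views that do not match the images, so that all sampled points fall in the background.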
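One quick way to rule out the "image is all background" case described above is to measure how much of the segmented image is foreground. The following is a minimal sketch, assuming the segmented image stores its mask in an alpha channel (the thread does not state the format) and that the alpha values have already been read into rows of integers. The helper name foreground_fraction is hypothetical and is not part of TeethDreamer.

```python
from typing import Sequence


def foreground_fraction(alpha_rows: Sequence[Sequence[int]], threshold: int = 0) -> float:
    """Return the fraction of pixels whose alpha value exceeds `threshold`.

    `alpha_rows` is a hypothetical input: one sequence of alpha values per image row.
    """
    total = sum(len(row) for row in alpha_rows)
    if total == 0:
        raise ValueError("mask has no pixels")
    foreground = sum(1 for row in alpha_rows for alpha in row if alpha > threshold)
    return foreground / total


# Synthetic masks: one with no foreground at all, one with the left half as foreground.
empty_mask = [[0] * 8 for _ in range(8)]
half_mask = [[255] * 4 + [0] * 4 for _ in range(8)]

print(foreground_fraction(empty_mask))  # 0.0
print(foreground_fraction(half_mask))   # 0.5
```

A fraction of 0.0 for every view would match the situation in which no sampled point can land on the foreground.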

DHHWILL commented 1 week ago

Yes, I used step 5 to extract the foreground of the novel views and generate a new image, but regardless of 0
The commands were:
python TeethDreamer.py -b configs/TeethDreamer.yaml --gpus 0 --test ckpt/TeethDreamer.ckpt --output E:\TeethDreamer-main\output data.params.test_dir=E:\TeethDreamer-main\output\segment
After the multi-view image was generated, I segmented it manually to produce 0.png:
python seg_foreground.py --img E:\TeethDreamer-main\output/oral_lower_cond_000_000_000_000.png --seg E:\TeethDreamer-main\output\seg/0.png
241005141108

Then
python run.py --img E:\TeethDreamer-main\output\seg/0.png --cpu 4 --dir E:\TeethDreamer-main\output\reconstruction --normal --rembg
fails with:
  File "E:\TeethDreamer-main\instant-nsr-pl\systems\neus.py", line 130, in training_step
    train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
ZeroDivisionError: division by zero

Xcf-xcf commented 1 week ago

If you have already extracted the foreground manually, you do not need to add the --rembg option on the command line.

DHHWILL commented 1 week ago

I removed --rembg and still got the same error. I then passed the oral_lower_cond_000_000_000_000.png generated in step 4 to the command, with the --rembg option added:
python run.py --img E:\TeethDreamer-main\output\seg/oral_lower_cond_000_000_000_000.png --cpu 4 --dir E:\TeethDreamer-main\output\reconstruction --normal --rembg
The error is the same:
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\overrides\base.py", line 98, in forward
    output = self._forward_module.training_step(*inputs, **kwargs)
  File "E:\TeethDreamer-main\instant-nsr-pl\systems\neus.py", line 130, in training_step
    train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
ZeroDivisionError: division by zero
I tested both the upper and the lower images, and the error is exactly the same.

DHHWILL commented 1 week ago

Below are the files under /reconstruction/0/: train, test, val

It looks like the foreground has been extracted. In the code:

    def training_step(self, batch, batch_idx):
        out = self(batch)
        loss = 0.

        # update train_num_rays
        if self.config.model.dynamic_ray_sampling:
            train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))

Here out['num_samples_full'] is 0, and I do not know where that comes from. I hope you can help, thank you!
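The division in the quoted line fails whenever the summed sample count is zero, which, following the maintainer's explanation, happens when every sampled point lies in the background. The sketch below isolates that arithmetic in a plain function and replaces the bare ZeroDivisionError with a descriptive error. The function name updated_num_rays and its parameter names are hypothetical; this is not the code in instant-nsr-pl, only an illustration of the same ratio under that assumption.

```python
def updated_num_rays(current_num_rays: int, target_num_samples: int, num_samples_full: int) -> int:
    """Rescale the ray count so that the sample count moves toward the target.

    Mirrors the quoted expression:
        int(train_num_rays * (train_num_samples / num_samples_full))
    but fails with an explicit message when no samples were produced.
    """
    if num_samples_full <= 0:
        raise ValueError(
            "no samples were produced for this batch; check that the input images "
            "contain foreground and that the camera poses match the images"
        )
    return int(current_num_rays * (target_num_samples / num_samples_full))


# With half as many samples as targeted, the ray count doubles.
print(updated_num_rays(256, 1024, 512))  # 512
```

A guard like this does not fix the underlying data problem. It only makes the failure message point at the likely cause instead of at the division.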

Xcf-xcf commented 4 days ago

Sorry, I was attending MICCAI 2024 recently, so my response may be late. In your case, the filenames of the manually segmented images from step 5 must include the string 'lower' or 'upper', which determines their camera poses.
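The naming rule can be checked before launching step 6. The following is a minimal sketch under two assumptions that the thread does not confirm: the match is case-sensitive, and exactly one of the two strings should appear in the filename. The helper jaw_from_filename is hypothetical and is not a function from the repository.

```python
from pathlib import PureWindowsPath


def jaw_from_filename(image_path: str) -> str:
    """Return 'upper' or 'lower' depending on which string the filename contains."""
    # PureWindowsPath accepts both "\\" and "/" separators, as in the paths in this thread.
    name = PureWindowsPath(image_path).name
    has_upper = "upper" in name
    has_lower = "lower" in name
    if has_upper == has_lower:
        raise ValueError(f"filename {name!r} must contain exactly one of 'upper' or 'lower'")
    return "upper" if has_upper else "lower"


print(jaw_from_filename(r"E:\TeethDreamer-main\output\seg/lower.png"))  # lower
```

Under this check, a name such as 0.png is rejected, while 1832_lower_cond_000_000_000_000.png and lower.png are accepted.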

DHHWILL commented 4 days ago

Sorry to bother you again. I renamed the image as you suggested, but the same problem remains:
python run.py --img E:\TeethDreamer-main\output\seg/lower.png --cpu 4 --dir E\TeethDreamer-main\output\reconstruction --normal
I then tried running the whole pipeline from scratch, without step 5 and without renaming any files:
python run.py --img E:\TeethDreamer-main\output\1832_lower_cond_000_000_000_000.png --cpu 4 --dir E:\TeethDreamer-main\output\reconstruction --normal --rembg
The problem is still the same. I also tried changing the file extension to .jpg, .webp, and so on, which did not solve it either. Here is the full log:
(TeethDreamer) E:\TeethDreamer-main\instant-nsr-pl>python run.py --img E:\TeethDreamer-main\output\seg/lower.png --cpu 4 --dir E:\TeethDreamer-main\output\reconstruction --normal
Traceback (most recent call last):
  File "tools.py", line 155, in <module>
    image_size = prepare_masked_img(args.input, os.path.join(args.output, 'train'), args.rembg, args.normal, args.real)
  File "tools.py", line 106, in prepare_masked_img
    data=imread(img_path)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\skimage\io\_io.py", line 53, in imread
    img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\skimage\io\manage_plugins.py", line 205, in call_plugin
    return func(*args, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\skimage\io\_plugins\imageio_plugin.py", line 11, in imread
    out = np.asarray(imageio_imread(*args, **kwargs))
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\imageio\v3.py", line 53, in imread
    with imopen(uri, "r", **plugin_kwargs) as img_file:
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\imageio\core\imopen.py", line 113, in imopen
    request = Request(uri, io_mode, format_hint=format_hint, extension=extension)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\imageio\core\request.py", line 247, in __init__
    self._parse_uri(uri)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\imageio\core\request.py", line 407, in _parse_uri
    raise FileNotFoundError("No such file: '%s'" % fn)
FileNotFoundError: No such file: 'E:\TeethDreamer-main\output\seg\lower.png'
Global seed set to 42
Using 16bit None Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Trainer(limit_train_batches=1.0) was configured so 100% of the batches per epoch will be used..
Trainer(limit_val_batches=1) was configured so 1 batch will be used.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
fatal: not a git repository (or any of the parent directories): .git
E:\TeethDreamer-main\instant-nsr-pl\utils\callbacks.py:76: UserWarning: Code snapshot is not saved. Please make sure you have git installed and are in a git repository.
  rank_zero_warn("Code snapshot is not saved. Please make sure you have git installed and are in a git repository.")

  | Name  | Type             | Params
0 | cos   | CosineSimilarity | 0
1 | model | NeuSModel        | 14.0 M

14.0 M    Trainable params
0         Non-trainable params
14.0 M    Total params
27.955    Total estimated model params size (MB)
E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\lightning_fabric\loggers\csv_logs.py:183: UserWarning: Experiment logs directory E:\TeethDreamer-main\output\reconstruction\lower\neus\csv_logs exists and is not empty. Previous log files in this directory will be deleted when the new ones are saved!
  rank_zero_warn(
Traceback (most recent call last):
  File "launch.py", line 129, in <module>
    main()
  File "launch.py", line 118, in main
    trainer.fit(system, datamodule=dm)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1103, in _run
    results = self._run_stage()
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1182, in _run_stage
    self._run_train()
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1205, in _run_train
    self.fit_loop.run()
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\epoch\training_epoch_loop.py", line 213, in advance
    batch_output = self.batch_loop.run(kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\batch\training_batch_loop.py", line 88, in advance
    outputs = self.optimizer_loop.run(optimizers, kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 202, in advance
    result = self._run_optimization(kwargs, self._optimizers[self.optim_progress.optimizer_position])
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 249, in _run_optimization
    self._optimizer_step(optimizer, opt_idx, kwargs.get("batch_idx", 0), closure)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 370, in _optimizer_step
    self.trainer._call_lightning_module_hook(
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1347, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\core\module.py", line 1744, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\core\optimizer.py", line 169, in step
    step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\strategies\strategy.py", line 234, in optimizer_step
    return self.precision_plugin.optimizer_step(
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\plugins\precision\native_amp.py", line 75, in optimizer_step
    closure_result = closure()
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 149, in __call__
    self._result = self.closure(*args, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 135, in closure
    step_output = self._step_fn()
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\loops\optimization\optimizer_loop.py", line 419, in _training_step
    training_step_output = self.trainer._call_strategy_hook("training_step", *kwargs.values())
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1485, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\strategies\dp.py", line 134, in training_step
    return self.model(*args, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\torch\nn\parallel\data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\overrides\data_parallel.py", line 77, in forward
    output = super().forward(*inputs, **kwargs)
  File "E:\Installed\anaconda3.8\envs\TeethDreamer\lib\site-packages\pytorch_lightning\overrides\base.py", line 98, in forward
    output = self._forward_module.training_step(*inputs, **kwargs)
  File "E:\TeethDreamer-main\instant-nsr-pl\systems\neus.py", line 129, in training_step
    train_num_rays = int(self.train_num_rays * (self.train_num_samples / out['num_samples_full'].sum().item()))
ZeroDivisionError: division by zero
Epoch 0: : 0it [00:49, ?it/s]
[W C:\cb\pytorch_1000000000000\work\torch\csrc\CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

I hope you can help, thank you!
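The log in this comment begins with FileNotFoundError: No such file: 'E:\TeethDreamer-main\output\seg\lower.png', and training is still launched afterwards. Whether that missing file is what leads to the later ZeroDivisionError is not confirmed in the thread, but checking the input path before step 6 is a cheap way to rule it out. A minimal sketch, in which require_image is a hypothetical helper and not part of TeethDreamer or instant-nsr-pl:

```python
from pathlib import Path


def require_image(image_path: str) -> Path:
    """Return the path as a Path object, or fail early if no such file exists."""
    path = Path(image_path)
    if not path.is_file():
        raise FileNotFoundError(f"input image not found: {path}")
    return path
```

Calling require_image on the value passed to --img before running run.py would stop at the missing file instead of continuing into training.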