ewrfcas / MVSFormer

Codes of MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth (TMLR2023)
Apache License 2.0

blendmvs high_res #13

Open ZWEQHLWY opened 1 year ago

ZWEQHLWY commented 1 year ago

Hi, the download link for the blendmvs high_res dataset doesn't seem to work. Is there another way to download it?

ewrfcas commented 1 year ago

Others have reported this before; it seems the download only succeeds after several retries... We are not sure of the exact cause either.

ZWEQHLWY commented 1 year ago

Thanks for your reply. When I train MVSFormer (Twins-based), the loss iterations start normally, but when I train MVSFormer (frozen DINO-based) I get this error:
File "/home/aszitao/anaconda3/envs/mvsformer/lib/python3.7/site-packages/torch/cuda/amp/grad_scaler.py", line 137, in _check_scale_growth_tracker
assert self._scale is not None, "Attempted {} but _scale is None. ".format(funcname) + fix
AssertionError: Attempted step but _scale is None. This may indicate your script did not use scaler.scale(loss or outputs) earlier in the iteration.
How should I fix this? My environment is Python 3.7, PyTorch 1.9.0+cu111, a single 3090. For training I set n_gpu and batch size to 1, changed CUDA_VISIBLE_DEVICES=0,1 to 0, and replaced `from torch._six import container_abcs` with `import collections.abc as container_abcs` (to fix the "cannot import name 'container_abcs' from 'torch._six'" error).
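For reference, the `container_abcs` change described above is often written as a version-tolerant shim rather than a hard replacement; this is a generic sketch, not the repository's exact code.

```python
# Compatibility shim for code that did: from torch._six import container_abcs
# (container_abcs was dropped from torch._six in newer PyTorch releases).
try:
    from torch._six import container_abcs  # older PyTorch
except ImportError:
    import collections.abc as container_abcs  # PyTorch >= 1.9 / plain Python

# Downstream uses keep working unchanged, e.g.:
assert isinstance({}, container_abcs.Mapping)
```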

ewrfcas commented 1 year ago

Is DDP enabled? If so, try turning it off. In my experience this happens when the loss is None or some training steps are skipped.
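For context on the assertion itself: the scaler's scale tensor is only created the first time `scaler.scale()` is called, so reaching `scaler.step()` without `scaler.scale(loss).backward()` having run in that iteration (e.g. the loss is None or the backward pass was skipped) triggers "Attempted step but _scale is None". Below is a minimal sketch of the expected AMP loop, in generic PyTorch rather than this repository's training code:

```python
import torch
import torch.nn.functional as F

# Minimal AMP training loop (assumes a CUDA device is available).
model = torch.nn.Linear(8, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    x = torch.randn(4, 8, device="cuda")
    y = torch.randn(4, 1, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = F.mse_loss(model(x), y)
    scaler.scale(loss).backward()  # if this is skipped (e.g. loss is None)...
    scaler.step(optimizer)         # ...this raises "Attempted step but _scale is None."
    scaler.update()
```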

ZDDWLIG commented 1 year ago

Is there an evaluation script for BlendedMVS?

laomeng0703 commented 1 year ago

> (quoting the GradScaler "Attempted step but _scale is None" error report above)

You can try switching the nccl backend to gloo, or turn off DDP if your machine only has a single GPU. My machine has multiple GPUs and I use the gloo backend; it raises an EOFError at the very end, but that doesn't affect model generation.
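For reference, the backend switch mentioned above is normally a one-argument change in the process-group setup; a hedged sketch, since the actual call in train.py may pass different arguments:

```python
import torch.distributed as dist

# nccl is the usual backend for multi-GPU Linux training; gloo is a fallback that
# sometimes avoids nccl initialization/teardown problems, at some communication cost.
# Expects the usual launcher environment variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE).
dist.init_process_group(backend="gloo", init_method="env://")
```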

kk6398 commented 1 year ago

> (quoting the GradScaler "Attempted step but _scale is None" error report above)

Hi, my environment matches yours, and I'm training on 4x 3090. While training MVSFormer (Twins-based) I get this warning: /data/hkk/anaconda3/envs/mvs/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
Then, after iterating normally up to 1200/6775, the program suddenly crashes with the error shown in the attached screenshot. [screenshot of the error]
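The warning itself only concerns call order: since PyTorch 1.1, `optimizer.step()` should run before `lr_scheduler.step()` within an iteration, as in the generic sketch below. Under AMP the warning can also appear when `scaler.step(optimizer)` internally skips the optimizer step because of inf/NaN gradients (common in the first iterations), which is usually harmless.

```python
import torch

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)

for _ in range(3):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 8)).pow(2).mean()
    loss.backward()
    optimizer.step()   # optimizer first...
    scheduler.step()   # ...then the scheduler; the reverse order triggers the warning
```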

kk6398 commented 1 year ago

> (quoting the GradScaler error report and the nccl-to-gloo / DDP suggestion above)

Hi, I'm training on 4x 3090 with DDP enabled. Following your suggestion, I changed the nccl backend to gloo on line 30 of train.py, but I still get the same error. What else should I try?

laomeng0703 commented 1 year ago

> (quoting the GradScaler error report, the nccl-to-gloo suggestion, and the follow-up question above)

From your error, it looks like the dataloader failed while reading the DTU dataset; my guess is your dataset wasn't fully downloaded. Also, your PyTorch version is a bit old. If nothing else works, try a different environment; I'm using Python 3.8, PyTorch 1.11+cu113, numpy 1.23.1.

kk6398 commented 1 year ago

> (quoting the preceding exchange about the gloo backend and the DTU dataloader error)

Hi, my environment is Python 3.7 + PyTorch 1.9.0+cu111 + numpy 1.20.1. The dataset problem is solved (three images were corrupted during extraction), but while training MVSFormer (Twins-based) I now hit some warnings: [screenshot of the warnings]. Do you know what causes this, and will it affect the training results? Thanks.