ewrfcas / MVSFormer

Codes of MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth (TMLR2023)
Apache License 2.0

blendmvs high_res #13

Open ZWEQHLWY opened 1 year ago

ZWEQHLWY commented 1 year ago

Hi, the download link for the blendmvs high_res dataset doesn't seem to work. Is there another way to download it?

ewrfcas commented 1 year ago

Others have reported this before; it seems the download only succeeds after several retries... We are not sure of the exact cause either.

ZWEQHLWY commented 1 year ago

Thanks for your reply. When I train MVSFormer (Twins-based), the loss iterations start normally, but when I train MVSFormer (frozen DINO-based) I get this error:
File "/home/aszitao/anaconda3/envs/mvsformer/lib/python3.7/site-packages/torch/cuda/amp/grad_scaler.py", line 137, in _check_scale_growth_tracker
assert self._scale is not None, "Attempted {} but _scale is None. ".format(funcname) + fix
AssertionError: Attempted step but _scale is None. This may indicate your script did not use scaler.scale(loss or outputs) earlier in the iteration.
How should I fix this? My environment is Python 3.7, PyTorch 1.9.0+cu111, a single 3090. For training I set n_gpu and batch size to 1, changed CUDA_VISIBLE_DEVICES=0,1 to 0, and replaced `from torch._six import container_abcs` with `import collections.abc as container_abcs` (to fix the "cannot import name 'container_abcs' from 'torch._six'" error).
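For reference, the `container_abcs` change described above is often written as a version-tolerant shim rather than a hard replacement; this is a generic sketch, not the repository's exact code.

```python
# Compatibility shim for code that did: from torch._six import container_abcs
# (container_abcs was dropped from torch._six in newer PyTorch releases).
try:
    from torch._six import container_abcs  # older PyTorch
except ImportError:
    import collections.abc as container_abcs  # PyTorch >= 1.9 / plain Python

# Downstream uses keep working unchanged, e.g.:
assert isinstance({}, container_abcs.Mapping)
```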

ewrfcas commented 1 year ago

Is DDP enabled? If so, try turning it off. In my experience this happens when the loss is None or some training steps are skipped.
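For context on the assertion itself: the scaler's scale tensor is only created the first time `scaler.scale()` is called, so reaching `scaler.step()` without `scaler.scale(loss).backward()` having run in that iteration (e.g. the loss is None or the backward pass was skipped) triggers "Attempted step but _scale is None". Below is a minimal sketch of the expected AMP loop, in generic PyTorch rather than this repository's training code:

```python
import torch
import torch.nn.functional as F

# Minimal AMP training loop (assumes a CUDA device is available).
model = torch.nn.Linear(8, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    x = torch.randn(4, 8, device="cuda")
    y = torch.randn(4, 1, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = F.mse_loss(model(x), y)
    scaler.scale(loss).backward()  # if this is skipped (e.g. loss is None)...
    scaler.step(optimizer)         # ...this raises "Attempted step but _scale is None."
    scaler.update()
```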

ZDDWLIG commented 1 year ago

Is there an evaluation script for BlendedMVS?

laomeng0703 commented 1 year ago

> (quoting the GradScaler "Attempted step but _scale is None" error report above)

You can try switching the nccl backend to gloo, or turn off DDP if your machine only has a single GPU. My machine has multiple GPUs and I use the gloo backend; it raises an EOFError at the very end, but that doesn't affect model generation.
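For reference, the backend switch mentioned above is normally a one-argument change in the process-group setup; a hedged sketch, since the actual call in train.py may pass different arguments:

```python
import torch.distributed as dist

# nccl is the usual backend for multi-GPU Linux training; gloo is a fallback that
# sometimes avoids nccl initialization/teardown problems, at some communication cost.
# Expects the usual launcher environment variables (MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE).
dist.init_process_group(backend="gloo", init_method="env://")
```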

kk6398 commented 1 year ago

> (quoting the GradScaler "Attempted step but _scale is None" error report above)

Hi, my environment matches yours, and I'm training on 4x 3090. While training MVSFormer (Twins-based) I get this warning: /data/hkk/anaconda3/envs/mvs/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
Then, after iterating normally up to 1200/6775, the program suddenly crashes with the error shown in the attached screenshot. [screenshot of the error]
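The warning itself only concerns call order: since PyTorch 1.1, `optimizer.step()` should run before `lr_scheduler.step()` within an iteration, as in the generic sketch below. Under AMP the warning can also appear when `scaler.step(optimizer)` internally skips the optimizer step because of inf/NaN gradients (common in the first iterations), which is usually harmless.

```python
import torch

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)

for _ in range(3):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 8)).pow(2).mean()
    loss.backward()
    optimizer.step()   # optimizer first...
    scheduler.step()   # ...then the scheduler; the reverse order triggers the warning
```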

kk6398 commented 1 year ago

> (quoting the GradScaler error report and the nccl-to-gloo / DDP suggestion above)

Hi, I'm training on 4x 3090 with DDP enabled. Following your suggestion, I changed the nccl backend to gloo on line 30 of train.py, but I still get the same error. What else should I try?

laomeng0703 commented 1 year ago

> (quoting the GradScaler error report, the nccl-to-gloo suggestion, and the follow-up question above)

From your error, it looks like the dataloader failed while reading the DTU dataset; my guess is your dataset wasn't fully downloaded. Also, your PyTorch version is a bit old. If nothing else works, try a different environment; I'm using Python 3.8, PyTorch 1.11+cu113, numpy 1.23.1.

kk6398 commented 1 year ago

> (quoting the preceding exchange about the gloo backend and the DTU dataloader error)

Hi, my environment is Python 3.7 + PyTorch 1.9.0+cu111 + numpy 1.20.1. The dataset problem is solved (three images were corrupted during extraction), but while training MVSFormer (Twins-based) I now hit some warnings: [screenshot of the warnings]. Do you know what causes this, and will it affect the training results? Thanks.