Open saltedfisssh opened 10 months ago
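Hello, could you share your machine configuration and gcc version?
Machine configuration: 4 × A100-SXM4-40GB, Driver Version: 530.30.02, CUDA Driver Version: 12.1; gcc: 9.5.0; torch: 1.8.1+cu111
The network is built successfully and the run reaches the start of training, but then the program exits: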
epochs: 0%| | 0/60 [00:00<?, ?it/s]{'NAME': 'filter_truncated', 'AREA_RATIO_THRESH': None, 'AREA_2D_RATIO_THRESH': None, 'GT_TRUNCATED_THRESH': 0.98}
filter truncated ratio: null 3d boxes [[ 2.99 -3.87 -0.66499996 4.43 1.84 1.75
-0.2907964 ]] flipped False image idx 890 frame_id 001773
{'NAME': 'filter_truncated', 'AREA_RATIO_THRESH': None, 'AREA_2D_RATIO_THRESH': None, 'GT_TRUNCATED_THRESH': 0.98}
filter truncated ratio: null 3d boxes [[ 2.93 -4.66 -0.73 4.18 1.86 1.48
-1.6307963]] flipped False image idx 1040 frame_id 002080
It is accompanied by these warnings:
/home/user/anaconda3/envs/mono3d/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:129: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
/home/user/anaconda3/envs/mono3d/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 29 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
For this warning, you can monitor CPU and memory utilization while the job runs and lower the dataloader's num_workers. Alternatively, try
export PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning'
to ignore it.
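A note on the filter string: the log above shows the message starting with `resource_tracker:`, so a filter keyed on `semaphore_tracker` may not match in this Python 3.8 environment. The sketch below uses the standard `warnings` module to illustrate the prefix matching in-process. The helper name `is_suppressed` is hypothetical. It only demonstrates how the match behaves; the real warning is emitted by a helper process, so in practice the environment variable has to be set before the training script is launched.

```python
import warnings

# Message text copied from the resource_tracker warning in the log above.
LEAK_MESSAGE = (
    "resource_tracker: There appear to be 29 leaked semaphore objects "
    "to clean up at shutdown"
)


def is_suppressed(prefix: str) -> bool:
    """Return True if an 'ignore' filter keyed on `prefix` hides LEAK_MESSAGE.

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Show everything by default so unmatched warnings get recorded.
        warnings.simplefilter("always")
        # Rough in-process analogue of
        #   PYTHONWARNINGS='ignore:<prefix>:UserWarning'
        # The message filter is matched against the start of the warning text.
        warnings.filterwarnings("ignore", message=prefix, category=UserWarning)
        warnings.warn(LEAK_MESSAGE, UserWarning)
    return len(caught) == 0


print(is_suppressed("resource_tracker"))   # True
print(is_suppressed("semaphore_tracker"))  # False
```

If the second filter is the one in use, `export PYTHONWARNINGS='ignore:resource_tracker:UserWarning'` is probably the variant to try. Either way this only hides the message and does not explain why the program exits.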
/home/user/anaconda3/envs/mono3d/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:129: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
This lr_scheduler warning does not prevent training from running normally on our machines.
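For reference, the call order the warning asks for looks roughly like the sketch below. This is a toy loop and not the repository's training code: the model, the random data, and the `StepLR` settings are all assumptions chosen only to make the example self-contained.

```python
import torch

torch.manual_seed(0)

# Toy model and optimizer (assumptions for illustration only).
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Halve the learning rate after every epoch.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

lrs = []
for epoch in range(3):
    # Record the learning rate actually used in this epoch.
    lrs.append(optimizer.param_groups[0]["lr"])

    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()

    optimizer.step()   # update the weights first ...
    scheduler.step()   # ... then advance the learning rate schedule

print(lrs)
```

With this order the first epoch runs at the initial learning rate, which is the schedule value the warning says would otherwise be skipped.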
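Also, did you not run into any problems installing spconv-1.2.1? We previously tried several times to set up the environment on bare-metal A100 machines and never succeeded; this is the first time we have seen it get far enough to run like this. Please send me the detailed log from when the program exits, along with an email address, and I will send you the Dockerfile I use.
Email sent, thanks.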
May I ask how to solve this problem?
Or could you provide a Docker image?