Closed. MarcusnHuang closed this issue 3 months ago.
Can you please share the full log from the beginning? Is it possible that you are using PyTorch 2 rather than PyTorch 1.13, as in our Dockerfile?
Thanks for your reply. I am using PyTorch 1.10.1 with CUDA 11.3 to run the code. Here is my log.
/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/MinkowskiEngine/__init__.py:36: UserWarning: The environment variable `OMP_NUM_THREADS` not set. MinkowskiEngine will automatically set `OMP_NUM_THREADS=16`. If you want to set `OMP_NUM_THREADS` manually, please export it on the command line before running a python script. e.g. `export OMP_NUM_THREADS=12; python your_program.py`. It is recommended to set it below 24.
warnings.warn(
/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmdet3d/evaluation/functional/kitti_utils/eval.py:10: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
def get_thresholds(scores: np.ndarray, num_gt, num_sample_pts=41):
08/19 16:42:33 - mmengine - INFO -
------------------------------------------------------------
System environment:
sys.platform: linux
Python: 3.8.17 (default, Jul 5 2023, 21:04:15) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 1050528218
GPU 0,1: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.3, V11.3.109
GCC: gcc (Ubuntu 7.5.0-6ubuntu2) 7.5.0
PyTorch: 1.10.1
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.3
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.2
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.11.2
OpenCV: 4.8.0
MMEngine: 0.8.4
Runtime environment:
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: 1050528218
Distributed launcher: none
Distributed training: False
GPU number: 1
------------------------------------------------------------
08/19 16:42:33 - mmengine - INFO - Config:
class_names = [
'ceiling',
'floor',
'wall',
'beam',
'column',
'window',
'door',
'table',
'chair',
'sofa',
'bookcase',
'board',
'clutter',
'unlabeled',
]
.
.
.
08/19 16:42:37 - mmengine - INFO - ------------------------------
08/19 16:42:37 - mmengine - INFO - The length of the dataset: 68
08/19 16:42:37 - mmengine - INFO - The number of instances per category in the dataset:
+----------+--------+
| category | number |
+----------+--------+
| ceiling | 154 |
| floor | 258 |
| wall | 11 |
| beam | 217 |
| column | 42 |
| window | 0 |
| door | 0 |
| table | 0 |
| chair | 0 |
| sofa | 0 |
| bookcase | 0 |
| board | 0 |
| clutter | 0 |
+----------+--------+
08/19 16:42:37 - mmengine - WARNING - The prefix is not set in metric class UnifiedSegMetric.
Loads checkpoint by local backend from path: work_dirs/tmp/instance-only-oneformer3d_1xb2_scannet-and-structured3d.pth
The model and loaded state dict do not match exactly
unexpected key in source state_dict: decoder.queries_1dataset.weight, decoder.queries_2dataset.weight, decoder.out_cls_1dataset.0.weight, decoder.out_cls_1dataset.0.bias, decoder.out_cls_1dataset.2.weight, decoder.out_cls_1dataset.2.bias, decoder.out_cls_2dataset.0.weight, decoder.out_cls_2dataset.0.bias, decoder.out_cls_2dataset.2.weight, decoder.out_cls_2dataset.2.bias
missing keys in source state_dict: decoder.query.weight, decoder.out_cls.0.weight, decoder.out_cls.0.bias, decoder.out_cls.2.weight, decoder.out_cls.2.bias
08/19 16:42:38 - mmengine - INFO - Load checkpoint from work_dirs/tmp/instance-only-oneformer3d_1xb2_scannet-and-structured3d.pth
08/19 16:42:38 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
08/19 16:42:38 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
08/19 16:42:38 - mmengine - INFO - Checkpoints will be saved to /home/wyj/518/hzh/oneformer3d/work_dirs/oneformer3d_1xb2_s3dis-area-5.
/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmdet3d/structures/points/base_points.py:136: UserWarning: point got color value beyond [0, 255]
warnings.warn('point got color value beyond [0, 255]')
/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmdet3d/structures/points/base_points.py:136: UserWarning: point got color value beyond [0, 255]
warnings.warn('point got color value beyond [0, 255]')
/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmdet3d/structures/points/base_points.py:136: UserWarning: point got color value beyond [0, 255]
warnings.warn('point got color value beyond [0, 255]')
Traceback (most recent call last):
File "tools/train.py", line 135, in <module>
main()
File "tools/train.py", line 131, in main
runner.train()
File "/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/runner.py", line 1745, in train
model = self.train_loop.run() # type: ignore
File "/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 96, in run
self.run_epoch()
File "/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 112, in run_epoch
self.run_iter(idx, data_batch)
File "/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/runner/loops.py", line 128, in run_iter
outputs = self.runner.model.train_step(
File "/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 114, in train_step
losses = self._run_forward(data, mode='loss') # type: ignore
File "/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 340, in _run_forward
results = self(**data, mode=mode)
File "/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmdet3d/models/detectors/base.py", line 75, in forward
return self.loss(inputs, data_samples, **kwargs)
File "/home/wyj/518/hzh/oneformer3d/oneformer3d/oneformer3d.py", line 718, in loss
x = self.decoder(x)
File "/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wyj/518/hzh/oneformer3d/oneformer3d/query_decoder.py", line 340, in forward
return self.forward_iter_pred(x, queries)
File "/home/wyj/518/hzh/oneformer3d/oneformer3d/query_decoder.py", line 308, in forward_iter_pred
queries = self.cross_attn_layers[i](inst_feats, queries, attn_mask)
File "/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wyj/518/hzh/oneformer3d/oneformer3d/query_decoder.py", line 52, in forward
output, _ = self.attn(queries[i], k, v, attn_mask=attn_mask)
File "/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 1003, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
File "/home/wyj/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/functional.py", line 5013, in multi_head_attention_forward
tgt_len, bsz, embed_dim = query.shape
ValueError: not enough values to unpack (expected 3, got 2)
Can you please try with pytorch==1.13?
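For context, the traceback above ends in `tgt_len, bsz, embed_dim = query.shape`, which means this PyTorch build's `nn.MultiheadAttention` received a 2-D query where it expected a 3-D `(seq_len, batch, embed_dim)` tensor. As far as I know, unbatched 2-D inputs are only accepted from PyTorch 1.11 onward, which would explain why 1.13 is suggested. The following is a minimal sketch, not the project's code, of that shape requirement. All sizes and tensor names are made up for illustration, and it adds a batch dimension of 1 explicitly so the call runs on either side of that version boundary.

```python
import torch
from torch import nn

# Hypothetical sizes, chosen only for illustration.
embed_dim, num_heads = 32, 4
n_queries, n_points = 5, 100

# batch_first defaults to False, so inputs are (seq_len, batch, embed_dim).
attn = nn.MultiheadAttention(embed_dim, num_heads)

queries = torch.randn(n_queries, embed_dim)  # 2-D: no batch dimension
feats = torch.randn(n_points, embed_dim)     # 2-D: no batch dimension
# Boolean mask of shape (n_queries, n_points); True would block attention.
mask = torch.zeros(n_queries, n_points, dtype=torch.bool)

# Insert a batch dimension of size 1 at dim=1, run attention, then drop it.
out, _ = attn(
    queries.unsqueeze(1),  # (n_queries, 1, embed_dim)
    feats.unsqueeze(1),    # (n_points, 1, embed_dim)
    feats.unsqueeze(1),
    attn_mask=mask,
)
out = out.squeeze(1)       # back to (n_queries, embed_dim)
print(tuple(out.shape))
```

Upgrading to the recommended PyTorch version remains the suggested route. The sketch only shows where the "expected 3, got 2" mismatch comes from.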
Okay, I'll keep trying. I would also like to ask: why is it necessary to load the pre-trained model before training on S3DIS? And if I build my own data in S3DIS format, can I use it to train the network?
We noticed that results are better when the backbone is initialized from some pre-trained weights rather than from scratch. On your own data, you can start with our ScanNet checkpoint.
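The "model and loaded state dict do not match exactly" message in the log seems consistent with this kind of partial initialization: keys shared by the checkpoint and the model are loaded, while the dataset-specific decoder heads differ by name and are left at their fresh initialization. Below is a rough sketch of that key bookkeeping using plain Python sets. The decoder key names are copied from the log, `backbone.example.weight` is a hypothetical stand-in for the shared weights, and `diff_state_dict_keys` is an illustrative helper, not part of mmengine or this repository.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected, loaded) key lists for a non-strict load."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)     # in model, absent from checkpoint
    unexpected = sorted(checkpoint_keys - model_keys)  # in checkpoint, unused by model
    loaded = sorted(model_keys & checkpoint_keys)      # names match, so weights are copied
    return missing, unexpected, loaded


# "backbone.example.weight" is a hypothetical key; the decoder keys come from the log.
checkpoint_keys = [
    "backbone.example.weight",
    "decoder.queries_1dataset.weight",
    "decoder.out_cls_1dataset.0.weight",
]
model_keys = [
    "backbone.example.weight",
    "decoder.query.weight",
    "decoder.out_cls.0.weight",
]

missing, unexpected, loaded = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
print("loaded:", loaded)
```

If this reading is right, the missing and unexpected decoder keys are a side effect of switching datasets rather than the cause of the crash.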
Thank you for your great work on this project. I want to train the network on the S3DIS dataset. I downloaded the pre-trained model following the instructions, but I got an error when running
python tools/train.py configs/oneformer3d_1xb2_s3dis-area-5.py