exiawsh / StreamPETR

[ICCV 2023] StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection

Error in training #19

Closed SMSajadi99 closed 1 year ago

SMSajadi99 commented 1 year ago

Hello everyone. Thank you very much for your attractive project. As described in the instructions, I installed the libraries, created the folders, and put the .pkl files for the v1.0-mini version (which I asked you about in the previous question) into the data/nuscenes folder, but training fails to start. First I will post the folder structure, and after that the error. Thank you for your help.

├── ckpts
├── data
│   └── nuscenes
│       ├── maps
│       │   ├── basemap
│       │   ├── expansion
│       │   └── prediction
│       ├── samples
│       │   ├── CAM_BACK
│       │   ├── CAM_BACK_LEFT
│       │   ├── CAM_BACK_RIGHT
│       │   ├── CAM_FRONT
│       │   ├── CAM_FRONT_LEFT
│       │   ├── CAM_FRONT_RIGHT
│       │   ├── LIDAR_TOP
│       │   ├── RADAR_BACK_LEFT
│       │   ├── RADAR_BACK_RIGHT
│       │   ├── RADAR_FRONT
│       │   ├── RADAR_FRONT_LEFT
│       │   └── RADAR_FRONT_RIGHT
│       ├── sweeps
│       │   ├── CAM_BACK
│       │   ├── CAM_BACK_LEFT
│       │   ├── CAM_BACK_RIGHT
│       │   ├── CAM_FRONT
│       │   ├── CAM_FRONT_LEFT
│       │   ├── CAM_FRONT_RIGHT
│       │   ├── LIDAR_TOP
│       │   ├── RADAR_BACK_LEFT
│       │   ├── RADAR_BACK_RIGHT
│       │   ├── RADAR_FRONT
│       │   ├── RADAR_FRONT_LEFT
│       │   └── RADAR_FRONT_RIGHT
│       └── v1.0-mini
├── docs
├── figs
├── mmdetection3d
│   ├── configs
│   │   ├── 3dssd
│   │   ├── _base_
│   │   │   ├── datasets
│   │   │   ├── models
│   │   │   └── schedules
│   │   ├── benchmark
│   │   ├── centerpoint
│   │   ├── dgcnn
│   │   ├── dynamic_voxelization
│   │   ├── fcaf3d
│   │   ├── fcos3d
│   │   ├── free_anchor
│   │   ├── groupfree3d
│   │   ├── h3dnet
│   │   ├── imvotenet
│   │   ├── imvoxelnet
│   │   ├── monoflex
│   │   ├── mvxnet
│   │   ├── nuimages
│   │   ├── paconv
│   │   ├── parta2
│   │   ├── pgd
│   │   ├── pointnet2
│   │   ├── pointpillars
│   │   ├── point_rcnn
│   │   ├── regnet
│   │   ├── sassd
│   │   ├── second
│   │   ├── smoke
│   │   ├── ssn
│   │   └── votenet
│   ├── data
│   │   ├── lyft
│   │   ├── nuscenes
│   │   │   ├── maps
│   │   │   │   ├── basemap
│   │   │   │   ├── expansion
│   │   │   │   └── prediction
│   │   │   ├── samples
│   │   │   │   ├── CAM_BACK
│   │   │   │   ├── CAM_BACK_LEFT
│   │   │   │   ├── CAM_BACK_RIGHT
│   │   │   │   ├── CAM_FRONT
│   │   │   │   ├── CAM_FRONT_LEFT
│   │   │   │   ├── CAM_FRONT_RIGHT
│   │   │   │   ├── LIDAR_TOP
│   │   │   │   ├── RADAR_BACK_LEFT
│   │   │   │   ├── RADAR_BACK_RIGHT
│   │   │   │   ├── RADAR_FRONT
│   │   │   │   ├── RADAR_FRONT_LEFT
│   │   │   │   └── RADAR_FRONT_RIGHT
│   │   │   ├── sweeps
│   │   │   │   ├── CAM_BACK
│   │   │   │   ├── CAM_BACK_LEFT
│   │   │   │   ├── CAM_BACK_RIGHT
│   │   │   │   ├── CAM_FRONT
│   │   │   │   ├── CAM_FRONT_LEFT
│   │   │   │   ├── CAM_FRONT_RIGHT
│   │   │   │   ├── LIDAR_TOP
│   │   │   │   ├── RADAR_BACK_LEFT
│   │   │   │   ├── RADAR_BACK_RIGHT
│   │   │   │   ├── RADAR_FRONT
│   │   │   │   ├── RADAR_FRONT_LEFT
│   │   │   │   └── RADAR_FRONT_RIGHT
│   │   │   └── v1.0-mini
│   │   ├── s3dis
│   │   │   └── meta_data
│   │   ├── scannet
│   │   │   └── meta_data
│   │   └── sunrgbd
│   │       └── matlab
│   ├── demo
│   │   └── data
│   │       ├── kitti
│   │       ├── nuscenes
│   │       ├── scannet
│   │       └── sunrgbd
│   ├── docker
│   │   └── serve
│   ├── docs
│   │   ├── en
│   │   │   ├── datasets
│   │   │   ├── _static
│   │   │   │   ├── css
│   │   │   │   └── image
│   │   │   ├── supported_tasks
│   │   │   └── tutorials
│   │   └── zh_cn
│   │       ├── datasets
│   │       ├── _static
│   │       │   ├── css
│   │       │   └── image
│   │       ├── supported_tasks
│   │       └── tutorials
│   ├── mmdet3d
│   │   ├── apis
│   │   ├── core
│   │   │   ├── anchor
│   │   │   ├── bbox
│   │   │   │   ├── assigners
│   │   │   │   ├── coders
│   │   │   │   ├── iou_calculators
│   │   │   │   ├── samplers
│   │   │   │   └── structures
│   │   │   ├── evaluation
│   │   │   │   ├── kitti_utils
│   │   │   │   ├── scannet_utils
│   │   │   │   └── waymo_utils
│   │   │   ├── points
│   │   │   ├── post_processing
│   │   │   ├── utils
│   │   │   ├── visualizer
│   │   │   └── voxel
│   │   ├── datasets
│   │   │   └── pipelines
│   │   ├── models
│   │   │   ├── backbones
│   │   │   ├── decode_heads
│   │   │   ├── dense_heads
│   │   │   ├── detectors
│   │   │   ├── fusion_layers
│   │   │   ├── losses
│   │   │   ├── middle_encoders
│   │   │   ├── model_utils
│   │   │   ├── necks
│   │   │   ├── roi_heads
│   │   │   │   ├── bbox_heads
│   │   │   │   ├── mask_heads
│   │   │   │   └── roi_extractors
│   │   │   ├── segmentors
│   │   │   ├── utils
│   │   │   └── voxel_encoders
│   │   ├── ops
│   │   │   ├── dgcnn_modules
│   │   │   ├── paconv
│   │   │   ├── pointnet_modules
│   │   │   └── spconv
│   │   │       └── overwrite_spconv
│   │   └── utils
│   ├── mmdet3d.egg-info
│   ├── projects
│   │   └── example_project
│   │       ├── configs
│   │       └── dummy
│   ├── requirements
│   ├── resources
│   ├── tests
│   │   ├── data
│   │   │   ├── kitti
│   │   │   │   ├── kitti_gt_database
│   │   │   │   └── training
│   │   │   │       ├── image_2
│   │   │   │       ├── velodyne
│   │   │   │       └── velodyne_reduced
│   │   │   ├── lyft
│   │   │   │   ├── lidar
│   │   │   │   └── v1.01-train
│   │   │   │       ├── maps
│   │   │   │       └── v1.01-train
│   │   │   ├── nuscenes
│   │   │   │   ├── samples
│   │   │   │   │   ├── CAM_BACK_LEFT
│   │   │   │   │   └── LIDAR_TOP
│   │   │   │   └── sweeps
│   │   │   │       └── LIDAR_TOP
│   │   │   ├── ops
│   │   │   ├── s3dis
│   │   │   │   ├── instance_mask
│   │   │   │   ├── points
│   │   │   │   └── semantic_mask
│   │   │   ├── scannet
│   │   │   │   ├── instance_mask
│   │   │   │   ├── points
│   │   │   │   └── semantic_mask
│   │   │   ├── semantickitti
│   │   │   │   └── sequences
│   │   │   │       └── 00
│   │   │   │           ├── labels
│   │   │   │           └── velodyne
│   │   │   ├── sunrgbd
│   │   │   │   ├── points
│   │   │   │   └── sunrgbd_trainval
│   │   │   │       └── image
│   │   │   └── waymo
│   │   │       ├── kitti_format
│   │   │       │   ├── training
│   │   │       │   │   ├── image_0
│   │   │       │   │   └── velodyne
│   │   │       │   └── waymo_gt_database
│   │   │       └── waymo_format
│   │   │           └── validation
│   │   ├── test_data
│   │   │   ├── test_datasets
│   │   │   └── test_pipelines
│   │   │       ├── test_augmentations
│   │   │       └── test_loadings
│   │   ├── test_metrics
│   │   ├── test_models
│   │   │   ├── test_common_modules
│   │   │   ├── test_fusion
│   │   │   ├── test_heads
│   │   │   ├── test_necks
│   │   │   └── test_voxel_encoder
│   │   ├── test_runtime
│   │   ├── test_samples
│   │   └── test_utils
│   └── tools
│       ├── analysis_tools
│       ├── data_converter
│       ├── deployment
│       ├── misc
│       └── model_converters
├── nusc_tracking
├── projects
│   ├── configs
│   │   ├── PETRv1
│   │   ├── StreamPETR
│   │   └── test_speed
│   └── mmdet3d_plugin
│       ├── core
│       │   ├── apis
│       │   ├── bbox
│       │   │   ├── assigners
│       │   │   ├── coders
│       │   │   └── match_costs
│       │   └── evaluation
│       ├── datasets
│       │   ├── pipelines
│       │   └── samplers
│       └── models
│           ├── backbones
│           │   └── __pycache__
│           ├── dense_heads
│           ├── detectors
│           ├── necks
│           └── utils
└── tools
    └── data_converter
        └── __pycache__

(screenshots of the training error were attached here)
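Since the report hinges on the .pkl info files being in place, a quick way to confirm what actually landed under data/nuscenes is sketched below. This is a hypothetical check for illustration only; the exact filenames the training expects come from the ann_file entries of the chosen config, which are not shown here.

```python
# Hypothetical sanity check -- not part of StreamPETR: list the annotation
# .pkl files actually present under data/nuscenes, to compare against the
# ann_file paths used by the training config.
from pathlib import Path

root = Path("data/nuscenes")  # assumed dataset root, matching the tree above
for pkl in sorted(root.glob("*.pkl")):
    print(pkl, f"{pkl.stat().st_size / 1e6:.1f} MB")
```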

exiawsh commented 1 year ago


(quoting the tail of the error output from the report above)

tools/train.py FAILED

Root Cause:
  [0]: time: 2023-05-26_06:51:14  rank: 0 (local_rank: 0)  exitcode: 1 (pid: 10264)  error_file: <N/A>  msg: "Process failed with exitcode 1"

Other Failures:
  [1]: time: 2023-05-26_06:51:14  rank: 1 (local_rank: 1)  exitcode: 1 (pid: 10265)  error_file: <N/A>  msg: "Process failed with exitcode 1"
  [2]: time: 2023-05-26_06:51:14  rank: 2 (local_rank: 2)  exitcode: 1 (pid: 10266)  error_file: <N/A>  msg: "Process failed with exitcode 1"
  [3]: time: 2023-05-26_06:51:14  rank: 3 (local_rank: 3)  exitcode: 1 (pid: 10267)  error_file: <N/A>  msg: "Process failed with exitcode 1"
  [4]: time: 2023-05-26_06:51:14  rank: 4 (local_rank: 4)  exitcode: 1 (pid: 10268)  error_file: <N/A>  msg: "Process failed with exitcode 1"
  [5]: time: 2023-05-26_06:51:14  rank: 5 (local_rank: 5)  exitcode: 1 (pid: 10269)  error_file: <N/A>  msg: "Process failed with exitcode 1"
  [6]: time: 2023-05-26_06:51:14  rank: 6 (local_rank: 6)  exitcode: 1 (pid: 10270)  error_file: <N/A>  msg: "Process failed with exitcode 1"
  [7]: time: 2023-05-26_06:51:14  rank: 7 (local_rank: 7)  exitcode: 1 (pid: 10271)  error_file: <N/A>  msg: "Process failed with exitcode 1"

Hi, thanks for your interest. I need more details. Could you please provide the other errors that appear above this error summary? And how many GPU devices did you use?

exiawsh commented 1 year ago

@SMSajadi99 Hi, what's your numba version? My numba version is 0.53.0.
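For reference, here is a minimal way to check which numba and numpy versions are installed in the streampetr environment. This is an illustrative sketch, not part of the repository; it reads the package metadata rather than importing numba, in case importing numba is itself what fails.

```python
# Hypothetical version check -- not part of StreamPETR. Reads the installed
# numba and numpy versions from package metadata, so it works even when
# "import numba" cannot complete.
from importlib.metadata import version, PackageNotFoundError  # Python 3.8+

for pkg in ("numba", "numpy"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "is not installed in this environment")
```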

SMSajadi99 commented 1 year ago

(streampetr) sajadi@sajadi:~/anaconda3/envs/streampetr/StreamPETR$ tools/dist_train.sh projects/configs/StreamPETR/stream_petr_r50_flash_704_bs2_seq_24e.py 8 --work-dir work_dirs/stream_petr_r50_flash_704_bs2_seq_24e/

/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future. Migrate to torch.distributed.run


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
  entrypoint       : tools/train.py
  min_nodes        : 1
  max_nodes        : 1
  nproc_per_node   : 8
  run_id           : none
  rdzv_backend     : static
  rdzv_endpoint    : 127.0.0.1:29500
  rdzv_configs     : {'rank': 0, 'timeout': 900}
  max_restarts     : 3
  monitor_interval : 5
  log_dir          : None
  metrics_cfg      : {}

INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_s3vw2v1o/none_6_0_ic2t
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
  restart_count=0 master_addr=127.0.0.1 master_port=29500 group_rank=0 group_world_size=1
  local_ranks=[0, 1, 2, 3, 4, 5, 6, 7] role_ranks=[0, 1, 2, 3, 4, 5, 6, 7] global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8] global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 through worker7 reply files to: /tmp/torchelastic_s3vw2v1o/none_6_0_ic2t/attempt_0/<rank>/error.json

All eight worker processes then exit with the same traceback:

Traceback (most recent call last):
  File "tools/train.py", line 23, in <module>
    from mmdet3d.datasets import build_dataset
  File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/__init__.py", line 4, in <module>
    from .custom_3d import Custom3DDataset
  File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/custom_3d.py", line 10, in <module>
    from ..core.bbox import get_box_type
  File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/__init__.py", line 4, in <module>
    from .evaluation import *  # noqa: F401, F403
  File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/__init__.py", line 4, in <module>
    from .kitti_utils import kitti_eval, kitti_eval_coco_style
  File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/__init__.py", line 2, in <module>
    from .eval import kitti_eval, kitti_eval_coco_style
  File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/eval.py", line 5, in <module>
    import numba
  File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/__init__.py", line 43, in <module>
    from numba.np.ufunc import (vectorize, guvectorize, threading_layer,
  File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/__init__.py", line 3, in <module>
    from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize
  File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/decorators.py", line 3, in <module>
    from numba.np.ufunc import _internal
SystemError: initialization of _internal failed without raising an exception

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 11102) of binary: /home/sajadi/anaconda3/envs/streampetr/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group

The elastic agent then stops the worker group, re-runs the rendezvous, and restarts the workers. Each restart (restart_count=1, pid 11169 reported for local_rank 0; restart_count=2, pid 11243) fails with exactly the same numba traceback in all eight workers; the log ends after the rendezvous for restart_count=3, with 1/3 attempts left.

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_s3vw2v1o/none_6_0_ic2t/attempt_3/0/error.json INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_s3vw2v1o/none_6_0_ic2t/attempt_3/1/error.json INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_s3vw2v1o/none_6_0_ic2t/attempt_3/2/error.json INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_s3vw2v1o/none_6_0_ic2t/attempt_3/3/error.json INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_s3vw2v1o/none_6_0_ic2t/attempt_3/4/error.json INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_s3vw2v1o/none_6_0_ic2t/attempt_3/5/error.json INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_s3vw2v1o/none_6_0_ic2t/attempt_3/6/error.json INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_s3vw2v1o/none_6_0_ic2t/attempt_3/7/error.json Traceback (most recent call last): File "tools/train.py", line 23, in from mmdet3d.datasets import build_dataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/init.py", line 4, in from .custom_3d import Custom3DDataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/custom_3d.py", line 10, in from ..core.bbox import get_box_type File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/init.py", line 4, in from .evaluation import # noqa: F401, F403 File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/init.py", line 4, in from .kitti_utils import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/init.py", line 2, in from .eval import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/eval.py", line 5, in import numba File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/init.py", line 43, in from numba.np.ufunc import (vectorize, guvectorize, threading_layer, File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/init.py", line 3, in from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/decorators.py", line 3, in from numba.np.ufunc import _internal SystemError: initialization of _internal failed without raising an exception Traceback (most recent call last): File "tools/train.py", line 23, in from mmdet3d.datasets import build_dataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/init.py", line 4, in from .custom_3d import Custom3DDataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/custom_3d.py", line 10, in from ..core.bbox import get_box_type File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/init.py", line 4, in from .evaluation import # noqa: F401, F403 File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/init.py", line 4, in from .kitti_utils import kitti_eval, kitti_eval_coco_style File 
"/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/init.py", line 2, in from .eval import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/eval.py", line 5, in import numba File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/init.py", line 43, in from numba.np.ufunc import (vectorize, guvectorize, threading_layer, File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/init.py", line 3, in from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/decorators.py", line 3, in from numba.np.ufunc import _internal SystemError: initialization of _internal failed without raising an exception Traceback (most recent call last): File "tools/train.py", line 23, in from mmdet3d.datasets import build_dataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/init.py", line 4, in from .custom_3d import Custom3DDataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/custom_3d.py", line 10, in from ..core.bbox import get_box_type File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/init.py", line 4, in from .evaluation import # noqa: F401, F403 File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/init.py", line 4, in from .kitti_utils import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/init.py", line 2, in from .eval import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/eval.py", line 5, in import numba File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/init.py", line 43, in from numba.np.ufunc import (vectorize, guvectorize, threading_layer, File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/init.py", line 3, in from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/decorators.py", line 3, in from numba.np.ufunc import _internal SystemError: initialization of _internal failed without raising an exception Traceback (most recent call last): File "tools/train.py", line 23, in from mmdet3d.datasets import build_dataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/init.py", line 4, in from .custom_3d import Custom3DDataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/custom_3d.py", line 10, in from ..core.bbox import get_box_type File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/init.py", line 4, in from .evaluation import # noqa: F401, F403 File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/init.py", line 4, in from .kitti_utils import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/init.py", line 2, in from .eval import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/eval.py", line 5, in import numba File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/init.py", line 43, in from 
numba.np.ufunc import (vectorize, guvectorize, threading_layer, File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/init.py", line 3, in from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/decorators.py", line 3, in from numba.np.ufunc import _internal SystemError: initialization of _internal failed without raising an exception Traceback (most recent call last): File "tools/train.py", line 23, in from mmdet3d.datasets import build_dataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/init.py", line 4, in from .custom_3d import Custom3DDataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/custom_3d.py", line 10, in from ..core.bbox import get_box_type File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/init.py", line 4, in from .evaluation import # noqa: F401, F403 File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/init.py", line 4, in from .kitti_utils import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/init.py", line 2, in from .eval import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/eval.py", line 5, in import numba File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/init.py", line 43, in from numba.np.ufunc import (vectorize, guvectorize, threading_layer, File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/init.py", line 3, in from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/decorators.py", line 3, in from numba.np.ufunc import _internal SystemError: initialization of _internal failed without raising an exception Traceback (most recent call last): File "tools/train.py", line 23, in from mmdet3d.datasets import build_dataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/init.py", line 4, in from .custom_3d import Custom3DDataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/custom_3d.py", line 10, in from ..core.bbox import get_box_type File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/init.py", line 4, in from .evaluation import # noqa: F401, F403 File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/init.py", line 4, in from .kitti_utils import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/init.py", line 2, in from .eval import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/eval.py", line 5, in import numba File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/init.py", line 43, in from numba.np.ufunc import (vectorize, guvectorize, threading_layer, File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/init.py", line 3, in from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/decorators.py", line 3, in from numba.np.ufunc 
import _internal SystemError: initialization of _internal failed without raising an exception Traceback (most recent call last): File "tools/train.py", line 23, in from mmdet3d.datasets import build_dataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/init.py", line 4, in from .custom_3d import Custom3DDataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/custom_3d.py", line 10, in from ..core.bbox import get_box_type File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/init.py", line 4, in from .evaluation import # noqa: F401, F403 File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/init.py", line 4, in from .kitti_utils import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/init.py", line 2, in from .eval import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/eval.py", line 5, in import numba File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/init.py", line 43, in from numba.np.ufunc import (vectorize, guvectorize, threading_layer, File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/init.py", line 3, in from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/decorators.py", line 3, in from numba.np.ufunc import _internal SystemError: initialization of _internal failed without raising an exception Traceback (most recent call last): File "tools/train.py", line 23, in from mmdet3d.datasets import build_dataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/init.py", line 4, in from .custom_3d import Custom3DDataset File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/datasets/custom_3d.py", line 10, in from ..core.bbox import get_box_type File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/init.py", line 4, in from .evaluation import # noqa: F401, F403 File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/init.py", line 4, in from .kitti_utils import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/init.py", line 2, in from .eval import kitti_eval, kitti_eval_coco_style File "/home/sajadi/anaconda3/envs/streampetr/mmdetection3d/mmdet3d/core/evaluation/kitti_utils/eval.py", line 5, in import numba File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/init.py", line 43, in from numba.np.ufunc import (vectorize, guvectorize, threading_layer, File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/init.py", line 3, in from numba.np.ufunc.decorators import Vectorize, GUVectorize, vectorize, guvectorize File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/numba/np/ufunc/decorators.py", line 3, in from numba.np.ufunc import _internal SystemError: initialization of _internal failed without raising an exception ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 11373) of binary: /home/sajadi/anaconda3/envs/streampetr/bin/python ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed 
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (FAILED). Waiting 300 seconds for other agents to finish /home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:70: FutureWarning: This is an experimental API and will be changed in future. warnings.warn( INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0002865791320800781 seconds {"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 0, "group_rank": 0, "worker_id": "11373", "role": "default", "hostname": "sajadi", "state": "FAILED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": "{\"message\": \"\"}", "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [0], \"role_rank\": [0], \"role_world_size\": [8]}", "agent_restarts": 3}} {"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 1, "group_rank": 0, "worker_id": "11374", "role": "default", "hostname": "sajadi", "state": "FAILED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": "{\"message\": \"\"}", "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [1], \"role_rank\": [1], \"role_world_size\": [8]}", "agent_restarts": 3}} {"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 2, "group_rank": 0, "worker_id": "11375", "role": "default", "hostname": "sajadi", "state": "FAILED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": "{\"message\": \"\"}", "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [2], \"role_rank\": [2], \"role_world_size\": [8]}", "agent_restarts": 3}} {"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 3, "group_rank": 0, "worker_id": "11376", "role": "default", "hostname": "sajadi", "state": "FAILED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": "{\"message\": \"\"}", "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [3], \"role_rank\": [3], \"role_world_size\": [8]}", "agent_restarts": 3}} {"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 4, "group_rank": 0, "worker_id": "11377", "role": "default", "hostname": "sajadi", "state": "FAILED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": "{\"message\": \"\"}", "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [4], \"role_rank\": [4], \"role_world_size\": [8]}", "agent_restarts": 3}} {"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 5, "group_rank": 0, "worker_id": "11378", "role": "default", "hostname": "sajadi", "state": "FAILED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": "{\"message\": \"\"}", "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [5], \"role_rank\": [5], \"role_world_size\": [8]}", "agent_restarts": 3}} {"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 6, "group_rank": 0, "worker_id": "11379", "role": "default", "hostname": "sajadi", "state": "FAILED", "total_run_time": 20, 
"rdzv_backend": "static", "raw_error": "{\"message\": \"\"}", "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [6], \"role_rank\": [6], \"role_world_size\": [8]}", "agent_restarts": 3}} {"name": "torchelastic.worker.status.FAILED", "source": "WORKER", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": 7, "group_rank": 0, "worker_id": "11380", "role": "default", "hostname": "sajadi", "state": "FAILED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": "{\"message\": \"\"}", "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\", \"local_rank\": [7], \"role_rank\": [7], \"role_world_size\": [8]}", "agent_restarts": 3}} {"name": "torchelastic.worker.status.SUCCEEDED", "source": "AGENT", "timestamp": 0, "metadata": {"run_id": "none", "global_rank": null, "group_rank": 0, "worker_id": null, "role": "default", "hostname": "sajadi", "state": "SUCCEEDED", "total_run_time": 20, "rdzv_backend": "static", "raw_error": null, "metadata": "{\"group_world_size\": 1, \"entry_point\": \"python\"}", "agent_restarts": 3}} /home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py:354: UserWarning:


               CHILD PROCESS FAILED WITH NO ERROR_FILE

CHILD PROCESS FAILED WITH NO ERROR_FILE
Child process 11373 (local_rank 0) FAILED (exitcode 1)
Error msg: Process failed with exitcode 1
Without writing an error file to <N/A>.
While this DOES NOT affect the correctness of your application,
no trace information about the error will be available for inspection.
Consider decorating your top level entrypoint function with
torch.distributed.elastic.multiprocessing.errors.record. Example:

from torch.distributed.elastic.multiprocessing.errors import record

@record
def trainer_main(args):
    # do train


  warnings.warn(_no_error_file_warning_msg(rank, failure))
Traceback (most recent call last):
  File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/torch/distributed/launch.py", line 173, in <module>
    main()
  File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/torch/distributed/launch.py", line 169, in main
    run(args)
  File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/torch/distributed/run.py", line 621, in run
    elastic_launch(
  File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
  File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:


     tools/train.py FAILED         

=======================================
Root Cause:
[0]: time: 2023-05-26_06:59:35 rank: 0 (local_rank: 0) exitcode: 1 (pid: 11373) error_file: <N/A> msg: "Process failed with exitcode 1"

Other Failures:
[1]: time: 2023-05-26_06:59:35 rank: 1 (local_rank: 1) exitcode: 1 (pid: 11374) error_file: <N/A> msg: "Process failed with exitcode 1"
[2]: time: 2023-05-26_06:59:35 rank: 2 (local_rank: 2) exitcode: 1 (pid: 11375) error_file: <N/A> msg: "Process failed with exitcode 1"
[3]: time: 2023-05-26_06:59:35 rank: 3 (local_rank: 3) exitcode: 1 (pid: 11376) error_file: <N/A> msg: "Process failed with exitcode 1"
[4]: time: 2023-05-26_06:59:35 rank: 4 (local_rank: 4) exitcode: 1 (pid: 11377) error_file: <N/A> msg: "Process failed with exitcode 1"
[5]: time: 2023-05-26_06:59:35 rank: 5 (local_rank: 5) exitcode: 1 (pid: 11378) error_file: <N/A> msg: "Process failed with exitcode 1"
[6]: time: 2023-05-26_06:59:35 rank: 6 (local_rank: 6) exitcode: 1 (pid: 11379) error_file: <N/A> msg: "Process failed with exitcode 1"
[7]: time: 2023-05-26_06:59:35 rank: 7 (local_rank: 7) exitcode: 1 (pid: 11380) error_file: <N/A> msg: "Process failed with exitcode 1"


(streampetr) sajadi@sajadi:~/anaconda3/envs/streampetr/StreamPETR$

SMSajadi99 commented 1 year ago

[screenshot]

SMSajadi99 commented 1 year ago

Package Version Editable project location


absl-py 1.4.0 addict 2.4.0 anyio 3.6.2 argon2-cffi 21.3.0 argon2-cffi-bindings 21.2.0 arrow 1.2.3 asttokens 2.2.1 attrs 23.1.0 backcall 0.2.0 beautifulsoup4 4.12.2 black 23.3.0 bleach 6.0.0 cachetools 5.3.0 certifi 2023.5.7 cffi 1.15.1 charset-normalizer 3.1.0 click 8.1.3 comm 0.1.3 contourpy 1.0.7 cycler 0.11.0 debugpy 1.6.7 decorator 5.1.1 defusedxml 0.7.1 descartes 1.1.0 einops 0.6.1 exceptiongroup 1.1.1 executing 1.2.0 fastjsonschema 2.17.1 fire 0.5.0 flake8 6.0.0 flash-attn 0.2.2 fonttools 4.39.4 fqdn 1.5.1 google-auth 2.18.1 google-auth-oauthlib 1.0.0 grpcio 1.54.2 idna 3.4 imageio 2.29.0 importlib-metadata 6.6.0 importlib-resources 5.12.0 iniconfig 2.0.0 ipykernel 6.23.1 ipython 8.12.2 ipython-genutils 0.2.0 ipywidgets 8.0.6 isoduration 20.11.0 jedi 0.18.2 Jinja2 3.1.2 joblib 1.2.0 jsonpointer 2.3 jsonschema 4.17.3 jupyter 1.0.0 jupyter_client 8.2.0 jupyter-console 6.6.3 jupyter_core 5.3.0 jupyter-events 0.6.3 jupyter_server 2.5.0 jupyter_server_terminals 0.4.4 jupyterlab-pygments 0.2.2 jupyterlab-widgets 3.0.7 kiwisolver 1.4.4 llvmlite 0.36.0 lyft-dataset-sdk 0.0.8 Markdown 3.4.3 MarkupSafe 2.1.2 matplotlib 3.5.2 matplotlib-inline 0.1.6 mccabe 0.7.0 mistune 2.0.5 mmcls 0.25.0 mmcv-full 1.6.0 mmdet 2.28.2 mmdet3d 1.0.0rc6 /home/sajadi/anaconda3/envs/streampetr/mmdetection3d mmsegmentation 0.30.0 mypy-extensions 1.0.0 nbclassic 1.0.0 nbclient 0.8.0 nbconvert 7.4.0 nbformat 5.8.0 nest-asyncio 1.5.6 networkx 2.2 notebook 6.5.4 notebook_shim 0.2.3 numba 0.53.0 numpy 1.24.3 nuscenes-devkit 1.1.10 oauthlib 3.2.2 opencv-python 4.7.0.72 packaging 23.1 pandas 2.0.1 pandocfilters 1.5.0 parso 0.8.3 pathspec 0.11.1 pexpect 4.8.0 pickleshare 0.7.5 Pillow 9.5.0 pip 23.0.1 pkgutil_resolve_name 1.3.10 platformdirs 3.5.1 plotly 5.14.1 pluggy 1.0.0 plyfile 0.9 prettytable 3.7.0 prometheus-client 0.16.0 prompt-toolkit 3.0.38 protobuf 4.23.1 psutil 5.9.5 ptyprocess 0.7.0 pure-eval 0.2.2 pyasn1 0.5.0 pyasn1-modules 0.3.0 pycocotools 2.0.6 pycodestyle 2.10.0 pycparser 2.21 pyflakes 3.0.1 Pygments 2.15.1 pyparsing 3.0.9 pyquaternion 0.9.9 pyrsistent 0.19.3 pytest 7.3.1 python-dateutil 2.8.2 python-json-logger 2.0.7 pytz 2023.3 PyWavelets 1.4.1 PyYAML 6.0 pyzmq 25.0.2 qtconsole 5.4.3 QtPy 2.3.1 requests 2.31.0 requests-oauthlib 1.3.1 rfc3339-validator 0.1.4 rfc3986-validator 0.1.1 rsa 4.9 scikit-image 0.19.3 scikit-learn 1.2.2 scipy 1.10.1 Send2Trash 1.8.2 setuptools 66.0.0 Shapely 1.8.5 six 1.16.0 sniffio 1.3.0 soupsieve 2.4.1 stack-data 0.6.2 tenacity 8.2.2 tensorboard 2.13.0 tensorboard-data-server 0.7.0 termcolor 2.3.0 terminado 0.17.1 terminaltables 3.1.10 threadpoolctl 3.1.0 tifffile 2023.4.12 tinycss2 1.2.1 tomli 2.0.1 torch 1.9.0+cu111 torchaudio 0.9.0 torchvision 0.10.0+cu111 tornado 6.3.2 tqdm 4.65.0 traitlets 5.9.0 trimesh 2.35.39 typing_extensions 4.5.0 tzdata 2023.3 uri-template 1.2.0 urllib3 1.26.16 wcwidth 0.2.6 webcolors 1.13 webencodings 0.5.1 websocket-client 1.5.2 Werkzeug 2.3.4 wheel 0.38.4 widgetsnbextension 4.0.7 yapf 0.33.0 zipp 3.15.0

exiawsh commented 1 year ago

@SMSajadi99 Try this command: pip install "numpy<1.24.0". My numpy version is 1.23.3.
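For context, the SystemError above is the usual symptom of a numba build that predates numpy 1.24: numba 0.53.0 (as in the pip list above) does not support numpy 1.24.x, so downgrading to a 1.23.x release lets the import succeed. A minimal sanity check after running the pip command might look like this (hypothetical check_env.py, not part of the repo):

    # check_env.py - verify the numba/numpy pairing after downgrading numpy
    # (assumes numba 0.53.0, as listed in the environment above)
    import numpy
    import numba  # raises "SystemError: initialization of _internal failed" when numpy is too new for numba

    print("numpy:", numpy.__version__)   # expect a 1.23.x release after pip install "numpy<1.24.0"
    print("numba:", numba.__version__)   # 0.53.0 in this environment

    # If both imports succeed, mmdet3d's kitti_utils/eval.py can import numba again
    # and tools/train.py should get past the failing import.

If both imports go through, re-running tools/dist_train.sh should no longer die at the mmdet3d import stage.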

SMSajadi99 commented 1 year ago

Yes, that fixed it. Of course, when I then ran training it complained that nuscenes2d_temporal_infos_val.pkl does not exist, so I changed every part of the code that referenced it to nuscenes2d_temporal_infos_val_mini.pkl, but then I ran into the problem shown in the next comment. Roughly the kind of edit I made is sketched just below:
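A minimal, hypothetical sketch of that change (only the ann_file entries of the config's data dict are shown; the values match the full config dump in the next comment):

    # Sketch of the relevant entries in the StreamPETR config after the rename
    # (hypothetical excerpt; the complete data dict appears in the config dump below).
    data_root = './data/nuscenes/'

    data = dict(
        train=dict(ann_file=data_root + 'nuscenes2d_temporal_infos_train_mini.pkl'),
        val=dict(ann_file=data_root + 'nuscenes2d_temporal_infos_val_mini.pkl'),
        test=dict(ann_file=data_root + 'nuscenes2d_temporal_infos_val_mini.pkl'),
    )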

SMSajadi99 commented 1 year ago

(streampetr) sajadi@sajadi:~/anaconda3/envs/streampetr/StreamPETR$ tools/dist_train.sh projects/configs/StreamPETR/stream_petr_r50_flash_704_bs2_seq_24e.py 1 --work-dir work_dirs/stream_petr_r50_flash_704_bs2_seq_24e/
/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future. Migrate to torch.distributed.run
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
  entrypoint       : tools/train.py
  min_nodes        : 1
  max_nodes        : 1
  nproc_per_node   : 1
  run_id           : none
  rdzv_backend     : static
  rdzv_endpoint    : 127.0.0.1:29500
  rdzv_configs     : {'rank': 0, 'timeout': 900}
  max_restarts     : 3
  monitor_interval : 5
  log_dir          : None
  metrics_cfg      : {}

INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_3ywdtqmw/none_p36ahq8n
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
  warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
  restart_count=0 master_addr=127.0.0.1 master_port=29500 group_rank=0 group_world_size=1
  local_ranks=[0] role_ranks=[0] global_ranks=[0] role_world_sizes=[1] global_world_sizes=[1]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_3ywdtqmw/none_p36ahq8n/attempt_0/0/error.json
projects.mmdet3d_plugin
2023-05-26 08:10:47,846 - mmdet - INFO - Environment info:

sys.platform: linux
Python: 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]
CUDA available: True
GPU 0: NVIDIA GeForce GT 1030
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.2, V11.2.67
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.9.0+cu111
PyTorch compiling details: PyTorch built with:

TorchVision: 0.10.0+cu111
OpenCV: 4.7.0
MMCV: 1.6.0
MMCV Compiler: GCC 9.4
MMCV CUDA Compiler: 11.2
MMDetection: 2.28.2
MMSegmentation: 0.30.0
MMDetection3D: 1.0.0rc6+74bf34a
spconv2.0: False

2023-05-26 08:10:48,852 - mmdet - INFO - Distributed training: True 2023-05-26 08:10:49,852 - mmdet - INFO - Config: point_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0] class_names = [ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ] dataset_type = 'CustomNuScenesDataset' data_root = './data/nuscenes/' input_modality = dict( use_lidar=False, use_camera=True, use_radar=False, use_map=False, use_external=True) file_client_args = dict(backend='disk') train_pipeline = [ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_bbox=True, with_label=True, with_bbox_depth=True), dict( type='ObjectRangeFilter', point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]), dict( type='ObjectNameFilter', classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ]), dict( type='ResizeCropFlipRotImage', data_aug_conf=dict( resize_lim=(0.38, 0.55), final_dim=(256, 704), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=True), dict( type='GlobalRotScaleTransImage', rot_range=[-0.3925, 0.3925], translation_std=[0, 0, 0], scale_ratio_range=[0.95, 1.05], reverse_angle=True, training=True), dict( type='NormalizeMultiviewImage', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='PadMultiViewImage', size_divisor=32), dict( type='PETRFormatBundle3D', class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], collect_keys=[ 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv', 'prev_exists' ]), dict( type='Collect3D', keys=[ 'gt_bboxes_3d', 'gt_labels_3d', 'img', 'gt_bboxes', 'gt_labels', 'centers2d', 'depths', 'prev_exists', 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv' ], meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'box_mode_3d', 'box_type_3d', 'img_norm_cfg', 'scene_token', 'gt_bboxes_3d', 'gt_labels_3d')) ] test_pipeline = [ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='ResizeCropFlipRotImage', data_aug_conf=dict( resize_lim=(0.38, 0.55), final_dim=(256, 704), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=False), dict( type='NormalizeMultiviewImage', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='PadMultiViewImage', size_divisor=32), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict( type='PETRFormatBundle3D', collect_keys=[ 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv' ], class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], with_label=False), dict( type='Collect3D', keys=[ 'img', 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv' ], meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'box_mode_3d', 'box_type_3d', 'img_norm_cfg', 'scene_token')) ]) ] eval_pipeline = [ dict( type='LoadPointsFromFile', coord_type='LIDAR', load_dim=5, use_dim=5, file_client_args=dict(backend='disk')), dict( type='LoadPointsFromMultiSweeps', 
sweeps_num=10, file_client_args=dict(backend='disk')), dict( type='DefaultFormatBundle3D', class_names=[ 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier' ], with_label=False), dict(type='Collect3D', keys=['points']) ] data = dict( samples_per_gpu=2, workers_per_gpu=4, train=dict( type='CustomNuScenesDataset', data_root='./data/nuscenes/', ann_file='./data/nuscenes/nuscenes2d_temporal_infos_train_mini.pkl', pipeline=[ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_bbox=True, with_label=True, with_bbox_depth=True), dict( type='ObjectRangeFilter', point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]), dict( type='ObjectNameFilter', classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ]), dict( type='ResizeCropFlipRotImage', data_aug_conf=dict( resize_lim=(0.38, 0.55), final_dim=(256, 704), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=True), dict( type='GlobalRotScaleTransImage', rot_range=[-0.3925, 0.3925], translation_std=[0, 0, 0], scale_ratio_range=[0.95, 1.05], reverse_angle=True, training=True), dict( type='NormalizeMultiviewImage', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='PadMultiViewImage', size_divisor=32), dict( type='PETRFormatBundle3D', class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], collect_keys=[ 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv', 'prev_exists' ]), dict( type='Collect3D', keys=[ 'gt_bboxes_3d', 'gt_labels_3d', 'img', 'gt_bboxes', 'gt_labels', 'centers2d', 'depths', 'prev_exists', 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv' ], meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'box_mode_3d', 'box_type_3d', 'img_norm_cfg', 'scene_token', 'gt_bboxes_3d', 'gt_labels_3d')) ], classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], modality=dict( use_lidar=False, use_camera=True, use_radar=False, use_map=False, use_external=True), test_mode=False, box_type_3d='LiDAR', num_frame_losses=1, seq_split_num=2, seq_mode=True, collect_keys=[ 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv', 'img', 'prev_exists', 'img_metas' ], queue_length=1, use_valid_flag=True, filter_empty_gt=False), val=dict( type='CustomNuScenesDataset', data_root='data/nuscenes/', ann_file='./data/nuscenes/nuscenes2d_temporal_infos_val_mini.pkl', pipeline=[ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='ResizeCropFlipRotImage', data_aug_conf=dict( resize_lim=(0.38, 0.55), final_dim=(256, 704), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=False), dict( type='NormalizeMultiviewImage', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='PadMultiViewImage', size_divisor=32), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict( type='PETRFormatBundle3D', collect_keys=[ 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv' ], class_names=[ 
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], with_label=False), dict( type='Collect3D', keys=[ 'img', 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv' ], meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'box_mode_3d', 'box_type_3d', 'img_norm_cfg', 'scene_token')) ]) ], classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], modality=dict( use_lidar=False, use_camera=True, use_radar=False, use_map=False, use_external=True), test_mode=True, box_type_3d='LiDAR', collect_keys=[ 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv', 'img', 'img_metas' ], queue_length=1), test=dict( type='CustomNuScenesDataset', data_root='data/nuscenes/', ann_file='./data/nuscenes/nuscenes2d_temporal_infos_val_mini.pkl', pipeline=[ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='ResizeCropFlipRotImage', data_aug_conf=dict( resize_lim=(0.38, 0.55), final_dim=(256, 704), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=False), dict( type='NormalizeMultiviewImage', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='PadMultiViewImage', size_divisor=32), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict( type='PETRFormatBundle3D', collect_keys=[ 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv' ], class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], with_label=False), dict( type='Collect3D', keys=[ 'img', 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv' ], meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'box_mode_3d', 'box_type_3d', 'img_norm_cfg', 'scene_token')) ]) ], classes=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], modality=dict( use_lidar=False, use_camera=True, use_radar=False, use_map=False, use_external=True), test_mode=True, box_type_3d='LiDAR', collect_keys=[ 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv', 'img', 'img_metas' ], queue_length=1), shuffler_sampler=dict(type='InfiniteGroupEachSampleInBatchSampler'), nonshuffler_sampler=dict(type='DistributedSampler')) evaluation = dict( interval=42192, pipeline=[ dict(type='LoadMultiViewImageFromFiles', to_float32=True), dict( type='ResizeCropFlipRotImage', data_aug_conf=dict( resize_lim=(0.38, 0.55), final_dim=(256, 704), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True), training=False), dict( type='NormalizeMultiviewImage', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='PadMultiViewImage', size_divisor=32), dict( type='MultiScaleFlipAug3D', img_scale=(1333, 800), pts_scale_ratio=1, flip=False, transforms=[ dict( type='PETRFormatBundle3D', collect_keys=[ 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv' ], class_names=[ 'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier', 'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone' ], with_label=False), 
dict( type='Collect3D', keys=[ 'img', 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv' ], meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'box_mode_3d', 'box_type_3d', 'img_norm_cfg', 'scene_token')) ]) ]) checkpoint_config = dict(interval=1758, max_keep_ckpts=3) log_config = dict( interval=50, hooks=[dict(type='TextLoggerHook'), dict(type='TensorboardLoggerHook')]) dist_params = dict(backend='nccl') log_level = 'INFO' work_dir = 'work_dirs/stream_petr_r50_flash_704_bs2_seq_24e/' load_from = None resume_from = None workflow = [('train', 1)] opencv_num_threads = 0 mp_start_method = 'fork' backbone_norm_cfg = dict(type='LN', requires_grad=True) plugin = True plugin_dir = 'projects/mmdet3d_plugin/' voxel_size = [0.2, 0.2, 8] img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) num_gpus = 8 batch_size = 2 num_iters_per_epoch = 1758 num_epochs = 24 queue_length = 1 num_frame_losses = 1 collect_keys = [ 'lidar2img', 'intrinsics', 'extrinsics', 'timestamp', 'img_timestamp', 'ego_pose', 'ego_pose_inv' ] model = dict( type='Petr3D', num_frame_head_grads=1, num_frame_backbone_grads=1, num_frame_losses=1, use_grid_mask=True, img_backbone=dict( pretrained='torchvision://resnet50', type='ResNet', depth=50, num_stages=4, out_indices=(2, 3), frozen_stages=-1, norm_cfg=dict(type='BN2d', requires_grad=False), norm_eval=True, with_cp=True, style='pytorch'), img_neck=dict( type='CPFPN', in_channels=[1024, 2048], out_channels=256, num_outs=2), img_roi_head=dict( type='FocalHead', num_classes=10, in_channels=256, loss_cls2d=dict( type='QualityFocalLoss', use_sigmoid=True, beta=2.0, loss_weight=2.0), loss_centerness=dict( type='GaussianFocalLoss', reduction='mean', loss_weight=1.0), loss_bbox2d=dict(type='L1Loss', loss_weight=5.0), loss_iou2d=dict(type='GIoULoss', loss_weight=2.0), loss_centers2d=dict(type='L1Loss', loss_weight=10.0), train_cfg=dict( assigner2d=dict( type='HungarianAssigner2D', cls_cost=dict(type='FocalLossCost', weight=2.0), reg_cost=dict( type='BBoxL1Cost', weight=5.0, box_format='xywh'), iou_cost=dict(type='IoUCost', iou_mode='giou', weight=2.0), centers2d_cost=dict(type='BBox3DL1Cost', weight=10.0)))), pts_bbox_head=dict( type='StreamPETRHead', num_classes=10, in_channels=256, num_query=644, memory_len=1024, topk_proposals=256, num_propagated=256, with_ego_pos=True, match_with_velo=False, scalar=10, noise_scale=1.0, dn_weight=1.0, split=0.75, LID=True, with_position=True, position_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], code_weights=[2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], transformer=dict( type='PETRTemporalTransformer', decoder=dict( type='PETRTransformerDecoder', return_intermediate=True, num_layers=6, transformerlayers=dict( type='PETRTemporalDecoderLayer', attn_cfgs=[ dict( type='MultiheadAttention', embed_dims=256, num_heads=8, dropout=0.1), dict( type='PETRMultiheadFlashAttention', embed_dims=256, num_heads=8, dropout=0.1) ], feedforward_channels=2048, ffn_dropout=0.1, with_cp=True, operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm')))), bbox_coder=dict( type='NMSFreeCoder', post_center_range=[-61.2, -61.2, -10.0, 61.2, 61.2, 10.0], pc_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], max_num=300, voxel_size=[0.2, 0.2, 8], num_classes=10), loss_cls=dict( type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=2.0), loss_bbox=dict(type='L1Loss', loss_weight=0.25), loss_iou=dict(type='GIoULoss', 
loss_weight=0.0)), train_cfg=dict( pts=dict( grid_size=[512, 512, 1], voxel_size=[0.2, 0.2, 8], point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], out_size_factor=4, assigner=dict( type='HungarianAssigner3D', cls_cost=dict(type='FocalLossCost', weight=2.0), reg_cost=dict(type='BBox3DL1Cost', weight=0.25), iou_cost=dict(type='IoUCost', weight=0.0), pc_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0])))) ida_aug_conf = dict( resize_lim=(0.38, 0.55), final_dim=(256, 704), bot_pct_lim=(0.0, 0.0), rot_lim=(0.0, 0.0), H=900, W=1600, rand_flip=True) optimizer = dict( type='AdamW', lr=0.0004, paramwise_cfg=dict(custom_keys=dict(img_backbone=dict(lr_mult=0.25))), weight_decay=0.01) optimizer_config = dict( type='Fp16OptimizerHook', loss_scale='dynamic', grad_clip=dict(max_norm=35, norm_type=2)) lr_config = dict( policy='CosineAnnealing', warmup='linear', warmup_iters=500, warmup_ratio=0.3333333333333333, min_lr_ratio=0.001) find_unused_parameters = False runner = dict(type='IterBasedRunner', max_iters=42192) gpu_ids = range(0, 1)

2023-05-26 08:10:49,853 - mmdet - INFO - Set random seed to 0, deterministic: False
/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/mmdet/models/backbones/resnet.py:401: UserWarning: DeprecationWarning: pretrained is deprecated, please use "init_cfg" instead
  warnings.warn('DeprecationWarning: pretrained is deprecated, '
2023-05-26 08:10:50,093 - mmdet - INFO - initialize ResNet with init_cfg {'type': 'Pretrained', 'checkpoint': 'torchvision://resnet50'}
2023-05-26 08:10:50,093 - mmcv - INFO - load model from: torchvision://resnet50
2023-05-26 08:10:50,093 - mmcv - INFO - load checkpoint from torchvision path: torchvision://resnet50
2023-05-26 08:10:50,146 - mmcv - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

2023-05-26 08:10:50,160 - mmdet - INFO - initialize CPFPN with init_cfg {'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'} 2023-05-26 08:10:50,171 - mmdet - INFO - Model: Petr3D( (pts_bbox_head): StreamPETRHead( (loss_cls): FocalLoss() (loss_bbox): L1Loss() (cls_branches): ModuleList( (0): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ReLU(inplace=True) (3): Linear(in_features=256, out_features=256, bias=True) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ReLU(inplace=True) (6): Linear(in_features=256, out_features=10, bias=True) ) (1): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ReLU(inplace=True) (3): Linear(in_features=256, out_features=256, bias=True) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ReLU(inplace=True) (6): Linear(in_features=256, out_features=10, bias=True) ) (2): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ReLU(inplace=True) (3): Linear(in_features=256, out_features=256, bias=True) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ReLU(inplace=True) (6): Linear(in_features=256, out_features=10, bias=True) ) (3): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ReLU(inplace=True) (3): Linear(in_features=256, out_features=256, bias=True) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ReLU(inplace=True) (6): Linear(in_features=256, out_features=10, bias=True) ) (4): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ReLU(inplace=True) (3): Linear(in_features=256, out_features=256, bias=True) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ReLU(inplace=True) (6): Linear(in_features=256, out_features=10, bias=True) ) (5): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): ReLU(inplace=True) (3): Linear(in_features=256, out_features=256, bias=True) (4): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (5): ReLU(inplace=True) (6): Linear(in_features=256, out_features=10, bias=True) ) ) (reg_branches): ModuleList( (0): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): ReLU() (2): Linear(in_features=256, out_features=256, bias=True) (3): ReLU() (4): Linear(in_features=256, out_features=10, bias=True) ) (1): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): ReLU() (2): Linear(in_features=256, out_features=256, bias=True) (3): ReLU() (4): Linear(in_features=256, out_features=10, bias=True) ) (2): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): ReLU() (2): Linear(in_features=256, out_features=256, bias=True) (3): ReLU() (4): Linear(in_features=256, out_features=10, bias=True) ) (3): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): ReLU() (2): Linear(in_features=256, out_features=256, bias=True) (3): ReLU() (4): Linear(in_features=256, out_features=10, bias=True) ) (4): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): ReLU() (2): Linear(in_features=256, out_features=256, bias=True) (3): ReLU() (4): Linear(in_features=256, out_features=10, 
bias=True) ) (5): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): ReLU() (2): Linear(in_features=256, out_features=256, bias=True) (3): ReLU() (4): Linear(in_features=256, out_features=10, bias=True) ) ) (position_encoder): Sequential( (0): Linear(in_features=192, out_features=1024, bias=True) (1): ReLU() (2): Linear(in_features=1024, out_features=256, bias=True) ) (memory_embed): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): ReLU() (2): Linear(in_features=256, out_features=256, bias=True) ) (featurized_pe): SELayer_Linear( (conv_reduce): Linear(in_features=256, out_features=256, bias=True) (act1): ReLU() (conv_expand): Linear(in_features=256, out_features=256, bias=True) (gate): Sigmoid() ) (reference_points): Embedding(644, 3) (pseudo_reference_points): Embedding(256, 3) (query_embedding): Sequential( (0): Linear(in_features=384, out_features=256, bias=True) (1): ReLU() (2): Linear(in_features=256, out_features=256, bias=True) ) (spatial_alignment): MLN( (reduce): Sequential( (0): Linear(in_features=8, out_features=256, bias=True) (1): ReLU() ) (gamma): Linear(in_features=256, out_features=256, bias=True) (beta): Linear(in_features=256, out_features=256, bias=True) (ln): LayerNorm((256,), eps=1e-05, elementwise_affine=False) ) (time_embedding): Sequential( (0): Linear(in_features=256, out_features=256, bias=True) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) (ego_pose_pe): MLN( (reduce): Sequential( (0): Linear(in_features=180, out_features=256, bias=True) (1): ReLU() ) (gamma): Linear(in_features=256, out_features=256, bias=True) (beta): Linear(in_features=256, out_features=256, bias=True) (ln): LayerNorm((256,), eps=1e-05, elementwise_affine=False) ) (ego_pose_memory): MLN( (reduce): Sequential( (0): Linear(in_features=180, out_features=256, bias=True) (1): ReLU() ) (gamma): Linear(in_features=256, out_features=256, bias=True) (beta): Linear(in_features=256, out_features=256, bias=True) (ln): LayerNorm((256,), eps=1e-05, elementwise_affine=False) ) (loss_iou): GIoULoss() (transformer): PETRTemporalTransformer( (decoder): PETRTransformerDecoder( (layers): ModuleList( (0): PETRTemporalDecoderLayer( (attentions): ModuleList( (0): MultiheadAttention( (attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (proj_drop): Dropout(p=0.0, inplace=False) (dropout_layer): Dropout(p=0.1, inplace=False) ) (1): PETRMultiheadFlashAttention( (attn): FlashMHA( (inner_attn): FlashAttention() (out_proj): Linear(in_features=256, out_features=256, bias=True) ) (proj_drop): Dropout(p=0.0, inplace=False) (dropout_layer): Dropout(p=0.1, inplace=False) ) ) (ffns): ModuleList( (0): FFN( (activate): ReLU(inplace=True) (layers): Sequential( (0): Sequential( (0): Linear(in_features=256, out_features=2048, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) ) (1): Linear(in_features=2048, out_features=256, bias=True) (2): Dropout(p=0.1, inplace=False) ) (dropout_layer): Identity() ) ) (norms): ModuleList( (0): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) ) (1): PETRTemporalDecoderLayer( (attentions): ModuleList( (0): MultiheadAttention( (attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (proj_drop): Dropout(p=0.0, inplace=False) (dropout_layer): Dropout(p=0.1, 
inplace=False) ) (1): PETRMultiheadFlashAttention( (attn): FlashMHA( (inner_attn): FlashAttention() (out_proj): Linear(in_features=256, out_features=256, bias=True) ) (proj_drop): Dropout(p=0.0, inplace=False) (dropout_layer): Dropout(p=0.1, inplace=False) ) ) (ffns): ModuleList( (0): FFN( (activate): ReLU(inplace=True) (layers): Sequential( (0): Sequential( (0): Linear(in_features=256, out_features=2048, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) ) (1): Linear(in_features=2048, out_features=256, bias=True) (2): Dropout(p=0.1, inplace=False) ) (dropout_layer): Identity() ) ) (norms): ModuleList( (0): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) ) (2): PETRTemporalDecoderLayer( (attentions): ModuleList( (0): MultiheadAttention( (attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (proj_drop): Dropout(p=0.0, inplace=False) (dropout_layer): Dropout(p=0.1, inplace=False) ) (1): PETRMultiheadFlashAttention( (attn): FlashMHA( (inner_attn): FlashAttention() (out_proj): Linear(in_features=256, out_features=256, bias=True) ) (proj_drop): Dropout(p=0.0, inplace=False) (dropout_layer): Dropout(p=0.1, inplace=False) ) ) (ffns): ModuleList( (0): FFN( (activate): ReLU(inplace=True) (layers): Sequential( (0): Sequential( (0): Linear(in_features=256, out_features=2048, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) ) (1): Linear(in_features=2048, out_features=256, bias=True) (2): Dropout(p=0.1, inplace=False) ) (dropout_layer): Identity() ) ) (norms): ModuleList( (0): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) ) (3): PETRTemporalDecoderLayer( (attentions): ModuleList( (0): MultiheadAttention( (attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (proj_drop): Dropout(p=0.0, inplace=False) (dropout_layer): Dropout(p=0.1, inplace=False) ) (1): PETRMultiheadFlashAttention( (attn): FlashMHA( (inner_attn): FlashAttention() (out_proj): Linear(in_features=256, out_features=256, bias=True) ) (proj_drop): Dropout(p=0.0, inplace=False) (dropout_layer): Dropout(p=0.1, inplace=False) ) ) (ffns): ModuleList( (0): FFN( (activate): ReLU(inplace=True) (layers): Sequential( (0): Sequential( (0): Linear(in_features=256, out_features=2048, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) ) (1): Linear(in_features=2048, out_features=256, bias=True) (2): Dropout(p=0.1, inplace=False) ) (dropout_layer): Identity() ) ) (norms): ModuleList( (0): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) ) (4): PETRTemporalDecoderLayer( (attentions): ModuleList( (0): MultiheadAttention( (attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (proj_drop): Dropout(p=0.0, inplace=False) (dropout_layer): Dropout(p=0.1, inplace=False) ) (1): PETRMultiheadFlashAttention( (attn): FlashMHA( (inner_attn): FlashAttention() (out_proj): Linear(in_features=256, out_features=256, bias=True) ) (proj_drop): Dropout(p=0.0, inplace=False) (dropout_layer): Dropout(p=0.1, inplace=False) ) ) (ffns): ModuleList( (0): FFN( 
(activate): ReLU(inplace=True) (layers): Sequential( (0): Sequential( (0): Linear(in_features=256, out_features=2048, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) ) (1): Linear(in_features=2048, out_features=256, bias=True) (2): Dropout(p=0.1, inplace=False) ) (dropout_layer): Identity() ) ) (norms): ModuleList( (0): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) ) (5): PETRTemporalDecoderLayer( (attentions): ModuleList( (0): MultiheadAttention( (attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True) ) (proj_drop): Dropout(p=0.0, inplace=False) (dropout_layer): Dropout(p=0.1, inplace=False) ) (1): PETRMultiheadFlashAttention( (attn): FlashMHA( (inner_attn): FlashAttention() (out_proj): Linear(in_features=256, out_features=256, bias=True) ) (proj_drop): Dropout(p=0.0, inplace=False) (dropout_layer): Dropout(p=0.1, inplace=False) ) ) (ffns): ModuleList( (0): FFN( (activate): ReLU(inplace=True) (layers): Sequential( (0): Sequential( (0): Linear(in_features=256, out_features=2048, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.1, inplace=False) ) (1): Linear(in_features=2048, out_features=256, bias=True) (2): Dropout(p=0.1, inplace=False) ) (dropout_layer): Identity() ) ) (norms): ModuleList( (0): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (1): LayerNorm((256,), eps=1e-05, elementwise_affine=True) (2): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) ) ) (post_norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True) ) ) ) (img_backbone): ResNet( (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False) (layer1): ResLayer( (0): Bottleneck( (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): 
BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer2): ResLayer( (0): Bottleneck( (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (3): Bottleneck( (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer3): ResLayer( (0): Bottleneck( (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, 
affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (3): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (4): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (5): Bottleneck( (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) (layer4): ResLayer( (0): Bottleneck( (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) (downsample): Sequential( (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): 
Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) (2): Bottleneck( (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (relu): ReLU(inplace=True) ) ) ) init_cfg={'type': 'Pretrained', 'checkpoint': 'torchvision://resnet50'} (img_neck): CPFPN( (lateral_convs): ModuleList( (0): ConvModule( (conv): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1)) ) (1): ConvModule( (conv): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1)) ) ) (fpn_convs): ModuleList( (0): ConvModule( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) ) init_cfg={'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'} (img_roi_head): FocalHead( (loss_cls): FocalLoss() (loss_bbox): IoULoss() (cls): Conv2d(256, 10, kernel_size=(1, 1), stride=(1, 1)) (shared_reg): Sequential( (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): GroupNorm(32, 256, eps=1e-05, affine=True) (2): ReLU() ) (shared_cls): Sequential( (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): GroupNorm(32, 256, eps=1e-05, affine=True) (2): ReLU() ) (centerness): Conv2d(256, 1, kernel_size=(1, 1), stride=(1, 1)) (ltrb): Conv2d(256, 4, kernel_size=(1, 1), stride=(1, 1)) (center2d): Conv2d(256, 2, kernel_size=(1, 1), stride=(1, 1)) (loss_cls2d): QualityFocalLoss() (loss_bbox2d): L1Loss() (loss_iou2d): GIoULoss() (loss_centers2d): L1Loss() (loss_centerness): GaussianFocalLoss() ) (grid_mask): GridMask() ) 2023-05-26 08:10:53,022 - mmdet - INFO - Start running, host: sajadi@sajadi, work_dir: /home/sajadi/anaconda3/envs/streampetr/StreamPETR/work_dirs/stream_petr_r50_flash_704_bs2_seq_24e 2023-05-26 08:10:53,022 - mmdet - INFO - Hooks will be executed in the following order: before_run: (VERY_HIGH ) CosineAnnealingLrUpdaterHook
(ABOVE_NORMAL) Fp16OptimizerHook
(NORMAL ) CheckpointHook
(NORMAL ) CustomDistEvalHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


before_train_epoch: (VERY_HIGH ) CosineAnnealingLrUpdaterHook
(NORMAL ) CustomDistEvalHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


before_train_iter: (VERY_HIGH ) CosineAnnealingLrUpdaterHook
(NORMAL ) CustomDistEvalHook
(LOW ) IterTimerHook


after_train_iter: (ABOVE_NORMAL) Fp16OptimizerHook
(NORMAL ) CheckpointHook
(NORMAL ) CustomDistEvalHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


after_train_epoch: (NORMAL ) CheckpointHook
(NORMAL ) CustomDistEvalHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


before_val_epoch: (LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


before_val_iter: (LOW ) IterTimerHook


after_val_iter: (LOW ) IterTimerHook


after_val_epoch: (VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


after_run: (VERY_LOW ) TextLoggerHook
(VERY_LOW ) TensorboardLoggerHook


2023-05-26 08:10:53,022 - mmdet - INFO - workflow: [('train', 1)], max: 42192 iters 2023-05-26 08:10:53,024 - mmdet - INFO - Checkpoints will be saved to /home/sajadi/anaconda3/envs/streampetr/StreamPETR/work_dirs/stream_petr_r50_flash_704_bs2_seq_24e by HardDiskBackend. Traceback (most recent call last): File "tools/train.py", line 263, in main() File "tools/train.py", line 251, in main custom_train_model( File "/home/sajadi/anaconda3/envs/streampetr/StreamPETR/projects/mmdet3d_plugin/core/apis/train.py", line 30, in custom_train_model custom_train_detector( File "/home/sajadi/anaconda3/envs/streampetr/StreamPETR/projects/mmdet3d_plugin/core/apis/mmdet_train.py", line 203, in custom_train_detector runner.run(data_loaders, cfg.workflow) File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 126, in run self.call_hook('before_run') File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 317, in call_hook getattr(hook, fn_name)(self) File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/mmcv/runner/dist_utils.py", line 135, in wrapper return func(*args, **kwargs) File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/mmcv/runner/hooks/logger/tensorboard.py", line 47, in before_run from torch.utils.tensorboard import SummaryWriter File "/home/sajadi/anaconda3/envs/streampetr/lib/python3.8/site-packages/torch/utils/tensorboard/init.py", line 4, in LooseVersion = distutils.version.LooseVersion AttributeError: module 'distutils' has no attribute 'version' ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 17024) of binary: /home/sajadi/anaconda3/envs/streampetr/bin/python ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result: restart_count=1 master_addr=127.0.0.1 master_port=29500 group_rank=0 group_world_size=1 local_ranks=[0] role_ranks=[0] global_ranks=[0] role_world_sizes=[1] global_world_sizes=[1]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_3ywdtqmw/none_p36ahq8n/attempt_1/0/error.json projects.mmdet3d_plugin

SMSajadi99 commented 1 year ago

Unfortunately, the process does not end

exiawsh commented 1 year ago

Unfortunately, the process does not end

setuptools version: 45.2.0

SMSajadi99 commented 1 year ago

Is this version correct? Because it gives an error related to the same issue: ModuleNotFoundError: No module named '_distutils_hack'. I searched and found this: https://stackoverflow.com/questions/73496322/modulenotfounderror-no-module-named-distutils-hack

exiawsh commented 1 year ago

Is this version correct? Because it gives an error related to the same issue: ModuleNotFoundError: No module named '_distutils_hack'. I searched and found this: https://stackoverflow.com/questions/73496322/modulenotfounderror-no-module-named-distutils-hack

It seems you should uninstall your current setuptools and then install version 45.2.0.
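For reference, a minimal sketch of that downgrade inside the conda environment (restart any running Python processes afterwards):

```bash
# Replace the newer setuptools (whose distutils shim breaks the
# torch.utils.tensorboard LooseVersion import) with the pinned 45.2.0 release.
pip uninstall -y setuptools
pip install setuptools==45.2.0
```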

exiawsh commented 1 year ago

Is this version correct? Because it gives an error related to the same issue: ModuleNotFoundError: No module named '_distutils_hack'. I searched and found this: https://stackoverflow.com/questions/73496322/modulenotfounderror-no-module-named-distutils-hack

Hi, I need to sleep now… I will answer you tomorrow if your issue is not solved.

SMSajadi99 commented 1 year ago

Thank you for your help. Yes, that problem has been solved, but now it seems the GPU memory is too low, so training cannot proceed:

RuntimeError: CUDA out of memory. Tried to allocate 66.00 MiB (GPU 0; 1.95 GiB total capacity; 728.34 MiB already allocated; 27.56 MiB free; 810.00 MiB reserved in total by PyTorch) ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 21265) of binary: /home/sajadi/anaconda3/envs/streampetr/bin/python ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result: restart_count=1 master_addr=127.0.0.1 master_port=29500 group_rank=0 group_world_size=1 local_ranks=[0] role_ranks=[0] global_ranks=[0] role_world_sizes=[1] global_world_sizes=[1]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_f12peicb/none_ycf7las6/attempt_1/0/error.json projects.mmdet3d_plugin

SMSajadi99 commented 1 year ago

Hello again @exiawsh. I searched for this out-of-memory issue but could not find an answer that definitely works. Maybe you can guide me.

exiawsh commented 1 year ago

Hello again @exiawsh. I searched for this out-of-memory issue but could not find an answer that definitely works. Maybe you can guide me.

What's your GPU and how much total memory does it have? I think you should have at least 6 GB of GPU memory per device.
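To check this quickly, the standard nvidia-smi query prints the device name and total memory:

```bash
# Report the GPU model and total memory to compare against the 6 GB guideline.
nvidia-smi --query-gpu=name,memory.total --format=csv
```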

SMSajadi99 commented 1 year ago

https://github.com/exiawsh/StreamPETR/issues/19#issuecomment-1564442516 Does that mean I can't change the batch size or something else so that I can run it?

exiawsh commented 1 year ago

#19 (comment) Does that mean I can't change the batch size or something else so that I can run it?

Try setting the batch size to 1. Your GPU is too old...
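For example, something like the following, as a sketch only: it assumes the training script keeps mmdet3d's standard --cfg-options override, that samples_per_gpu is the batch-size key in this config, and that the config path matches the work_dir from the log; otherwise, edit samples_per_gpu directly in the config file.

```bash
# Single-GPU run with the per-GPU batch size overridden to 1.
# Config path is inferred from the log's work_dir and may differ in your checkout.
tools/dist_train.sh projects/configs/StreamPETR/stream_petr_r50_flash_704_bs2_seq_24e.py 1 \
    --work-dir work_dirs/stream_petr_r50_flash_704_bs2_seq_24e/ \
    --cfg-options data.samples_per_gpu=1
```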

SMSajadi99 commented 1 year ago

#19 (comment) Does that mean I can't change the batch size or something else so that I can run it?

Try setting the batch size to 1. Your GPU is too old...

Exactly. I did this just now, but it didn't make any difference and it still runs out of GPU memory. Is it possible for you to give me the files from the work_dirs folder?

SMSajadi99 commented 1 year ago

image

exiawsh commented 1 year ago

image

Your GPU memory is not enough, because it only has 2 GB. I have said that you should have at least 6 GB of GPU memory. Try Google Colab instead.

SMSajadi99 commented 1 year ago

image

Your GPU memory is not enough, because it only has 2 GB. I have said that you should have at least 6 GB of GPU memory. Try Google Colab instead.

Yes, that's right.
Unfortunately, I have to work on this system (I checked Colab just now, and it is a bit of a hassle to change its defaults). If possible, please send me the files in the work_dirs folder so that I can run the evaluation, and if it succeeds I will move the training to another system.
Please accept my request.

exiawsh commented 1 year ago

image

Your GPU memory is not enough, because it only has 2 GB. I have said that you should have at least 6 GB of GPU memory. Try Google Colab instead.

Yes, that's right. Unfortunately, I have to work on this system (I checked Colab just now, and it is a bit of a hassle to change its defaults). If possible, please send me the files in the work_dirs folder so that I can run the evaluation, and if it succeeds I will move the training to another system. Please accept my request.

work_dirs is not necessary. Check our provided model checkpoint https://github.com/exiawsh/storage/releases/download/v1.0/stream_petr_vov_flash_800_bs2_seq_24e.pth.
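If the goal is only to evaluate with that checkpoint, a rough sketch of the usual mmdet3d-style test command (assuming the repo ships tools/dist_test.sh and a config whose name matches the checkpoint, and that the val pkl for your mini split has already been generated):

```bash
# Download the released checkpoint and evaluate it on a single GPU.
mkdir -p ckpts
wget -P ckpts https://github.com/exiawsh/storage/releases/download/v1.0/stream_petr_vov_flash_800_bs2_seq_24e.pth
tools/dist_test.sh projects/configs/StreamPETR/stream_petr_vov_flash_800_bs2_seq_24e.py \
    ckpts/stream_petr_vov_flash_800_bs2_seq_24e.pth 1 --eval bbox
```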

SMSajadi99 commented 1 year ago

Hello again. I tried to set this up on Colab, but I ran into a problem with the flash-attn package:

image

What should I do to solve this problem, since it blocks training? It gives an error because this module is not available.
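Without seeing the screenshot it is hard to be sure, but one assumption worth checking is whether flash-attn is actually installed and built in the Colab runtime. The version pin below is a guess and should follow the repo's install guide:

```bash
# Install flash-attn in the Colab environment and check that it imports.
# The pinned version is an assumption; use the one from StreamPETR's docs.
pip install flash-attn==0.2.2
python -c "import flash_attn"
```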