python tools/train.py config/nanodet-plus-m_320.yml

molyswu commented 1 year ago

Tried to : python tools/train.py config/nanodet-plus-m_320.yml error: pytorch_lightning.utilities.cloud_io.get_filesystem has been deprecated in v1.8.0 and will be" [NanoDet][01-04 10:28:00]INFO:Setting up data... loading annotations into memory... Done (t=18.55s) creating index... index created! loading annotations into memory... Done (t=0.56s) creating index... index created! [NanoDet][01-04 10:28:21]INFO:Creating model... model size is 1.0x init weights... => loading pretrained model https://download.pytorch.org/models/shufflenetv2_x1-5666bf0f80.pth Finish initialize NanoDet-Plus Head. GPU available: True (cuda), used: True TPU available: False, using: 0 TPU cores IPU available: False, using: 0 IPUs HPU available: False, using: 0 HPUs /root/anaconda3/envs/nanodet/lib/python3.7/site-packages/torch/cuda/init.py:143: UserWarning: NVIDIA GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70. If you want to use the NVIDIA GeForce RTX 3090 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name)) LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]

| Name | Type | Params

0 | model | NanoDetPlus | 4.3 M 1 | avg_model | NanoDetPlus | 4.3 M

8.7 M Trainable params 0 Non-trainable params 8.7 M Total params 34.647 Total estimated model params size (MB) [NanoDet][01-04 10:28:21]INFO:Weight Averaging is enabled /root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:229: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the num_workers argument(try 40 which is the number of cpus on this machine) in theDataLoader` init to improve performance. category=PossibleUserWarning, Traceback (most recent call last): File "tools/train.py", line 146, in main(args) File "tools/train.py", line 141, in main trainer.fit(task, train_dataloader, val_dataloader) File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 604, in fit self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt return trainer_fn(*args, *kwargs) File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 645, in _fit_impl self._run(model, ckpt_path=self.ckpt_path) File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1098, in _run results = self._run_stage() File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1177, in _run_stage self._run_train() File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1200, in _run_train self.fit_loop.run() File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/loops/loop.py", line 194, in run self.on_run_start(args, **kwargs) File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 206, in on_run_start self.trainer.reset_train_dataloader(self.trainer.lightning_module) File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1552, in reset_train_dataloader if has_len_all_ranks(self.train_dataloader, self.strategy, module) File "/root/anaconda3/envs/nanodet/lib/python3.7/site-packages/pytorch_lightning/utilities/data.py", line 110, in has_len_all_ranks if total_length == 0: RuntimeError: CUDA error: no kernel image is available for execution on the device CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

python3.7 cuda==10.2 gpu==RT3090 UBUNTU20.04

Thanks

deshpandeneeraj commented 1 year ago

Most likely you have nvidia drivers missing on your device, maybe try reinstalling them?

kravcc commented 1 year ago

Did you solve it? @molyswu I have the same problem now...

RangiLyu / nanodet

python tools/train.py config/nanodet-plus-m_320.yml #485

| Name | Type | Params

0 | model | NanoDetPlus | 4.3 M 1 | avg_model | NanoDetPlus | 4.3 M