train error

nuist-xinyu commented 5 years ago

Traceback (most recent call last):
  File "train.py", line 203, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 138, in train
    training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(*training)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
    loss_kp = self.network(xs, ys)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
    outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error (10): invalid device ordinal (check_status at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
frame #0: torch::cuda::scatter(at::Tensor const&, at::ArrayRef, at::optional<std::vector<long, std::allocator > > const&, long, at::optional<std::vector<CUDAStreamInternals, std::allocator<CUDAStreamInternals*> > > const&) + 0x4e1 (0x7fac77038871 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: + 0xc42a0b (0x7fac77040a0b in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0x38a5cb (0x7fac767885cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

frame #13: THPFunction_apply(_object*, _object*) + 0x38f (0x7fac76b66a2f in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

nuist-xinyu commented 5 years ago

On this computer:

>>> import torch
>>> print(torch.__version__)
0.5.0a0+ce8e8fe

lolongcovas commented 5 years ago

Please check here.

Orange-Ocean-hh commented 4 years ago

Traceback (most recent call last):
  File "train.py", line 43, in prefetch_data
    data, ind = sample_data(db, ind, data_aug=data_aug)
  File "/data/data/fxy/CenterNet-master/sample/coco.py", line 199, in sample_data
    return globals()[system_configs.sampling_function](db, k_ind, data_aug, debug)
  File "/data/data/fxy/CenterNet-master/sample/coco.py", line 99, in kp_detection
    image, detections = random_crop(image, detections, rand_scales, input_size, border=border)
  File "/data/data/fxy/CenterNet-master/sample/utils.py", line 57, in random_crop
    image_height, image_width = image.shape[0:2]
AttributeError: 'NoneType' object has no attribute 'shape'

Hello, sorry to disturb. How can I fix this error? I'm not sure whether the problem is related to 'shape'.
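The traceback shows that `image` is already `None` when `random_crop` runs, so the image was most likely never loaded. Assuming the loader in sample/coco.py uses OpenCV's `cv2.imread` (this thread does not show it), that function returns `None` instead of raising when a file is missing or cannot be decoded, so a wrong dataset path is a likely suspect. The sketch below uses two hypothetical helpers, `find_missing_images` and `load_image_or_fail`, to check paths before training and to fail with a clear message; `loader` is a stand-in for whatever image reader the code actually calls.

```python
import os


def find_missing_images(image_paths):
    """Return every path that does not point to an existing regular file.

    Hypothetical helper: run it over the image paths your dataset
    configuration produces before starting training.
    """
    return [path for path in image_paths if not os.path.isfile(path)]


def load_image_or_fail(path, loader):
    """Load `path` with `loader` and raise if the loader returns None.

    `loader` is an assumption standing in for a reader such as cv2.imread,
    which signals failure by returning None rather than raising.
    """
    image = loader(path)
    if image is None:
        raise IOError(
            "could not read image %r (missing file, wrong path or unreadable data)" % path
        )
    return image
```

If `find_missing_images` returns a non-empty list, the data directory in the config is the first thing to double-check.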

WuChannn commented 4 years ago

@Duankaiwen Hello Kaiwen, could you please show where to specify the IDs of the GPUs to use? Or does the code use all GPUs automatically? Thank you.

Duankaiwen commented 4 years ago

@WuChannn Specifying the GPU IDs is not supported, but you can set 'chunk_sizes' and 'batch_size' in config/CenterNet-xxx.json. The length of 'chunk_sizes' is the number of GPUs that will be used, and each item in 'chunk_sizes' is the batch size for one GPU. sum(chunk_sizes) should equal 'batch_size'.
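The rule above can be checked before launching training. The sketch below is a hypothetical helper, `check_gpu_config`, that assumes `batch_size` and `chunk_sizes` sit under a top-level `"system"` object in the JSON file (an assumption about the config layout, not something stated in this thread). It also compares the number of entries in `chunk_sizes` with the number of GPUs actually visible; a mismatch there is one plausible thing to rule out for the `invalid device ordinal` error in the first traceback, though this thread does not confirm it as the cause.

```python
import json


def check_gpu_config(config_text, num_visible_gpus):
    """Sanity-check 'batch_size' and 'chunk_sizes' in a CenterNet-style config.

    Assumes both keys live under a top-level "system" object (hypothetical
    layout). Returns a list of problems; an empty list means the settings
    look consistent.
    """
    system = json.loads(config_text)["system"]
    batch_size = system["batch_size"]
    chunk_sizes = system["chunk_sizes"]
    problems = []

    # One entry in chunk_sizes per GPU: more entries than visible GPUs
    # means the code would try to use a GPU that does not exist.
    if len(chunk_sizes) > num_visible_gpus:
        problems.append(
            f"chunk_sizes has {len(chunk_sizes)} entries but only "
            f"{num_visible_gpus} GPU(s) are visible"
        )

    # Each entry is one GPU's share of the batch, so they must add up.
    if sum(chunk_sizes) != batch_size:
        problems.append(
            f"sum(chunk_sizes) = {sum(chunk_sizes)} does not equal "
            f"batch_size = {batch_size}"
        )
    return problems


example = '{"system": {"batch_size": 12, "chunk_sizes": [6, 6]}}'
print(check_gpu_config(example, num_visible_gpus=1))
# ['chunk_sizes has 2 entries but only 1 GPU(s) are visible']
```

In a real run, `num_visible_gpus` would come from `torch.cuda.device_count()`.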

WuChannn commented 4 years ago

Ok, got it. Thanks.


Duankaiwen / CenterNet

train error #46