train error - Githubissues

nuist-xinyu commented 5 years ago

Traceback (most recent call last):
  File "train.py", line 203, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 138, in train
    training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
    loss_kp = self.network(xs, ys)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
    outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error (10): invalid device ordinal (check_status at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
frame #0: torch::cuda::scatter(at::Tensor const&, at::ArrayRef, at::optional<std::vector<long, std::allocator > > const&, long, at::optional<std::vector<CUDAStreamInternals, std::allocator<CUDAStreamInternals*> > > const&) + 0x4e1 (0x7fac77038871 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: + 0xc42a0b (0x7fac77040a0b in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0x38a5cb (0x7fac767885cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

frame #13: THPFunction_apply(_object*, _object*) + 0x38f (0x7fac76b66a2f in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

nuist-xinyu commented 5 years ago

@Duankaiwen Thank you, bro.

Duankaiwen commented 5 years ago

Can I see your full log?

nuist-xinyu commented 5 years ago

loading all datasets... using 4 threads loading from cache file: cache/coco_trainval2014.pkl No cache file found... loading annotations into memory... Done (t=25.19s) creating index... index created! 118287it [01:18, 1509.02it/s] loading annotations into memory... Done (t=20.50s) creating index... index created! loading from cache file: cache/coco_trainval2014.pkl loading annotations into memory... Done (t=18.08s) creating index... index created! loading from cache file: cache/coco_trainval2014.pkl loading annotations into memory... Done (t=20.29s) creating index... index created! loading from cache file: cache/coco_trainval2014.pkl loading annotations into memory... Done (t=23.57s) creating index... index created! loading from cache file: cache/coco_minival2014.pkl No cache file found... loading annotations into memory... Done (t=1.26s) creating index... index created! 5000it [00:03, 1478.28it/s] loading annotations into memory... Done (t=0.61s) creating index... index created! system config... {'batch_size': 48, 'cache_dir': 'cache', 'chunk_sizes': [6, 6, 6, 6, 6, 6, 6, 6], 'config_dir': 'config', 'data_dir': './data', 'data_rng': <mtrand.RandomState object at 0x7fac46fc3870>, 'dataset': 'MSCOCO', 'decay_rate': 10, 'display': 5, 'learning_rate': 0.00025, 'max_iter': 480000, 'nnet_rng': <mtrand.RandomState object at 0x7fac46fc38b8>, 'opt_algo': 'adam', 'prefetch_size': 6, 'pretrain': None, 'result_dir': 'results', 'sampling_function': 'kp_detection', 'snapshot': 5000, 'snapshot_name': 'CenterNet-104', 'stepsize': 450000, 'test_split': 'testdev', 'train_split': 'trainval', 'val_iter': 500, 'val_split': 'minival', 'weight_decay': False, 'weight_decay_rate': 1e-05, 'weight_decay_type': 'l2'} db config... 
{'ae_threshold': 0.5, 'border': 128, 'categories': 80, 'data_aug': True, 'gaussian_bump': True, 'gaussian_iou': 0.7, 'gaussian_radius': -1, 'input_size': [511, 511], 'kp_categories': 1, 'lighting': True, 'max_per_image': 100, 'merge_bbox': False, 'nms_algorithm': 'exp_soft_nms', 'nms_kernel': 3, 'nms_threshold': 0.5, 'output_sizes': [[128, 128]], 'rand_color': True, 'rand_crop': True, 'rand_pushes': False, 'rand_samples': False, 'rand_scale_max': 1.4, 'rand_scale_min': 0.6, 'rand_scale_step': 0.1, 'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]), 'special_crop': False, 'test_scales': [1], 'top_k': 70, 'weight_exp': 8} len of db: 118287 start prefetching data... start prefetching data... shuffling indices... shuffling indices... start prefetching data... shuffling indices... start prefetching data... shuffling indices... building model... module_file: models.CenterNet-104 start prefetching data... shuffling indices... total parameters: 210062960 setting learning rate to: 0.00025 training start... 
0%| | 0/480000 [00:00<?, ?it/s]Traceback (most recent call last): File "train.py", line 203, in train(training_dbs, validation_db, args.start_iter) File "train.py", line 138, in train training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(*training) File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train loss_kp = self.network(xs, ys) File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(input, *kwargs) File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes) File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes) File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else [] File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter return scatter_map(inputs) File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map return list(zip(map(scatter_map, obj))) File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map return list(map(list, zip(map(scatter_map, obj)))) File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map return Scatter.apply(target_gpus, chunk_sizes, dim, obj) File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams) File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams)) RuntimeError: CUDA error (10): invalid 
device ordinal (check_status at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36) frame #0: torch::cuda::scatter(at::Tensor const&, at::ArrayRef, at::optional<std::vector<long, std::allocator > > const&, long, at::optional<std::vector<CUDAStreamInternals, std::allocator<CUDAStreamInternals*> > > const&) + 0x4e1 (0x7fac77038871 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so) frame #1: + 0xc42a0b (0x7fac77040a0b in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so) frame #2: + 0x38a5cb (0x7fac767885cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

frame #13: THPFunction_apply(_object*, _object*) + 0x38f (0x7fac76b66a2f in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

Duankaiwen commented 5 years ago

How many GPUs do you have?

nuist-xinyu commented 5 years ago

I put the val set into the training set as you said, but this error occurred. Thank you for helping me, @Duankaiwen.

nuist-xinyu commented 5 years ago

16G

Duankaiwen commented 5 years ago

How many GPUs, not GPU memory

nuist-xinyu commented 5 years ago

Sorry, sorry: 8 GB.

nuist-xinyu commented 5 years ago

Only one, a 2070.

Duankaiwen commented 5 years ago

Modify 'batch_size' to 3 and 'chunk_sizes' to [3] in config/CenterNet-104.json. If out of memory, then modify 'batch_size' to 2 and 'chunk_sizes' to [2]
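The change suggested above can be sketched in Python. This is only an illustration: `single_gpu_config` is a hypothetical helper, not part of the CenterNet code, and the dictionary below stands in for the "system" section of config/CenterNet-104.json.

```python
import json


def single_gpu_config(config, batch_size=3):
    """Return a copy of a CenterNet-style config adjusted for one GPU.

    Hypothetical helper: with a single GPU the whole batch goes to that
    device, so 'chunk_sizes' becomes a one-element list equal to 'batch_size'.
    """
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    updated["system"]["batch_size"] = batch_size
    updated["system"]["chunk_sizes"] = [batch_size]
    return updated


# Stand-in for the values shipped in config/CenterNet-104.json (8-GPU setup).
original = {"system": {"batch_size": 48, "chunk_sizes": [6] * 8}}

print(single_gpu_config(original)["system"])      # batch of 3 on one GPU
print(single_gpu_config(original, 2)["system"])   # fallback if memory runs out
```

To apply this to the real file, one would load it with `json.load`, pass the result through the helper, and write it back with `json.dump`.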

nuist-xinyu commented 5 years ago

thank you

nuist-xinyu commented 5 years ago

Best wishes to you. I have tried it.

nuist-xinyu commented 5 years ago

Traceback (most recent call last):
  File "train.py", line 203, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 138, in train
    training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
    loss_kp = self.network(xs, ys)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
    outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: given chunk sizes don't sum up to the tensor's size (sum(chunk_sizes) == 16, but expected 2) (scatter at torch/csrc/cuda/comm.cpp:135)
frame #0: + 0xc42a0b (0x7f94eb489a0b in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: + 0x38a5cb (0x7f94eabd15cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

frame #12: THPFunction_apply(_object*, _object*) + 0x38f (0x7f94eafafa2f in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

nuist-xinyu commented 5 years ago

Ah, I am going crazy. It still doesn't work. God, help me.

Duankaiwen commented 5 years ago

Can I see your config/CenterNet-104.json?

nuist-xinyu commented 5 years ago

{
"system": {
    "dataset": "MSCOCO",
    "batch_size": 48,
    "sampling_function": "kp_detection",

    "train_split": "trainval",
    "val_split": "minival",

    "learning_rate": 0.00025,
    "decay_rate": 10,

    "val_iter": 500,

    "opt_algo": "adam",
    "prefetch_size": 6,

    "max_iter": 480000,
    "stepsize": 450000,
    "snapshot": 5000,

    "chunk_sizes": [6,6,6,6,6,6,6,6],

    "data_dir": "./data"
},

"db": {
    "rand_scale_min": 0.6,
    "rand_scale_max": 1.4,
    "rand_scale_step": 0.1,
    "rand_scales": null,

    "rand_crop": true,
    "rand_color": true,

    "border": 128,
    "gaussian_bump": true,

    "input_size": [511, 511],
    "output_sizes": [[128, 128]],

    "test_scales": [1],

    "top_k": 70,
    "categories": 80,
    "kp_categories": 1,
    "ae_threshold": 0.5,
    "nms_threshold": 0.5,

    "max_per_image": 100
}

}

nuist-xinyu commented 5 years ago

I showed you the original. I also modified it, but it still doesn't work. Do you know Chinese?

Duankaiwen commented 5 years ago

Can I see your own config/CenterNet-104.json?

nuist-xinyu commented 5 years ago

I don't have my own config; this is the one I downloaded.

nuist-xinyu commented 5 years ago

This is what I used for training.

Duankaiwen commented 5 years ago

You said you modified config/CenterNet-104.json, and I want to know what the modified file looks like. The log shows that there are some errors in config/CenterNet-104.json. I need to see the details of that file to help you.

nuist-xinyu commented 5 years ago

{
"system": {
    "dataset": "MSCOCO",
    "batch_size": 2,
    "sampling_function": "kp_detection",

    "train_split": "trainval",
    "val_split": "minival",

    "learning_rate": 0.00025,
    "decay_rate": 10,

    "val_iter": 500,

    "opt_algo": "adam",
    "prefetch_size": 6,

    "max_iter": 480000,
    "stepsize": 450000,
    "snapshot": 5000,

    "chunk_sizes": [2,2,2,2,2,2,2,2],

    "data_dir": "./data"
},

"db": {
    "rand_scale_min": 0.6,
    "rand_scale_max": 1.4,
    "rand_scale_step": 0.1,
    "rand_scales": null,

    "rand_crop": true,
    "rand_color": true,

    "border": 128,
    "gaussian_bump": true,

    "input_size": [511, 511],
    "output_sizes": [[128, 128]],

    "test_scales": [1],

    "top_k": 70,
    "categories": 80,
    "kp_categories": 1,
    "ae_threshold": 0.5,
    "nms_threshold": 0.5,

    "max_per_image": 100
}

}

nuist-xinyu commented 5 years ago

This is what it looks like after I modified it.

Duankaiwen commented 5 years ago

Modify 'chunk_sizes' to [2], not [2,2,2,2,2,2,2,2]
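A small check can catch this kind of mismatch before training starts. The sketch below is an assumption based on the two errors seen in this thread (more chunks than GPUs, and chunk sizes that do not sum to the batch size); `check_chunk_sizes` is a hypothetical helper, not part of the repository.

```python
def check_chunk_sizes(batch_size, chunk_sizes, num_gpus):
    """Return a list of human-readable problems with a batch/chunk setup.

    Assumed rules, inferred from the errors in this thread:
      * there should be at most one chunk per available GPU, and
      * the chunks together must cover exactly one batch.
    """
    problems = []
    if len(chunk_sizes) > num_gpus:
        problems.append(
            "%d chunks configured but only %d GPU(s) available"
            % (len(chunk_sizes), num_gpus)
        )
    if sum(chunk_sizes) != batch_size:
        problems.append(
            "sum(chunk_sizes) == %d, but batch_size == %d"
            % (sum(chunk_sizes), batch_size)
        )
    return problems


# The modified config posted above: batch_size 2 with eight chunks of 2.
for problem in check_chunk_sizes(2, [2] * 8, num_gpus=1):
    print(problem)

# The corrected single-GPU setting produces no problems.
print(check_chunk_sizes(2, [2], num_gpus=1))
```

In a real run, `num_gpus` could be taken from `torch.cuda.device_count()`; it is passed in explicitly here so the check does not depend on PyTorch being installed.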

nuist-xinyu commented 5 years ago

OK, OK, thank you!

nuist-xinyu commented 5 years ago

You are really enthusiastic, thank you.

nuist-xinyu commented 5 years ago

loading all datasets... using 4 threads loading from cache file: cache/coco_trainval2014.pkl loading annotations into memory... Done (t=24.58s) creating index... index created! loading from cache file: cache/coco_trainval2014.pkl loading annotations into memory... Done (t=19.16s) creating index... index created! loading from cache file: cache/coco_trainval2014.pkl loading annotations into memory... Done (t=18.12s) creating index... index created! loading from cache file: cache/coco_trainval2014.pkl loading annotations into memory... Done (t=23.61s) creating index... index created! loading from cache file: cache/coco_minival2014.pkl loading annotations into memory... Done (t=0.78s) creating index... index created! system config... {'batch_size': 2, 'cache_dir': 'cache', 'chunk_sizes': [2], 'config_dir': 'config', 'data_dir': './data', 'data_rng': <mtrand.RandomState object at 0x7f75971ee900>, 'dataset': 'MSCOCO', 'decay_rate': 10, 'display': 5, 'learning_rate': 0.00025, 'max_iter': 480000, 'nnet_rng': <mtrand.RandomState object at 0x7f75971ee948>, 'opt_algo': 'adam', 'prefetch_size': 6, 'pretrain': None, 'result_dir': 'results', 'sampling_function': 'kp_detection', 'snapshot': 5000, 'snapshot_name': 'CenterNet-104', 'stepsize': 450000, 'test_split': 'testdev', 'train_split': 'trainval', 'val_iter': 500, 'val_split': 'minival', 'weight_decay': False, 'weight_decay_rate': 1e-05, 'weight_decay_type': 'l2'} db config... {'ae_threshold': 0.5, 'border': 128, 'categories': 80, 'data_aug': True, 'gaussian_bump': True, 'gaussian_iou': 0.7, 'gaussian_radius': -1, 'input_size': [511, 511], 'kp_categories': 1, 'lighting': True, 'max_per_image': 100, 'merge_bbox': False, 'nms_algorithm': 'exp_soft_nms', 'nms_kernel': 3, 'nms_threshold': 0.5, 'output_sizes': [[128, 128]], 'rand_color': True, 'rand_crop': True, 'rand_pushes': False, 'rand_samples': False, 'rand_scale_max': 1.4, 'rand_scale_min': 0.6, 'rand_scale_step': 0.1, 'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. 
, 1.1, 1.2, 1.3]), 'special_crop': False, 'test_scales': [1], 'top_k': 70, 'weight_exp': 8} len of db: 118287 start prefetching data... shuffling indices... start prefetching data... shuffling indices... start prefetching data... shuffling indices... start prefetching data... shuffling indices... building model... module_file: models.CenterNet-104 start prefetching data... shuffling indices... total parameters: 210062960 setting learning rate to: 0.00025 training start... 0%| | 0/480000 [00:00<?, ?it/s]THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument

Traceback (most recent call last):
  File "train.py", line 203, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 138, in train
    training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
    loss_kp = self.network(xs, ys)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 68, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 20, in forward
    preds = self.model(*xs, **kwargs)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 32, in forward
    return self.module(*xs, **kwargs)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/kp.py", line 289, in forward
    return self._train(*xs, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/kp.py", line 193, in _train
    inter = self.pre(image)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/utils.py", line 14, in forward
    conv = self.conv(x)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp:663

nuist-xinyu commented 5 years ago

Hello, I followed what you said, but it still doesn't work.

Duankaiwen commented 5 years ago

What's your version of CUDA?

nuist-xinyu commented 5 years ago

cuda 10

Duankaiwen commented 5 years ago

The version may be too high; try CUDA 8.0 or CUDA 9.0. See this: https://github.com/sangwoomo/instagan/issues/4
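When comparing versions, it can help to print what the installed PyTorch build itself reports. The following is a minimal sketch, not something from the CenterNet repository; `cuda_environment_report` is a hypothetical helper, and it returns placeholder values if PyTorch is not installed.

```python
def cuda_environment_report():
    """Collect PyTorch/CUDA version details into a dict for a bug report."""
    report = {
        "torch": None,            # PyTorch version string
        "built_with_cuda": None,  # CUDA version the PyTorch binary was built against
        "cuda_available": False,  # whether PyTorch can reach a GPU right now
        "devices": [],            # names of the visible GPUs
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch is not installed; keep the placeholders

    report["torch"] = torch.__version__
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        for index in range(torch.cuda.device_count()):
            report["devices"].append(torch.cuda.get_device_name(index))
    return report


print(cuda_environment_report())
```

The "built_with_cuda" value is the one to compare against the CUDA toolkit installed on the system, and the number of entries in "devices" is the GPU count asked about earlier in the thread.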

nuist-xinyu commented 5 years ago

Hello author, sorry to bother you again. I really appreciate your help yesterday. I changed CUDA to 9, but I still can't train, although I can run the test.

nuist-xinyu commented 5 years ago

loading all datasets... using 4 threads loading from cache file: cache/coco_trainval2014.pkl loading annotations into memory... Done (t=22.31s) creating index... index created! loading from cache file: cache/coco_trainval2014.pkl loading annotations into memory... Done (t=19.15s) creating index... index created! loading from cache file: cache/coco_trainval2014.pkl loading annotations into memory... Done (t=18.04s) creating index... index created! loading from cache file: cache/coco_trainval2014.pkl loading annotations into memory... Done (t=23.53s) creating index... index created! loading from cache file: cache/coco_minival2014.pkl loading annotations into memory... Done (t=0.78s) creating index... index created! system config... {'batch_size': 1, 'cache_dir': 'cache', 'chunk_sizes': [1], 'config_dir': 'config', 'data_dir': './data', 'data_rng': <mtrand.RandomState object at 0x7fd20efd0870>, 'dataset': 'MSCOCO', 'decay_rate': 10, 'display': 5, 'learning_rate': 0.00025, 'max_iter': 480000, 'nnet_rng': <mtrand.RandomState object at 0x7fd20efd08b8>, 'opt_algo': 'adam', 'prefetch_size': 6, 'pretrain': None, 'result_dir': 'results', 'sampling_function': 'kp_detection', 'snapshot': 5000, 'snapshot_name': 'CenterNet-104', 'stepsize': 450000, 'test_split': 'testdev', 'train_split': 'trainval', 'val_iter': 500, 'val_split': 'minival', 'weight_decay': False, 'weight_decay_rate': 1e-05, 'weight_decay_type': 'l2'} db config... {'ae_threshold': 0.5, 'border': 128, 'categories': 80, 'data_aug': True, 'gaussian_bump': True, 'gaussian_iou': 0.7, 'gaussian_radius': -1, 'input_size': [511, 511], 'kp_categories': 1, 'lighting': True, 'max_per_image': 100, 'merge_bbox': False, 'nms_algorithm': 'exp_soft_nms', 'nms_kernel': 3, 'nms_threshold': 0.5, 'output_sizes': [[128, 128]], 'rand_color': True, 'rand_crop': True, 'rand_pushes': False, 'rand_samples': False, 'rand_scale_max': 1.4, 'rand_scale_min': 0.6, 'rand_scale_step': 0.1, 'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. 
, 1.1, 1.2, 1.3]), 'special_crop': False, 'test_scales': [1], 'top_k': 70, 'weight_exp': 8} len of db: 118287 start prefetching data... shuffling indices... start prefetching data... shuffling indices... start prefetching data... shuffling indices... start prefetching data... shuffling indices... building model... module_file: models.CenterNet-104 start prefetching data... shuffling indices... total parameters: 210062960 setting learning rate to: 0.00025 training start... 0%| | 0/480000 [00:00<?, ?it/s]THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument

Traceback (most recent call last): File "train.py", line 203, in train(training_dbs, validation_db, args.start_iter) File "train.py", line 138, in train training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(training) File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train loss_kp = self.network(xs, ys) File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(*input, *kwargs) File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 68, in forward return self.module(inputs[0], kwargs[0]) File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(*input, kwargs) File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 20, in forward preds = self.model(*xs, *kwargs) File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(input, kwargs) File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 32, in forward return self.module(*xs, kwargs) File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(*input, *kwargs) File "/home/xinyu/CenterNet-master/models/py_utils/kp.py", line 289, in forward return self._train(xs, kwargs) File "/home/xinyu/CenterNet-master/models/py_utils/kp.py", line 193, in _train inter = self.pre(image) File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(*input, kwargs) File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward input = module(input) File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(*input, *kwargs) File "/home/xinyu/CenterNet-master/models/py_utils/utils.py", line 14, in forward conv = self.conv(x) File 
"/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call result = self.forward(input, kwargs) File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward self.padding, self.dilation, self.groups) RuntimeError: cuda runtime error (11) : invalid argument at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp:663

nuist-xinyu commented 5 years ago

loading parameters at iteration: 480000 building neural network... module_file: models.CenterNet-104 total parameters: 210062960 loading parameters... loading model from cache/nnet/CenterNet-104/CenterNet-104_480000.pkl locating kps: 0%| | 0/5000 [00:00<?, ?it/s]THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead. warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.") locating kps: 72%|██████████████████ | 3624/5000 [34:05<14:17, 1.60it/

This is the test run.

Duankaiwen commented 5 years ago

How about now?

nuist-xinyu commented 5 years ago

RuntimeError: Expected object of type CUDAByteType but found type CUDAFloatType for argument #0 'result' (checked_cast_tensor at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/ATen/Utils.h:30) frame #0: + 0xf5cb33 (0x7f961bb32b33 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/lib/libATen.so) frame #1: at::CUDAFloatType::s_gt_out(at::Tensor&, at::Tensor const&, at::Tensor const&) const + 0x26 (0x7f961bb361c6 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/lib/libATen.so) frame #2: torch::autograd::VariableType::s_gt_out(at::Tensor&, at::Tensor const&, at::Tensor const&) const + 0x19c (0x7f96351b7f2c in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so) frame #3: at::Type::gt_out(at::Tensor&, at::Tensor const&, at::Tensor const&) const + 0x118 (0x7f961bc8fdd8 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/lib/libATen.so) frame #4: pool_backward(at::Tensor, at::Tensor) + 0x435 (0x7f95ed4c3315 in /home/xinyu/anaconda3/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/right_pool.cpython-36m-x86_64-linux-gnu.so) frame #5: + 0x112cb (0x7f95ed4cd2cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/right_pool.cpython-36m-x86_64-linux-gnu.so) frame #6: + 0x1149e (0x7f95ed4cd49e in /home/xinyu/anaconda3/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/right_pool.cpython-36m-x86_64-linux-gnu.so) frame #7: + 0x11e1c (0x7f95ed4cde1c in /home/xinyu/anaconda3/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/right_pool.cpython-36m-x86_64-linux-gnu.so) frame #8: _PyCFunction_FastCallDict + 0x154 (0x555e9c1c2744 in python) frame #9: + 0x19842c (0x555e9c24942c in python) frame #10: _PyEval_EvalFrameDefault + 0x30a (0x555e9c26e38a in python) frame #11: PyEval_EvalCodeEx + 0x329 (0x555e9c244289 in python) frame #12: + 0x194094 (0x555e9c245094 in python) frame #13: PyObject_Call + 0x3e (0x555e9c1c254e in python) frame 
#14: _PyEval_EvalFrameDefault + 0x19ec (0x555e9c26fa6c in python) frame #15: + 0x1918e4 (0x555e9c2428e4 in python) frame #16: _PyFunction_FastCallDict + 0x1bc (0x555e9c243c4c in python) frame #17: _PyObject_FastCallDict + 0x26f (0x555e9c1c2b0f in python) frame #18: _PyObject_Call_Prepend + 0x63 (0x555e9c1c76a3 in python) frame #19: PyObject_Call + 0x3e (0x555e9c1c254e in python) frame #20: torch::autograd::PyFunction::apply(std::vector<torch::autograd::Variable, std::allocator > const&) + 0x199 (0x7f9635197579 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so) frame #21: torch::autograd::Engine::evaluate_function(torch::autograd::FunctionTask&) + 0x1d1e (0x7f963518254e in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so) frame #22: torch::autograd::Engine::thread_main(torch::autograd::GraphTask*) + 0xe7 (0x7f9635182f17 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so) frame #23: torch::autograd::Engine::thread_init(int) + 0x72 (0x7f963517f822 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so) frame #24: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7f96351ad8aa in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so) frame #25: + 0xb8678 (0x7f96187ea678 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/lib/../../../../libstdc++.so.6) frame #26: + 0x76ba (0x7f96454606ba in /lib/x86_64-linux-gnu/libpthread.so.0) frame #27: clone + 0x6d (0x7f964519641d in /lib/x86_64-linux-gnu/libc.so.6)

This error occurs, and it still cannot run.

hheavenknowss commented 5 years ago

Hello, I have the same issue, and I have two GPUs. Do I need to change the config as above? If so, please tell me how. I have been stuck on this for two weeks, and I would really appreciate it if I could fix it.

Duankaiwen commented 5 years ago

please show your log

hheavenknowss commented 5 years ago

please show your log

Thank you for replying so fast, but something suddenly went wrong with my environment. I'll post it later. Thanks again.

hheavenknowss commented 5 years ago

please show your log

Thank you for replying so fast, but something suddenly went wrong with my environment. I'll post it later. Thanks again.

loading all datasets... using 4 threads loading from cache file: cache/coco_hepaticvessel_001.pkl No cache file found... loading annotations into memory... Done (t=0.00s) creating index... index created! 49it [00:00, 36524.06it/s] loading annotations into memory... Done (t=0.00s) creating index... index created! loading from cache file: cache/coco_hepaticvessel_001.pkl loading annotations into memory... Done (t=0.00s) creating index... index created! loading from cache file: cache/coco_hepaticvessel_001.pkl loading annotations into memory... Done (t=0.00s) creating index... index created! loading from cache file: cache/coco_hepaticvessel_001.pkl loading annotations into memory... Done (t=0.00s) creating index... index created! loading from cache file: cache/coco_hepaticvessel_001.pkl loading annotations into memory... Done (t=0.00s) creating index... index created! system config... {'batch_size': 48, 'cache_dir': 'cache', 'chunk_sizes': [6, 6, 6, 6, 6, 6, 6, 6], 'config_dir': 'config', 'data_dir': './data', 'data_rng': <mtrand.RandomState object at 0x7f5fd2cc5ab0>, 'dataset': 'MSCOCO', 'decay_rate': 10, 'display': 5, 'learning_rate': 0.00025, 'max_iter': 480000, 'nnet_rng': <mtrand.RandomState object at 0x7f5fd2cc5af8>, 'opt_algo': 'adam', 'prefetch_size': 6, 'pretrain': None, 'result_dir': 'results', 'sampling_function': 'kp_detection', 'snapshot': 5000, 'snapshot_name': 'CenterNet-104', 'stepsize': 450000, 'test_split': 'testdev', 'train_split': 'trainval', 'val_iter': 500, 'val_split': 'minival', 'weight_decay': False, 'weight_decay_rate': 1e-05, 'weight_decay_type': 'l2'} db config... 
{'ae_threshold': 0.5, 'border': 128, 'categories': 80, 'data_aug': True, 'gaussian_bump': True, 'gaussian_iou': 0.7, 'gaussian_radius': -1, 'input_size': [512, 512], 'kp_categories': 1, 'lighting': True, 'max_per_image': 100, 'merge_bbox': False, 'nms_algorithm': 'exp_soft_nms', 'nms_kernel': 3, 'nms_threshold': 0.5, 'output_sizes': [[128, 128]], 'rand_color': True, 'rand_crop': True, 'rand_pushes': False, 'rand_samples': False, 'rand_scale_max': 1.4, 'rand_scale_min': 0.6, 'rand_scale_step': 0.1, 'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]), 'special_crop': False, 'test_scales': [1], 'top_k': 70, 'weight_exp': 8} len of db: 49 start prefetching data... shuffling indices... start prefetching data... shuffling indices... start prefetching data... shuffling indices... start prefetching data... shuffling indices... building model... module_file: models.CenterNet-104 start prefetching data... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... shuffling indices... total parameters: 210062960 setting learning rate to: 0.00025 training start... 0%| | 0/480000 [00:00<?, ?it/s]shuffling indices... shuffling indices...

Traceback (most recent call last):
  File "train.py", line 203, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 138, in train
    training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(*training)
  File "/dfsdata/pengxf2_data/CenterNet-master/nnet/py_factory.py", line 82, in train
    loss_kp = self.network(xs, ys)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
    outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
  File "/usr/local/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: Device index must be -1 or non-negative, got -1687419088 (Device at /pytorch/torch/lib/tmp_install/include/ATen/Device.h:47)
frame #0: + 0xc4964b (0x7f5f6517b64b in /usr/local/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: + 0x39120b (0x7f5f648c320b in /usr/local/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

frame #12: THPFunction_apply(_object*, _object*) + 0x38f (0x7f5f64ca166f in /usr/local/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "train.py", line 51, in pin_memory
    data = data_queue.get()
  File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 493, in Client
    answer_challenge(c, authkey)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
    message = connection.recv_bytes(256) # reject large message
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "train.py", line 51, in pin_memory
    data = data_queue.get()
  File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 493, in Client
    answer_challenge(c, authkey)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
    message = connection.recv_bytes(256) # reject large message
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

Duankaiwen commented 5 years ago

Modify 'batch_size' to 3 and 'chunk_sizes' to [3] in config/CenterNet-104.json. If out of memory, then modify 'batch_size' to 2 and 'chunk_sizes' to [2]
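
The values suggested here, together with the 48 / eight-entry config in the log above, hint that 'chunk_sizes' describes how one batch is split across GPUs. As a minimal sketch, assuming that 'chunk_sizes' may have at most one entry per visible GPU and that its entries must sum to 'batch_size' (an assumption inferred from the numbers in this thread, not a description of checks the repository performs), a hypothetical helper `check_batch_config` could sanity-check the two settings before training:

```python
def check_batch_config(batch_size, chunk_sizes, num_gpus):
    """Return a list of human-readable problems; an empty list means the pair looks consistent.

    Hypothetical helper, not part of CenterNet. It encodes two assumptions:
    one chunk per GPU at most, and the chunks must add up to the batch size.
    """
    problems = []
    if len(chunk_sizes) > num_gpus:
        problems.append(
            "chunk_sizes has {} entries but only {} GPU(s) are available".format(
                len(chunk_sizes), num_gpus
            )
        )
    if sum(chunk_sizes) != batch_size:
        problems.append(
            "chunk_sizes sums to {} but batch_size is {}".format(
                sum(chunk_sizes), batch_size
            )
        )
    return problems


# The config from the log above, checked against a hypothetical 2-GPU machine.
print(check_batch_config(48, [6] * 8, num_gpus=2))
# The single-GPU setting suggested in this thread.
print(check_batch_config(3, [3], num_gpus=1))
```

Here `num_gpus` is passed in by hand so the sketch runs anywhere; on a real machine it could be taken from `torch.cuda.device_count()`.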

hheavenknowss commented 5 years ago

Modify 'batch_size' to 3 and 'chunk_sizes' to [3] in config/CenterNet-104.json. If out of memory, then modify 'batch_size' to 2 and 'chunk_sizes' to [2]

I'll try it, thank you. I've also tried batch_size 8 with chunk_sizes [4, 4] and it worked. Are most of these issues caused by the batch_size and chunk_sizes settings?

Duankaiwen commented 5 years ago

Yes

hheavenknowss commented 5 years ago

Yes

Thank you for your answer and patience

Duankaiwen commented 5 years ago

No problem

nuist-xinyu commented 5 years ago

loading all datasets...
using 4 threads
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=2406.72s)
creating index...
index created!
Traceback (most recent call last):
  File "train.py", line 193, in <module>
    training_dbs = [datasets[dataset](configs["db"], train_split) for _ in range(threads)]
  File "train.py", line 193, in <listcomp>
    training_dbs = [datasets[dataset](configs["db"], train_split) for _ in range(threads)]
  File "/home/zq/辛宇/CenterNet-master/db/coco.py", line 69, in __init__
    self._load_coco_data()
  File "/home/zq/辛宇/CenterNet-master/db/coco.py", line 85, in _load_coco_data
    data = json.load(f)
  File "/home/zq/anaconda3/envs/CenterNet/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/home/zq/anaconda3/envs/CenterNet/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError

I'm sorry to disturb you again. When I run this code on a 1080 (CUDA 9 and torch 0.4.1), this happens. How can I solve it?

Duankaiwen commented 5 years ago

Try this:
cd /data/coco/PythonAPI
make

nuist-xinyu commented 5 years ago

Thank you for answering my question late at night. I did what you said, but this happened:

python setup.py build_ext --inplace
running build_ext
skipping 'pycocotools/_mask.c' Cython extension (up-to-date)
building 'pycocotools._mask' extension
creating build
creating build/common
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/pycocotools
gcc -pthread -B /home/zq/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/zq/anaconda3/lib/python3.6/site-packages/numpy/core/include -I../common -I/home/zq/anaconda3/include/python3.6m -c ../common/maskApi.c -o build/temp.linux-x86_64-3.6/../common/maskApi.o -Wno-cpp -Wno-unused-function -std=c99
../common/maskApi.c: In function ‘rleToBbox’:
../common/maskApi.c:141:31: warning: ‘xp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 if(j%2==0) xp=x; else if(xp<x) { ys=0; ye=h-1; }
 ^
gcc -pthread -B /home/zq/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/zq/anaconda3/lib/python3.6/site-packages/numpy/core/include -I../common -I/home/zq/anaconda3/include/python3.6m -c pycocotools/_mask.c -o build/temp.linux-x86_64-3.6/pycocotools/_mask.o -Wno-cpp -Wno-unused-function -std=c99
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/pycocotools
gcc -pthread -shared -B /home/zq/anaconda3/compiler_compat -L/home/zq/anaconda3/lib -Wl,-rpath=/home/zq/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/../common/maskApi.o build/temp.linux-x86_64-3.6/pycocotools/_mask.o -o build/lib.linux-x86_64-3.6/pycocotools/_mask.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/pycocotools/_mask.cpython-36m-x86_64-linux-gnu.so -> pycocotools
rm -rf build

nuist-xinyu commented 5 years ago

This error occurred when I ran the program.

zq@zq-G1-SNIPER-B7:~/辛宇/CenterNet-master$ python train.py CornerNet
Traceback (most recent call last):
  File "train.py", line 18, in <module>
    from nnet.py_factory import NetworkFactory
  File "/home/zq/辛宇/CenterNet-master/nnet/py_factory.py", line 8, in <module>
    from models.py_utils.data_parallel import DataParallel
  File "/home/zq/辛宇/CenterNet-master/models/py_utils/__init__.py", line 6, in <module>
    from ._cpools import TopPool, BottomPool, LeftPool, RightPool
  File "/home/zq/辛宇/CenterNet-master/models/py_utils/_cpools/__init__.py", line 8, in <module>
    import top_pool, bottom_pool, left_pool, right_pool
ImportError: /home/zq/.local/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/top_pool.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN5torch4barfEPKcz

Duankaiwen commented 5 years ago

what's your torch version?
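
The version matters because the missing symbol `_ZN5torch4barfEPKcz` demangles to `torch::barf(char const*, ...)`, and an "undefined symbol" error of this kind usually means a compiled extension (here the cpools egg) was built against a different PyTorch than the one being imported. That diagnosis is an assumption, not something confirmed in this thread. A minimal sketch for reporting the installed build, using a hypothetical helper `describe_torch`:

```python
import importlib.util


def describe_torch():
    """Return a one-line summary of the PyTorch build in the current environment.

    Hypothetical helper for answering "what's your torch version?".
    Falls back to a plain message when torch is not importable.
    """
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    return "torch {} (built with CUDA {}), CUDA available: {}".format(
        torch.__version__, torch.version.cuda, torch.cuda.is_available()
    )


print(describe_torch())
```

If the reported version differs from the one the extension was compiled with, rebuilding the corner pooling extension in the same environment is the usual next step.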

Duankaiwen / CenterNet

train error #46