HobbitLong / CMC

[arXiv 2019] "Contrastive Multiview Coding", also contains implementations for MoCo and InstDis
BSD 2-Clause "Simplified" License

Something went wrong when evaluating the results on ImageNet #32

Open ChongjianGE opened 4 years ago

ChongjianGE commented 4 years ago

When I evaluated the results on full ImageNet (not the subset), I hit the following error:

THCudaCheckWarn FAIL file=/opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/THCStream.cpp line=50 error=59 : device-side assert triggered

Does anyone have any thoughts on this issue?
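
For context: error 59 is a device-side assert, and during linear evaluation it most commonly means a label index that does not fit the classifier's output dimension (for example, a 100-way head left over from the ImageNet-100 subset being fed full-ImageNet labels). That is only a guess at the cause; a minimal diagnostic sketch, assuming a standard (image, target) DataLoader like the one built in LinearProbing.py and a `num_classes` that should match the linear classifier, is:

```python
# Re-running with synchronous kernel launches makes the failing op report its
# real call site, e.g.:  CUDA_LAUNCH_BLOCKING=1 python LinearProbing.py ...

def check_targets(loader, num_classes):
    """Sanity-check on CPU that every label fits the linear classifier.

    Assumptions (not from the repository): `loader` yields (input, target)
    batches as a torchvision ImageFolder loader does, and `num_classes`
    matches the classifier's output dimension (1000 for full ImageNet,
    100 for the ImageNet-100 subset).
    """
    for i, (_, target) in enumerate(loader):
        lo, hi = int(target.min()), int(target.max())
        if lo < 0 or hi >= num_classes:
            raise ValueError(
                "batch %d has labels in [%d, %d], which do not fit a "
                "%d-way classifier" % (i, lo, hi, num_classes)
            )
    print("all labels fall in [0, %d)" % num_classes)
```

If the labels check out, the same CUDA_LAUNCH_BLOCKING=1 run should at least point at the exact kernel that triggers the assert.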

xuChenSJTU commented 4 years ago

@ChongjianGE I have a similar error:

```
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1579022022457/work/aten/src/THC/THCGeneral.cpp line=141 error=60 : peer mapping resources exhausted
Traceback (most recent call last):
  File "/DATA7_DB7/data/xchen/Multi-view-Clustering-master/CMC-master/LinearProbing.py", line 483, in <module>
    main()
  File "/DATA7_DB7/data/xchen/Multi-view-Clustering-master/CMC-master/LinearProbing.py", line 432, in main
    train_acc, train_acc5, train_loss = train(epoch, train_loader, model, classifier, criterion, optimizer, args)
  File "/DATA7_DB7/data/xchen/Multi-view-Clustering-master/CMC-master/LinearProbing.py", line 277, in train
    res = model(input, opt.layer)
  File "/DB/rhome/xchen/anaconda2/envs/Conda_python3_5/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/DATA7_DB7/data/xchen/Multi-view-Clustering-master/CMC-master/models/alexnet.py", line 45, in forward
    return self.encoder(x, layer)
  File "/DB/rhome/xchen/anaconda2/envs/Conda_python3_5/lib/python3.5/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/DB/rhome/xchen/anaconda2/envs/Conda_python3_5/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 148, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
  File "/DB/rhome/xchen/anaconda2/envs/Conda_python3_5/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 159, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim)
  File "/DB/rhome/xchen/anaconda2/envs/Conda_python3_5/lib/python3.5/site-packages/torch/nn/parallel/scatter_gather.py", line 36, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim) if inputs else []
  File "/DB/rhome/xchen/anaconda2/envs/Conda_python3_5/lib/python3.5/site-packages/torch/nn/parallel/scatter_gather.py", line 28, in scatter
    res = scatter_map(inputs)
  File "/DB/rhome/xchen/anaconda2/envs/Conda_python3_5/lib/python3.5/site-packages/torch/nn/parallel/scatter_gather.py", line 15, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/DB/rhome/xchen/anaconda2/envs/Conda_python3_5/lib/python3.5/site-packages/torch/nn/parallel/scatter_gather.py", line 13, in scatter_map
    return Scatter.apply(target_gpus, None, dim, obj)
  File "/DB/rhome/xchen/anaconda2/envs/Conda_python3_5/lib/python3.5/site-packages/torch/nn/parallel/_functions.py", line 89, in forward
    outputs = comm.scatter(input, target_gpus, chunk_sizes, ctx.dim, streams)
  File "/DB/rhome/xchen/anaconda2/envs/Conda_python3_5/lib/python3.5/site-packages/torch/cuda/comm.py", line 147, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: cuda runtime error (60) : peer mapping resources exhausted at /opt/conda/conda-bld/pytorch_1579022022457/work/aten/src/THC/THCGeneral.cpp:141
```

Process finished with exit code 1

Have you solved it yet?
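
For reference, cuda runtime error (60), "peer mapping resources exhausted", typically shows up when nn.DataParallel tries to enable peer-to-peer access across more GPUs than the CUDA driver supports peer mappings for (commonly 8). A workaround sketch, not taken from this repository, is to cap the number of visible GPUs before CUDA is initialized and wrap the model over those devices only:

```python
import os

# Restrict the job to at most 8 GPUs *before* torch initializes CUDA.
# The device list here is an example; adjust it to the GPUs you want to use.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1,2,3,4,5,6,7")

import torch
import torch.nn as nn


def wrap_for_multi_gpu(model):
    """Wrap `model` with DataParallel over the visible GPUs only (sketch)."""
    model = model.cuda()
    n_gpu = torch.cuda.device_count()
    if n_gpu > 1:
        model = nn.DataParallel(model, device_ids=list(range(n_gpu)))
    return model
```

Equivalently, launching with something like `CUDA_VISIBLE_DEVICES=0,1,2,3 python LinearProbing.py ...` avoids touching the code at all.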