blancaag opened this issue 6 years ago
@blancaag I have met the same problem. How did you fix it?
@mangdian I mentioned it above. It gets fixed by applying nn.BCEWithLogitsLoss() instead of nn.BCELoss() in networks.py line 82 -- it applies a sigmoid to the inputs, restricting them to (0, 1), before computing the loss.
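For context, a minimal standalone sketch of the swap (the tensor shapes here are illustrative, not taken from the repo):

```python
import torch
import torch.nn as nn

# nn.BCEWithLogitsLoss fuses sigmoid + BCE into one numerically stable op,
# so raw (possibly negative) discriminator outputs are safe to pass in.
criterion = nn.BCEWithLogitsLoss()

logits = torch.randn(4, 1)     # raw outputs, NOT passed through sigmoid
targets = torch.ones(4, 1)     # "real" labels
loss = criterion(logits, targets)
print(loss.item())
```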
I think I'm having the same issue, but only when I use my own dataset. I've tried nn.BCEWithLogitsLoss() but with no luck. It must be related to my data, but I can't figure out what I'm missing.
```
RuntimeError: CUDNN_STATUS_INTERNAL_ERROR
/opt/conda/conda-bld/pytorch_1525812548180/work/aten/src/THC/THCTensorScatterGather.cu:176: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [88,0,0], thread: [346,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
... (same assertion repeated for threads [347,0,0] through [351,0,0])
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1525812548180/work/aten/src/THC/generic/THCStorage.c line=184 error=59 : device-side assert triggered
terminate called after throwing an instance of 'std::runtime_error'
  what(): cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1525812548180/work/aten/src/THC/generic/THCStorage.c:184
Aborted (core dumped)
```
@aviel08 - I think it's a different error, not in the BCELoss: "Assertion indexValue >= 0 && indexValue < tensor.sizes[dim] failed". I'd suggest printing the shapes of the input tensors after this line: https://github.com/NVIDIA/pix2pixHD/blob/20687df85d30e6fff5aafb29b7981923da9fd02f/train.py#L51
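For what it's worth, that scatter assertion usually means an index tensor holds values outside the dimension being scattered into, e.g. label-map values >= --label_nc during one-hot encoding. A minimal debugging sketch (check_labels is a hypothetical helper; the 'label' key assumes the data dict pix2pixHD's dataloader yields):

```python
def check_labels(data, label_nc):
    # Labels feeding a scatter_-based one-hot encode must satisfy
    # 0 <= value < label_nc, or CUDA raises the device-side assert above.
    label = data['label']
    print('label', tuple(label.shape),
          'min', label.min().item(), 'max', label.max().item())
    assert 0 <= label.min().item() and label.max().item() < label_nc, \
        'label values out of range for --label_nc'
```

Calling this inside the training loop on each batch should quickly show whether a bad label map is the culprit.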
@aviel08 I met the same problem. How did you solve it?
You can use torch.clamp(x, 0, 1) after your sigmoid layer.
I also had to add:

```python
x = torch.where(torch.isnan(x), torch.zeros_like(x), x)
x = torch.where(torch.isinf(x), torch.zeros_like(x), x)
```
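Putting the clamp and the NaN/Inf replacement together, a minimal sketch (safe_probs is a hypothetical helper name, not from the repo):

```python
import torch

def safe_probs(logits):
    # Sigmoid maps finite inputs into (0, 1) but passes NaN straight through
    # from upstream layers; scrub NaN (and, defensively, Inf), then clamp so
    # nn.BCELoss's "input between 0 and 1" check cannot fire.
    x = torch.sigmoid(logits)
    x = torch.where(torch.isnan(x), torch.zeros_like(x), x)
    x = torch.where(torch.isinf(x), torch.zeros_like(x), x)
    return x.clamp(0, 1)
```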
I applied nn.BCEWithLogitsLoss() instead of nn.BCELoss(), and that solved it.
I find that @relh's solution is effective.
```python
>>> torch.nn.functional.sigmoid(torch.tensor(float('nan')))
tensor(nan)
```

x = torch.where(torch.isnan(x), torch.zeros_like(x), x) prevents this error. Thanks a lot!
A CUDA assertion error pops up when setting --no_lsgan. It seems to be because negative values are passed into nn.BCELoss(), which expects inputs in [0, 1]. It gets fixed by applying nn.BCEWithLogitsLoss() instead.
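For reference, a sketch of the loss selection that --no_lsgan toggles (simplified; make_gan_loss is a hypothetical stand-in for pix2pixHD's GANLoss class):

```python
import torch.nn as nn

def make_gan_loss(use_lsgan=True):
    if use_lsgan:
        return nn.MSELoss()
    # With --no_lsgan this branch runs; the original nn.BCELoss() asserts on
    # CUDA when fed raw (negative) logits, while nn.BCEWithLogitsLoss applies
    # the sigmoid internally and accepts them.
    return nn.BCEWithLogitsLoss()
```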