apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0
20.77k stars 6.79k forks

C++ generates wrong prediction results on GPU #13798

Closed jacksonxliu closed 5 years ago

jacksonxliu commented 5 years ago

Description

We used GluonCV to export a trained Mask R-CNN model. Running the model in Python with the same image and parameters gives identical results on CPU and GPU, and the C++ API on CPU matches those results as well. With the C++ API on GPU, however, we get different (wrong) results.

Environment info (Required)

Ubuntu 16.04, CUDA 10.0, Python 2.7, cuDNN 7.4.15, MXNet built from source

Package used (Python/R/Scala/Julia): I'm using C++ and Python.

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio): cmake

Error Message:

################# OUTPUT ######################
cpu: 0.996915 0.993642 0.990671 0.978703 0.91161 0.778514 0.692584 0.633775 0.631869 0.562181 0.545941 0.476742 0.4574 0.432584 0.396263 0.317217 0.296218 0.28443 0.267405 0.251783 0.237946 0.224283 0.198709 0.189072 ... (scores decrease monotonically down to ~0.01, followed by -1 padding for the remaining slots) ...

gpu: 1.23133e-36 0 1.18027e-36 0 0 0 0 0 31 34 34 29 33 29 24 28 28 23 27 27 23 27 27 24 27 25 23 26 22 20 1.23287e-38 0 17 19 18 16 19 17 15 18 15 13 15 14 12 ... (garbage values bearing no resemblance to the CPU scores; the rest of the dump is elided) ...

Minimum reproducible example

################### CODE ##################
auto ids    = exec->outputs[0].Copy(Context(kCPU, 0));
auto scores = exec->outputs[1].Copy(Context(kCPU, 0));
auto bboxes = exec->outputs[2].Copy(Context(kCPU, 0));
auto masks  = exec->outputs[3].Copy(Context(kCPU, 0));

NDArray::WaitAll();

const float *pred_ids    = ids.GetData();
const float *pred_scores = scores.GetData();
const float *pred_bboxes = bboxes.GetData();
const float *pred_masks  = masks.GetData();

NDArray::WaitAll();

for (int i = 0; i < 1000; i++) {
  float tmp = *(pred_scores + i);
  std::cout << tmp << " ";
}

mxnet-label-bot commented 5 years ago

Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: C++, Test

zachgk commented 5 years ago

Thank you for submitting the issue! I'm labeling it so the MXNet community members can help resolve it.

@mxnet-label-bot add [Bug, C++]

stackmystack commented 5 years ago

I am facing the same issue right now, but my case is a bit different.

I am training using the C++ API on the GPU. When I inspected the output values of the first batches, I could clearly see that the output is incorrect: negative prediction values after a ReLU.

After a lot of printing of inputs and outputs, just to make sure I was not mis-printing or accidentally editing the labels myself, I concluded that the problem most probably lies in the C++ API, which brought me here.

I switched to CPU to check whether the bug reproduces there. It does not: the output on CPU is correct, i.e. positive.

I stumbled upon this issue while investigating what I thought was a different issue: when I print the training labels before and after a forward pass, I get an occasional mismatch, where a single batch item will have one of its labels flip from 0 to 1.
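
For reference, the check I ran boiled down to something like the following sketch (assuming the mxnet-cpp API; "out" is a placeholder for the ReLU-activated output NDArray):

#include <iostream>
#include <vector>
#include "mxnet-cpp/MxNetCpp.h"

// Count negative entries in an output that should be non-negative after ReLU.
// SyncCopyToCPU performs a blocking copy into host memory, so the buffer is
// safe to read immediately afterwards.
size_t CountNegatives(mxnet::cpp::NDArray out) {
  std::vector<mx_float> data;
  out.SyncCopyToCPU(&data, 0);
  size_t negatives = 0;
  for (size_t i = 0; i < data.size(); ++i) {
    if (data[i] < 0.0f) ++negatives;
  }
  return negatives;
}

On GPU this count is nonzero for the first batches; on CPU it is zero.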

leleamol commented 5 years ago

Hello @jacksonxliu, I could not reproduce this problem. I tried outputting the results from the "inception_inference" example in CPU and GPU context, and the output was the same. Can you please confirm that the above code, when run as-is on CPU and GPU instances, outputs different results?

I noticed that even though we can get a copy of the output in GPU context as follows:

auto ids = exec->outputs[0].Copy(Context(kGPU, 0));

we are not able to access the individual members of the "ids" NDArray directly. This is because the underlying "GetData()" function requires the NDArray to be in CPU context. The overloaded "<<" operator of NDArray converts an array in GPU context to CPU context before accessing the individual members.
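
In other words, to inspect a GPU output one would either print it with the "<<" operator, or copy it to CPU context first and only then call "GetData()". A rough sketch (not verified against the model above):

auto ids_gpu = exec->outputs[0].Copy(Context(kGPU, 0));
NDArray::WaitAll();
std::cout << ids_gpu << std::endl;         // "<<" converts to CPU internally

auto ids_cpu = exec->outputs[0].Copy(Context(kCPU, 0));
NDArray::WaitAll();                        // wait for the async "_copy" to finish
const mx_float *data = ids_cpu.GetData();  // valid only once the array is in CPU context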

In addition, I noticed that when printing or accessing NDArray elements, Python uses the "MXNDArraySyncCopyToCPU" API, whereas the C++ implementation uses the "_copy" operator.
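
For reference, the C API call that Python goes through has roughly this shape ("handle" and "num_elements" are hypothetical placeholders for a valid NDArrayHandle and its element count):

#include <vector>
#include "mxnet/c_api.h"

// Blocking element-wise copy into host memory; this is the entry point the
// Python frontend uses when it materializes an NDArray on the CPU.
std::vector<mx_float> buf(num_elements);
int ret = MXNDArraySyncCopyToCPU(handle, buf.data(), buf.size());
// ret == 0 on success; otherwise MXGetLastError() returns a description.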

Would it be possible for you to try the following code to see if it produces different results on CPU and GPU?

std::vector<mx_float> ids;
// SyncCopyToCPU performs a blocking copy, so no WaitAll is needed before
// reading the vector.
exec->outputs[0].SyncCopyToCPU(&ids, 0L);

for (size_t i = 0; i < ids.size(); i++) {
  std::cout << ids[i] << " ";
}
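
The same pattern would apply to the scores from the original report, e.g.:

std::vector<mx_float> scores;
exec->outputs[1].SyncCopyToCPU(&scores, 0L);

for (size_t i = 0; i < scores.size() && i < 1000; i++) {
  std::cout << scores[i] << " ";
}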

Thanks.

leleamol commented 5 years ago

@mxnet-label-bot add [Pending Requester Info]

piyushghai commented 5 years ago

@jacksonxliu Did @leleamol's suggestions help you out? If so, please feel free to close the issue.

lanking520 commented 5 years ago

@jacksonxliu Closing the issue for now. Please feel free to reopen if the problem persists.