honamida opened this issue 7 years ago
Did you train on CPU or GPU?
I tried to run the training and testing code on GPU, but it failed.
Ah, sorry. I missed that you are using the C++ caffe2::Predictor. caffe2::Predictor does not work with a CUDA tensor as input since it uses a CPU tensor internally. I tried a workaround by implementing a CUDA version of the predictor but ran into other issues (see #694).
CUDA prediction in C++ seems to be surprisingly difficult :)
cc/ @salexspb - maybe this is something we can abstract out in the predictor interface?
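For context, the CPU assumption is baked into the predictor's signature: its TensorVector is a vector of TensorCPU*. A minimal usage sketch (based on caffe2/core/predictor.h of that era; details may differ across versions):

```cpp
#include "caffe2/core/predictor.h"

// Sketch only: caffe2::Predictor runs everything through CPU tensors,
// so a CUDA tensor cannot be passed in here.
void predictOnCPU(const caffe2::NetDef& init_net,
                  const caffe2::NetDef& predict_net,
                  caffe2::TensorCPU& input) {
  caffe2::Predictor predictor(init_net, predict_net);
  caffe2::Predictor::TensorVector inputs{&input};  // TensorCPU* only
  caffe2::Predictor::TensorVector outputs;
  predictor.run(inputs, &outputs);  // executes the net on CPU internally
}
```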
Hi, this makes me wonder whether the tutorial that runs the net through the predictor is based on CUDA or not. BTW, how do I solve the problem in the Python API? Do I need to add some operators like CopyCPUToGPU or something? The error message says it got a CPUContext.
Another question: does this mean we can solve the problem by switching to the workspace interface and feeding the data with FeedBlob? If so, is there anything I have to pay attention to?
Thanks!
In Python, training and prediction on CPU worked for me (https://github.com/peterneher/peters-stuff/blob/master/Caffe2Scripts/classification_no_db_example.py). Unfortunately, training on CPU and then testing on GPU did not.
Hey guys. As of right now I actually recommend using the raw nets API from C++. The predictor does not do much on top of it anyway. We will think about a better story here, but the nets API should work well too, I think. In the net you then need to specify in each operator's proto which device it runs on.
Let me know if it works for you! We will keep you posted on newer APIs.
Hi @salexspb, do you have sample code for GPU C++ prediction using the nets API?
You can basically use workspace->RunNet. GPU vs. non-GPU is set by the device_option of each operator. When you construct your net in Python, this is easy to control with DeviceScope. You can also change it manually afterwards in either Python or C++ (it is just a protobuf). Another way is to have a function that creates your net: call it under one device scope and train a model, then call it again under another device scope to get a model that has the same weights but executes on a different device. You will have to make sure the weights live on the corresponding device as well.
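For the C++ route, a minimal sketch of editing the protobuf and running through the raw nets API (untested; it assumes every op in the net actually has a CUDA implementation, and that the weights produced by init_net end up on the same device):

```cpp
#include "caffe2/core/workspace.h"
#include "caffe2/proto/caffe2.pb.h"

// Sketch only: point every op of both nets at GPU 0, then run them
// with the raw Workspace/nets API instead of caffe2::Predictor.
void runNetsOnGPU(caffe2::NetDef& init_net, caffe2::NetDef& predict_net) {
  for (auto* net : {&init_net, &predict_net}) {
    for (int i = 0; i < net->op_size(); ++i) {
      auto* dev = net->mutable_op(i)->mutable_device_option();
      dev->set_device_type(caffe2::CUDA);  // ops without a CUDA kernel must stay on CPU
      dev->set_cuda_gpu_id(0);
    }
  }
  caffe2::Workspace ws;
  ws.RunNetOnce(init_net);        // materializes the weights as CUDA tensors
  ws.CreateNet(predict_net);
  ws.RunNet(predict_net.name());  // forward pass on the GPU
}
```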
It is now 2018/7/19. Is there any solution yet?
@honamida, hi, I am sorry to bother you, but is there any solution to this question? I have tried workspace::RunNet and added a "CopyCPUToGPU" operator to copy the input cv::Mat to a TensorCUDA, but the same error occurs: cannot create operator of type conv on the device cuda.
@Yangqing, sorry to bother you. This issue has been open for a long time; is there a solution yet? If C++ cannot do forward prediction comfortably (reasonably, conveniently), then Caffe2's claimed advantage of being easier to productionize than other frameworks is moot, since any reasonably large model needs the GPU. So I sincerely ask that this problem be solved. Thank you.
@beichen2012 You can use the lower-level interface for GPU prediction. The caffe2::Predictor interface is just an example for CPU prediction.
@ezineo, thanks for your reply. I have tried workspace::RunNet; please review the code. Thank you again:

```cpp
void warpInput(caffe2::TensorCPU& input, cv::Mat& src)
{
    // convert to CV_32F
    cv::Mat img;
    src.convertTo(img, CV_32F);
    // split into BGR channel planes
    std::vector<cv::Mat> bgr;
    cv::split(img, bgr);
    // wrap the image as an NCHW tensor
    int N = 1;
    int C = img.channels();
    int H = img.rows;
    int W = img.cols;
    input.Resize(std::vector<int>{N, C, H, W});
    float* data = input.mutable_data<float>();
    float* p = data;
    for (int i = 0; i < C; i++)
    {
        // view one channel plane of the tensor buffer and copy into it
        cv::Mat channel(H, W, CV_32FC1, p);
        bgr[i].copyTo(channel);
        p += H * W;
    }
}

void testFasterRCNNRun()
{
    std::string model_relative_dir = "e2e_faster_rcnn_R-50-C4_1x/";
    std::string init_path = MODEL_DIR + model_relative_dir + "init_net.pb";
    std::string predict_path = MODEL_DIR + model_relative_dir + "predict_net.pb";
    caffe2::NetDef init_net, predict_net;
    CAFFE_ENFORCE(ReadProtoFromFile(init_path, &init_net));
    CAFFE_ENFORCE(ReadProtoFromFile(predict_path, &predict_net));
    //init_net.mutable_device_option()->set_device_type(1);
    //init_net.mutable_device_option()->set_cuda_gpu_id(0);
    int m = init_net.op_size();
    for (int i = 0; i < m; i++)
    {
        auto* p = init_net.mutable_op(i);
        auto type = p->type();
        std::cout << i << " -> " << type << std::endl;
    }
    // CopyCPUToGPU operator for the image
    auto* ccg = predict_net.add_op();
    ccg->set_name("copy_img");
    ccg->set_type("CopyCPUToGPU");
    ccg->mutable_device_option()->set_device_type(caffe2::CUDA);
    ccg->mutable_device_option()->set_cuda_gpu_id(0);
    ccg->add_input();
    ccg->set_input(0, "cpu_data");
    ccg->add_output();
    ccg->set_output(0, "data");
    // CopyCPUToGPU operator for im_info
    auto* ccginfo = predict_net.add_op();
    ccginfo->set_name("copy_im_info");
    ccginfo->set_type("CopyCPUToGPU");
    ccginfo->mutable_device_option()->set_device_type(caffe2::CUDA);
    ccginfo->mutable_device_option()->set_cuda_gpu_id(0);
    ccginfo->add_input();
    ccginfo->set_input(0, "cpu_im_info");
    ccginfo->add_output();
    ccginfo->set_output(0, "im_info");
    predict_net.add_external_input("cpu_data");
    predict_net.add_external_input("cpu_im_info");
    int n = predict_net.op_size(); // 132
    for (int i = 0; i < n; i++)
    {
        auto* p = predict_net.mutable_op(i);
        auto type = p->type();
        std::cout << i << " -> " << type << std::endl;
        if (type == std::string("GenerateProposals") ||
            type == std::string("BBoxTransform") ||
            type == std::string("BoxWithNMSLimit"))
        {
            // keep the detection-specific ops on CPU
            p->mutable_device_option()->set_device_type(0);
        }
        else
        {
            // everything else goes to GPU 0
            p->mutable_device_option()->set_device_type(1);
            p->mutable_device_option()->set_cuda_gpu_id(0);
        }
    }
    predict_net.mutable_device_option()->set_device_type(1);
    predict_net.mutable_device_option()->set_cuda_gpu_id(0);

    std::string netname = predict_net.name();
    caffe2::Workspace w;
    w.RunNetOnce(init_net);
    w.CreateBlob("cpu_data");
    w.CreateBlob("cpu_im_info");

    cv::Mat img = cv::imread("/home/beichen2012/dataset/2018_05_10_13_16_06_6_0.jpg", 1);
    if (!img.data)
    {
        std::cout << "error to load image!" << std::endl;
        return;
    }
    // im_info: height, width, scale
    cv::Mat mat;
    cv::resize(img, mat, cv::Size(384, 256));
    std::vector<float> vimInfo = {float(mat.rows), float(mat.cols), 0.25f};
    // input data on CPU
    caffe2::TensorCPU inputData;
    warpInput(inputData, mat);
    caffe2::TensorCPU inputImInfo;
    inputImInfo.Resize(std::vector<int>{1, 3});
    inputImInfo.ShareExternalPointer((float*)vimInfo.data());
    // feed the CPU blobs
    w.GetBlob("cpu_im_info")->GetMutable<caffe2::TensorCPU>()->ResizeLike(inputImInfo);
    w.GetBlob("cpu_im_info")->GetMutable<caffe2::TensorCPU>()->ShareData(inputImInfo);
    w.GetBlob("cpu_data")->GetMutable<caffe2::TensorCPU>()->ResizeLike(inputData);
    w.GetBlob("cpu_data")->GetMutable<caffe2::TensorCPU>()->ShareData(inputData);
    w.CreateNet(predict_net);

    auto begin = std::chrono::high_resolution_clock::now();
    w.RunNet(netname);
    auto end = std::chrono::high_resolution_clock::now();
    LOG(INFO) << "time cost: " << std::chrono::duration_cast<std::chrono::duration<double>>(end - begin).count();
    // read back the (CPU) outputs of the NMS stage
    caffe2::TensorCPU* score = w.GetBlob("score_nms")->GetMutable<caffe2::TensorCPU>();
    caffe2::TensorCPU* bbox = w.GetBlob("bbox_nms")->GetMutable<caffe2::TensorCPU>();
    caffe2::TensorCPU* cls = w.GetBlob("class_nms")->GetMutable<caffe2::TensorCPU>();
    LOG(INFO) << "find " << score->size() << " objs!";
    int objs = score->size();
    for (int i = 0; i < objs; i++)
    {
        // score
        float val_score = score->data<float>()[i];
        // class
        float val_cls = cls->data<float>()[i];
        // bbox
        cv::Point val_bbox_pt1, val_bbox_pt2;
        cv::Rect val_bbox;
        val_bbox_pt1.x = bbox->data<float>()[i * 4 + 0];
        val_bbox_pt1.y = bbox->data<float>()[i * 4 + 1];
        val_bbox_pt2.x = bbox->data<float>()[i * 4 + 2];
        val_bbox_pt2.y = bbox->data<float>()[i * 4 + 3];
        val_bbox = cv::Rect{val_bbox_pt1, val_bbox_pt2};
        // draw the detection: green if score >= 0.5, red otherwise
        auto color = cv::Scalar{0, 255, 0};
        if (val_score < 0.5)
            color = cv::Scalar{0, 0, 255};
        cv::rectangle(img, val_bbox, color, 2);
    }
    cv::namedWindow("1", cv::WINDOW_NORMAL);
    cv::imshow("1", img);
    cv::waitKey(0);
    return;
}

int main(int argc, char** argv)
{
    caffe2::GlobalInit(&argc, &argv);
    testFasterRCNNRun();
    // This is to allow us to use memory leak checks.
    caffe2::ShutdownProtobufLibrary();
    return 0;
}
```
Hi, I tried to run my net on a CUDA device. In C++, I set the CUDA device option in the NetDef after loading the .pb files and got the following error.
In Python, I used RunAllOnGpu after constructing my net with ModelHelper, and got the following error.
Any ideas about the problem? Thanks!