facebookarchive / caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework.
https://caffe2.ai
Apache License 2.0

Cannot create operator of type 'Conv' on the device 'CUDA' #701

Open honamida opened 7 years ago

honamida commented 7 years ago

Hi, I am trying to run my net on a CUDA device. In C++, I set the CUDA options on the NetDef after loading the .pb file:

 predict_net.mutable_device_option()->set_device_type(caffe2::CUDA);
 predict_net.mutable_device_option()->set_cuda_gpu_id(0);

 caffe2::Predictor p(init_net, predict_net, &workspace);

and got the following error:

what(): [enforce fail at operator.cc:110] op. Cannot create operator of type 'Conv' on the device 'CUDA'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing.

In Python, I call RunAllOnGPU after constructing my net with ModelHelper, and got the following error:

RuntimeError: [enforce fail at blob.h:76] IsType(). wrong type for the Blob instance. Blob contains caffe2::Tensor<caffe2::CPUContext> while caller expects caffe2::Tensor<caffe2::CUDAContext> . Offending Blob name: data.

Any idea what the problem is? Thanks!

peterneher commented 7 years ago

Did you train on CPU or GPU?

honamida commented 7 years ago

I tried to run both the training and the testing code on GPU, but it failed.

peterneher commented 7 years ago

Ah, sorry. I missed that you are using the C++ caffe2::Predictor. caffe2::Predictor does not work with a CUDA tensor as input since it uses a CPU tensor internally. I tried a workaround by implementing a CUDA version of the predictor but ran into other issues (see #694). CUDA prediction in C++ seems to be surprisingly difficult :)

Yangqing commented 7 years ago

cc/ @salexspb - maybe this is something we can abstract out in the predictor interface?

honamida commented 7 years ago

Hi, this makes me wonder whether the tutorial that runs a net through the predictor is based on CUDA or not. By the way, how do I solve the problem with the Python API? Do I need to add an operator like CopyCPUToGPU or something? The message says that it got the CPUContext.

Another question: does this mean we can solve the problem by switching to the workspace interface and providing the data through FeedBlob? If so, is there anything I need to pay attention to?

Thanks!

peterneher commented 7 years ago

In Python, training and prediction on CPU worked for me (https://github.com/peterneher/peters-stuff/blob/master/Caffe2Scripts/classification_no_db_example.py). Unfortunately training on CPU and then testing on CPU did not.

salexspb commented 7 years ago

Hey guys. As of right now I actually recommend using the raw nets API from C++. The predictor does not do much on top of it anyway. We will think about a better story here. Still, the nets API should work well too, I think. In the net, you then need to specify in each operator's proto which device it runs on.

Let me know if it works for you! We will keep you posted on newer APIs.

peterneher commented 7 years ago

Hi @salexspb, do you have sample code for GPU C++ prediction using the nets API?

salexspb commented 7 years ago

You could basically use workspace->RunNet. Whether an operator runs on GPU or not is set by its device_option. When you construct your net in Python, this is easy to control with DeviceScope. You could also change it manually afterwards in either Python or C++ (it is just a protobuf). Another way is to have a function that creates your net: call it within one device scope and train a model, then call it again within another device scope to get a model that has the same weights but executes on a different device. You will also have to make sure the weights are on the corresponding device.
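The two approaches described above can be sketched roughly as follows. To keep the snippet self-contained it uses plain structs as hypothetical stand-ins for the real protobuf messages (caffe2::DeviceOption, caffe2::OperatorDef, caffe2::NetDef); the names `BuildNet` and `PlaceOnGpu` and the operator list are illustrative assumptions, not Caffe2 API.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Caffe2 protobuf messages. The real types are
// caffe2::DeviceOption, caffe2::OperatorDef and caffe2::NetDef.
enum DeviceType { CPU = 0, CUDA = 1 };

struct DeviceOption {
  DeviceType device_type;
  int gpu_id;
};

struct OperatorDef {
  std::string type;
  DeviceOption device_option;
};

struct NetDef {
  std::vector<OperatorDef> op;
};

// Approach 1: a single builder function, called once per target device.
// Every operator it creates is tagged with the device passed in.
NetDef BuildNet(const DeviceOption& device) {
  NetDef net;
  const char* types[] = {"Conv", "Relu", "FC"};  // illustrative operator list
  for (const char* t : types) {
    OperatorDef op;
    op.type = t;
    op.device_option = device;
    net.op.push_back(op);
  }
  return net;
}

// Approach 2: patch an existing net in place. Operator types listed in
// cpu_only stay on the CPU; everything else is moved to the given GPU.
void PlaceOnGpu(NetDef& net, int gpu_id, const std::set<std::string>& cpu_only) {
  for (OperatorDef& op : net.op) {
    if (cpu_only.count(op.type) > 0) {
      op.device_option = DeviceOption{CPU, 0};
    } else {
      op.device_option = DeviceOption{CUDA, gpu_id};
    }
  }
}
```

With the real messages, the same loop would go through `mutable_op(i)->mutable_device_option()`, as in the snippet in the original post. Neither approach moves the weights; they still have to be placed on the matching device separately.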

beichen2012 commented 6 years ago

It is now 2018/7/19. Is there any solution?

beichen2012 commented 6 years ago

@honamida, hi, sorry to bother you, but is there any solution to this question? I have tried Workspace::RunNet and added a "CopyCPUToGPU" operator to copy the input cv::Mat to a TensorCUDA, but the same error occurs: cannot create operator of type conv on the device cuda.

beichen2012 commented 6 years ago

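@Yangqing, sorry to bother you. This issue has been open for a long time. Is there a solution by now? If forward prediction cannot be done comfortably (reasonably, conveniently) in C++, then Caffe2's claim of being easier to productionize than other frameworks does not hold up; after all, any model of even moderate size needs a GPU. So I sincerely ask that this problem be solved. Thank you.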

ezineo commented 6 years ago

@beichen2012 You can use the lower-level interface for GPU prediction. The caffe2::Predictor interface is just an example for CPU prediction.

beichen2012 commented 6 years ago

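@ezineo, thanks for your reply. I have tried Workspace::RunNet. Please review the code; thank you again:

#include <chrono>
#include <iostream>
#include <string>
#include <vector>

#include <opencv2/opencv.hpp>

#include <caffe2/core/context_gpu.h>
#include <caffe2/core/init.h>
#include <caffe2/core/workspace.h>
#include <caffe2/utils/proto_utils.h>

void warpInput(caffe2::TensorCPU& input, cv::Mat& src)
{
// convert to CV_32F
cv::Mat img;
src.convertTo(img, CV_32F);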

// split to bgr
std::vector<cv::Mat> bgr;
cv::split(img, bgr);

// wrap input: copy each channel plane into the NCHW tensor buffer
int N = 1;
int C = img.channels();
int H = img.rows;
int W = img.cols;
input.Resize(std::vector<int>{N, C, H, W});
float* data = input.mutable_data<float>();
float* p = data;
for(int i = 0; i < C ; i++)
{
    cv::Mat channel(H, W, CV_32FC1, p);
    bgr[i].copyTo(channel);
    p += H * W;
}

}

void testFasterRCNNRun()
{
std::string model_releative_dir = "e2e_faster_rcnn_R-50-C4_1x/";
std::string init_path = MODEL_DIR + model_releative_dir + "init_net.pb";
std::string predict_path = MODEL_DIR + model_releative_dir + "predict_net.pb";
caffe2::NetDef init_net, predict_net;
CAFFE_ENFORCE(ReadProtoFromFile(init_path, &init_net));
CAFFE_ENFORCE(ReadProtoFromFile(predict_path, &predict_net));

//init_net.mutable_device_option()->set_device_type(1);
//init_net.mutable_device_option()->set_cuda_gpu_id(0);

int m = init_net.op_size();
for(int i = 0; i < m; i++)
{
    auto* p = init_net.mutable_op(i);
    auto type = p->type();
    std::cout << i << " -> " << type << std::endl;
}

//CopyCPUToGPU operator
auto* ccg = predict_net.add_op();
ccg->set_name("copy_img");
ccg->set_type("CopyCPUToGPU");
ccg->mutable_device_option()->set_device_type(caffe2::CUDA);
ccg->mutable_device_option()->set_cuda_gpu_id(0);
ccg->add_input();
ccg->set_input(0, "cpu_data");
ccg->add_output();
ccg->set_output(0, "data");

auto* ccginfo = predict_net.add_op();
ccginfo->set_name("copy_im_info");
ccginfo->set_type("CopyCPUToGPU");
ccginfo->mutable_device_option()->set_device_type(caffe2::CUDA);
ccginfo->mutable_device_option()->set_cuda_gpu_id(0);
ccginfo->add_input();
ccginfo->set_input(0, "cpu_im_info");
ccginfo->add_output();
ccginfo->set_output(0, "im_info");

predict_net.add_external_input("cpu_data");
predict_net.add_external_input("cpu_im_info");

int n = predict_net.op_size(); //132

for(int i = 0; i < n; i++)
{
    auto* p = predict_net.mutable_op(i);
    auto type = p->type();
    std::cout << i << " -> " << type << std::endl;
    if(type == std::string("GenerateProposals") ||
            type == std::string("BBoxTransform") ||
            type == std::string("BoxWithNMSLimit"))
    {
        // these operators stay on the CPU
        p->mutable_device_option()->set_device_type(caffe2::CPU);
    } else{
        // everything else runs on GPU 0
        p->mutable_device_option()->set_device_type(caffe2::CUDA);
        p->mutable_device_option()->set_cuda_gpu_id(0);
    }
}

predict_net.mutable_device_option()->set_device_type(1);
predict_net.mutable_device_option()->set_cuda_gpu_id(0);

//
std::string netname = predict_net.name();
caffe2::Workspace w;

w.RunNetOnce(init_net);

w.CreateBlob("cpu_data");
w.CreateBlob("cpu_im_info");

//
cv::Mat img = cv::imread("/home/beichen2012/dataset/2018_05_10_13_16_06_6_0.jpg", 1);
if(!img.data)
{
    std::cout <<"error to load image!" << std::endl;
    return;
}
//im_info
cv::Mat mat;
cv::resize(img, mat, cv::Size(384, 256));
std::vector<float> vimInfo = {float(mat.rows), float(mat.cols), 0.25f};

//input data cpu
caffe2::TensorCPU inputData;
warpInput(inputData, mat);

caffe2::TensorCPU inputImInfo;
inputImInfo.Resize(std::vector<int>{1,3});
inputImInfo.ShareExternalPointer((float*)vimInfo.data());

//
w.GetBlob("cpu_im_info")->GetMutable<caffe2::TensorCPU>()->ResizeLike(inputImInfo);
w.GetBlob("cpu_im_info")->GetMutable<caffe2::TensorCPU>()->ShareData(inputImInfo);

w.GetBlob("cpu_data")->GetMutable<caffe2::TensorCPU>()->ResizeLike(inputData);
w.GetBlob("cpu_data")->GetMutable<caffe2::TensorCPU>()->ShareData(inputData);

w.CreateNet(predict_net);

//std::vector<std::string>

auto begin = std::chrono::high_resolution_clock::now();
w.RunNet(netname);
auto end = std::chrono::high_resolution_clock::now();
LOG(INFO) << "time cost: " << std::chrono::duration_cast<std::chrono::duration<double>>(end - begin).count();
//
caffe2::TensorCPU* score = w.GetBlob("score_nms")->GetMutable<caffe2::TensorCPU>();
caffe2::TensorCPU* bbox = w.GetBlob("bbox_nms")->GetMutable<caffe2::TensorCPU>();
caffe2::TensorCPU* cls = w.GetBlob("class_nms")->GetMutable<caffe2::TensorCPU>();

LOG(INFO) << "find " << score->size() << " objs!";
int objs = score->size();
for(int i = 0; i < objs; i++)
{
    //score
    float val_score = score->data<float>()[i];
    //cls
    float val_cls = cls->data<float>()[i];

    //bbox
    cv::Point val_bbox_pt1, val_bbox_pt2;
    cv::Rect val_bbox;
    val_bbox_pt1.x = bbox->data<float>()[i * 4 + 0];
    val_bbox_pt1.y = bbox->data<float>()[i * 4 + 1];
    val_bbox_pt2.x = bbox->data<float>()[i * 4 + 2];
    val_bbox_pt2.y = bbox->data<float>()[i * 4 + 3];
    val_bbox = cv::Rect{val_bbox_pt1, val_bbox_pt2};

    //draw image
    auto color = cv::Scalar{0,255,0};
    if(val_score < 0.5)
        color = cv::Scalar{0,0,255};
    cv::rectangle(img, val_bbox, color, 2);
}

cv::namedWindow("1", cv::WINDOW_NORMAL);
cv::imshow("1", img);
cv::waitKey(0);

return;

}

int main(int argc, char** argv)
{
caffe2::GlobalInit(&argc, &argv);
// caffe2::run();

testFasterRCNNRun();

// This is to allow us to use memory leak checks.
caffe2::ShutdownProtobufLibrary();
return 0;
}
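One thing that may be worth checking in the listing above, separately from the "Cannot create operator" error (which, per the message itself, can also mean the binary is not linked with the CUDA operator implementations): `add_op()` on a protobuf repeated field appends, so the two CopyCPUToGPU operators land after the operators that consume `data` and `im_info`. A minimal sketch of placing them first, using plain structs as hypothetical stand-ins for caffe2::OperatorDef and caffe2::NetDef (`MakeCopyOp` and `PrependOps` are illustrative names, not Caffe2 API):

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for caffe2::OperatorDef and caffe2::NetDef.
struct OperatorDef {
  std::string type;
  std::vector<std::string> input;
  std::vector<std::string> output;
};

struct NetDef {
  std::vector<OperatorDef> op;
};

// Describes a CopyCPUToGPU operator reading cpu_blob and writing gpu_blob.
OperatorDef MakeCopyOp(const std::string& cpu_blob, const std::string& gpu_blob) {
  OperatorDef op;
  op.type = "CopyCPUToGPU";
  op.input.push_back(cpu_blob);
  op.output.push_back(gpu_blob);
  return op;
}

// Returns a new net whose operator list is `head` followed by the original
// operators, so the copies run before anything that reads their outputs.
NetDef PrependOps(const NetDef& net, const std::vector<OperatorDef>& head) {
  NetDef result;
  result.op = head;
  result.op.insert(result.op.end(), net.op.begin(), net.op.end());
  return result;
}
```

With the real protobuf messages, a comparable result could be obtained by starting from an empty NetDef, adding the copy operators first, and then appending each original operator with `add_op()->CopyFrom(...)`.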