JimmyLauren / yolo5_caffe_hisi3559

yolov5: pytorch->onnx->caffe->hisi3559

Running caffe_yolov5 fails with "Cannot use GPU in CPU-only Caffe: check mode." #2

Closed dengxiongshi closed 1 year ago

dengxiongshi commented 1 year ago

Question:

F1124 10:49:59.498611 10201 conv_layer.cpp:76] Cannot use GPU in CPU-only Caffe: check mode.
Check failure stack trace:
    @ 0x7efeb7ed9b03 google::LogMessage::Fail()
    @ 0x7efeb7ee19d1 google::LogMessage::SendToLog()
    @ 0x7efeb7ed97c2 google::LogMessage::Flush()
    @ 0x7efeb7edb78f google::LogMessageFatal::~LogMessageFatal()
    @ 0x7efeb8245a4d caffe::ConvolutionLayer<>::Forward_gpu()
    @ 0x7efeb8204e78 caffe::Layer<>::Forward()
    @ 0x7efeb82eb381 caffe::Net<>::ForwardFromTo()
    @ 0x7efeb82eaffd caffe::Net<>::Forward()
    @ 0x55aa7b7dadea main
    @ 0x7efeb4b58d90 (unknown)
    @ 0x7efeb4b58e40 __libc_start_main
    @ 0x55aa7b7d8f65 _start
    @ (nil) (unknown)
Aborted

Environment: Ubuntu 22. Caffe was built for CPU only: CPU_ONLY := 1 is set in Makefile.config, and the code calls Caffe::set_mode(Caffe::CPU);. Command used: ./caffe_yolov5s.bin-d best-sim.prototxt best-sim.caffemodel val2017/

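I have already switched from GPU to CPU, so why does execution still reach Forward_gpu() and fail with the error above? Or is a GPU required to run this at all? I'm a beginner.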

JimmyLauren commented 1 year ago

If you build the CPU-only version, you also need to remove the cuDNN dependency.
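
The stack trace in the question shows caffe::Layer<>::Forward() calling ConvolutionLayer<>::Forward_gpu(). In Caffe, Forward() picks the CPU or GPU implementation based on the mode that is active when it is called, and a CPU-only build keeps the GPU entry point only as a stub that aborts with this message. So reaching the stub suggests the running binary was still in GPU mode at that point (for example, a stale binary built before the set_mode change). The following is a simplified, self-contained model of that dispatch, not Caffe's actual source; `Mode`, `ToyLayer` and `forwardFails` are hypothetical names used only for illustration.

```cpp
#include <stdexcept>
#include <string>

// Simplified stand-in for Caffe's run mode (hypothetical, not caffe::Caffe).
enum class Mode { CPU, GPU };

// Toy layer that mimics how a CPU-only build stubs out the GPU path.
struct ToyLayer {
    // Stand-in for the real CPU implementation.
    std::string forwardCpu() const { return "cpu"; }

    // In a CPU-only build the GPU entry point still exists, but it only fails.
    std::string forwardGpu() const {
        throw std::runtime_error("Cannot use GPU in CPU-only Caffe: check mode.");
    }

    // Dispatch on the mode that is active at the time of the call.
    std::string forward(Mode mode) const {
        switch (mode) {
            case Mode::CPU: return forwardCpu();
            case Mode::GPU: return forwardGpu();
        }
        throw std::logic_error("unknown mode");
    }
};

// Returns true if a forward pass fails under the given mode.
bool forwardFails(Mode mode) {
    try {
        ToyLayer{}.forward(mode);
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}
```

In this model, `forwardFails(Mode::GPU)` is true and `forwardFails(Mode::CPU)` is false, which mirrors why the error only appears when the mode is GPU at forward time.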

dengxiongshi commented 1 year ago

Hello, I have commented the option out in Makefile.config (# USE_CUDNN := 1), and the build now succeeds and the program runs. However, running caffe_yolov5s detects no objects:

/home/dxs/yolov5s_caffe_sort_cpp-master/cmake-build-debug/demo
preprocess_img finished!
[ 0 ] 990 ms.
output0 shape: 1 3 80 80 85
output1 shape: 1 3 40 40 85
output2 shape: 1 3 20 20 85
0
000000000049.jpg

My Caffe model was trained on COCO, and I modified the code starting from caffe_yolov5s.cpp (see below). Is something going wrong in how Caffe loads the weight file?

#include "caffe/caffe.hpp"
#include <string>
#include <vector>
#include <sys/io.h>
#include <unistd.h>
#include <stdio.h>

#include <iostream>
#include <sys/types.h>
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>

#define USE_OPENCV

#ifdef USE_OPENCV

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

#endif // USE_OPENCV

// used for timing
#include <boost/date_time/posix_time/posix_time.hpp>

#define INPUT_W 640
#define INPUT_H 640
// #define IsPadding 0
#define NUM_CLASS 80
#define NMS_THRESH 0.45
#define CONF_THRESH 0.11
//std::string prototxt_path = "../model/yolov5s-4.0-focus.prototxt";
//std::string caffemodel_path = "../model/yolov5s-4.0-focus.caffemodel";
//std::string pic_path = "/home/willer/calibration_data/2ad80d25-b022-3b9d-a46f-853f112c2dfe.jpg";

using namespace cv;
using namespace std;
using namespace caffe;
using std::string;

using caffe::Blob;
using caffe::Caffe;
using caffe::Layer;
using caffe::Net;
using caffe::shared_ptr;
using caffe::string;
using caffe::vector;
using std::cout;
using std::endl;
using std::ostringstream;

/* class name */
const char *className[NUM_CLASS] = {"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck",
                                    "boat", "traffic light",
                                    "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog",
                                    "horse", "sheep", "cow",
                                    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie",
                                    "suitcase", "frisbee",
                                    "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove",
                                    "skateboard", "surfboard",
                                    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl",
                                    "banana", "apple",
                                    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
                                    "chair", "couch",
                                    "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote",
                                    "keyboard", "cell phone",
                                    "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase",
                                    "scissors", "teddy bear",
                                    "hair drier", "toothbrush"};

struct Bbox {
    float xmin;
    float ymin;
    float xmax;
    float ymax;
    float score;
    int cid;
};

struct Anchor {
    float width;
    float height;
};

bool get_all_files(const std::string &dir_in, std::vector<std::string> &files) {
    if (dir_in.empty()) {
        return false;
    }
    struct stat s;
    stat(dir_in.c_str(), &s);
    if (!S_ISDIR(s.st_mode)) {
        return false;
    }
    DIR *open_dir = opendir(dir_in.c_str());
    if (NULL == open_dir) {
        std::exit(EXIT_FAILURE);
    }
    dirent *p = nullptr;
    while ((p = readdir(open_dir)) != nullptr) {
        struct stat st;
        if (p->d_name[0] != '.') {
            // Originally written with Dev-C++ to list files on Windows, where "\" was used; on Linux use "/" instead
            std::string name = dir_in + std::string("/") + std::string(p->d_name);
            stat(name.c_str(), &st);
            if (S_ISDIR(st.st_mode)) {
                get_all_files(name, files);
            } else if (S_ISREG(st.st_mode)) {
                files.push_back(name);
            }
        }
    }
    closedir(open_dir);
    return true;
}

std::vector<Anchor> initAnchors() {
    std::vector<Anchor> anchors;
    Anchor anchor;
    // 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90,  156,198,  373,326
    anchor.width = 10;
    anchor.height = 13;
    anchors.emplace_back(anchor);
    anchor.width = 16;
    anchor.height = 30;
    anchors.emplace_back(anchor);
    anchor.width = 33;
    anchor.height = 23;
    anchors.emplace_back(anchor);
    anchor.width = 30;
    anchor.height = 61;
    anchors.emplace_back(anchor);
    anchor.width = 62;
    anchor.height = 45;
    anchors.emplace_back(anchor);
    anchor.width = 59;
    anchor.height = 119;
    anchors.emplace_back(anchor);
    anchor.width = 116;
    anchor.height = 90;
    anchors.emplace_back(anchor);
    anchor.width = 156;
    anchor.height = 198;
    anchors.emplace_back(anchor);
    anchor.width = 373;
    anchor.height = 326;
    anchors.emplace_back(anchor);
    return anchors;
}

template<typename T>
T clip(const T &n, const T &lower, const T &upper) {
    return std::max(lower, std::min(n, upper));
}

template<class ForwardIterator>
inline size_t argmax(ForwardIterator first, ForwardIterator last) {
    return std::distance(first, std::max_element(first, last));
}

void yoloTransform(const int &ih, const int &iw, const int &oh, const int &ow, std::vector<Bbox> &bboxes
        /*bool is_padding*/) {
    for (auto &bbox : bboxes) {
        bbox.xmin = bbox.xmin * iw / ow;
        bbox.ymin = bbox.ymin * ih / oh;
        bbox.xmax = bbox.xmax * iw / ow;
        bbox.ymax = bbox.ymax * ih / oh;
    }
}

cv::Mat renderBoundingBox(cv::Mat image, const std::vector<Bbox> &bboxes) {
    cv::Mat src = image.clone();
    for (auto it: bboxes) {
        float score = it.score;
        int id = it.cid;
        cv::String cls = cv::String(className[id]);
        char tmp[8];
        // int ret = snprintf(tmp, sizeof(tmp) / sizeof(tmp[0]), "%.2f", objects[i].confidence);
        snprintf(tmp, sizeof(tmp) / sizeof(tmp[0]), "%.2f", score);
        cv::String conf = tmp;
        cv::String txt = cls + cv::String(" ") + tmp;
        cv::rectangle(src, cv::Rect(it.xmin, it.ymin, it.xmax - it.xmin, it.ymax - it.ymin), cv::Scalar(0, 255, 0));
        // cv::rectangle(image, cv::Point(it.xmin, it.ymin), cv::Point(it.xmax, it.ymax), cv::Scalar(0, 255,0), 3);
        // cv::putText(image, std::to_string(id)+"_"+std::to_string(score), cv::Point(it.xmin, it.ymin), cv::FONT_HERSHEY_COMPLEX, 2.0, cv::Scalar(0,0,255));
        cv::putText(src, txt, cv::Point(it.xmin, it.ymin), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0, 0, 255));
    }
    return src;
}

void nms_cpu(std::vector<Bbox> &bboxes, float threshold) {
    if (bboxes.empty()) {
        return;
    }
    // 1. Sort by score in descending order first
    std::sort(bboxes.begin(), bboxes.end(), [&](Bbox b1, Bbox b2) { return b1.score > b2.score; });
    // 2. Precompute the area of every bbox
    std::vector<float> area(bboxes.size());
    for (int i = 0; i < bboxes.size(); ++i) {
        area[i] = (bboxes[i].xmax - bboxes[i].xmin + 1) * (bboxes[i].ymax - bboxes[i].ymin + 1);
    }
    // 3. Suppress lower-scored boxes that overlap a kept box
    for (int i = 0; i < bboxes.size(); ++i) {
        for (int j = i + 1; j < bboxes.size();) {
            float left = std::max(bboxes[i].xmin, bboxes[j].xmin);
            float right = std::min(bboxes[i].xmax, bboxes[j].xmax);
            float top = std::max(bboxes[i].ymin, bboxes[j].ymin);
            float bottom = std::min(bboxes[i].ymax, bboxes[j].ymax);
            float width = std::max(right - left + 1, 0.f);
            float height = std::max(bottom - top + 1, 0.f);
            float u_area = height * width;
            float iou = (u_area) / (area[i] + area[j] - u_area);
            if (iou >= threshold) {
                bboxes.erase(bboxes.begin() + j);
                area.erase(area.begin() + j);
            } else {
                ++j;
            }
        }
    }
}

template<typename T>
T sigmoid(const T &n) {
    return 1 / (1 + exp(-n));
}

void postProcessParall(const int height, const int width, int scale_idx, float postThres, float *origin_output,
                       vector<int> Strides, vector<Anchor> Anchors, vector<Bbox> *bboxes) {
    Bbox bbox;
    float cx, cy, w_b, h_b, score;
    int cid;
    const float *ptr = (float *) origin_output;
    for (unsigned long a = 0; a < 3; ++a) {
        for (unsigned long h = 0; h < height; ++h) {
            for (unsigned long w = 0; w < width; ++w) {
                const float *cls_ptr = ptr + 5;
                cid = argmax(cls_ptr, cls_ptr + NUM_CLASS);
                score = sigmoid(ptr[4]) * sigmoid(cls_ptr[cid]);

                if (score >= postThres) {
                    cx = (sigmoid(ptr[0]) * 2.f - 0.5f + static_cast<float>(w)) *
                         static_cast<float>(Strides[scale_idx]);
                    cy = (sigmoid(ptr[1]) * 2.f - 0.5f + static_cast<float>(h)) *
                         static_cast<float>(Strides[scale_idx]);
                    w_b = powf(sigmoid(ptr[2]) * 2.f, 2) * Anchors[scale_idx * 3 + a].width;
                    h_b = powf(sigmoid(ptr[3]) * 2.f, 2) * Anchors[scale_idx * 3 + a].height;
                    bbox.xmin = clip(cx - w_b / 2, 0.F, static_cast<float>(INPUT_W - 1));
                    bbox.ymin = clip(cy - h_b / 2, 0.f, static_cast<float>(INPUT_H - 1));
                    bbox.xmax = clip(cx + w_b / 2, 0.f, static_cast<float>(INPUT_W - 1));
                    bbox.ymax = clip(cy + h_b / 2, 0.f, static_cast<float>(INPUT_H - 1));
                    bbox.score = score;
                    bbox.cid = cid;  // 0-based index into className (cid + 1 would read past the 80-entry array)
                    /*
                    std::cout<< "bbox.cid : " << bbox.cid << std::endl;
                                        std::cout<< "bbox.score : " << bbox.score << std::endl;
                                        std::cout<< "bbox.xmin : " << bbox.xmin << std::endl;
                                        std::cout<< "bbox.ymin : " << bbox.ymin << std::endl;
                                        std::cout<< "bbox.xmax : " << bbox.xmax << std::endl;
                                        std::cout<< "bbox.ymax : " << bbox.ymax << std::endl;
                                        */
                    bboxes->push_back(bbox);
                }
                ptr += 5 + NUM_CLASS;
            }
        }
    }
}

vector<Bbox> postProcess(vector<float *> origin_output, float postThres, float nmsThres) {

    vector<Anchor> Anchors = initAnchors();
    vector<Bbox> bboxes;
    vector<int> Strides = vector<int>{8, 16, 32};
    for (int scale_idx = 0; scale_idx < 3; ++scale_idx) {
        const int stride = Strides[scale_idx];
        const int width = (INPUT_W + stride - 1) / stride;
        const int height = (INPUT_H + stride - 1) / stride;
        //std::cout << "width : " << width << " " << "height : " << height << std::endl;
        float *cur_output_tensor = origin_output[scale_idx];
        postProcessParall(height, width, scale_idx, postThres, cur_output_tensor, Strides, Anchors, &bboxes);
    }
    nms_cpu(bboxes, nmsThres);
    return bboxes;
}

cv::Mat preprocess_img(cv::Mat &img /*, bool is_padding*/) {
    cv::Mat out;
    // cv::Size is (width, height); the interpolation flag is the 6th argument of cv::resize
    cv::resize(img, out, cv::Size(INPUT_W, INPUT_H), 0, 0, cv::INTER_LINEAR);
    return out;
}

int main(int argc, char *argv[]) {
//  if (argc != 4)
//  {
//      std::cout<<"usage: exe prototxt caffemodel jpg_list_file"<<std::endl;
//      return -1;
//  }
    std::string prototxt_path = "../model/best-sim.prototxt";
    std::string caffemodel_path = "../model/best-sim.caffemodel";
    std::string image_path = "../data/000000000049.jpg";

    /* read/save image path */
//    cv::String imagePath = "../data/000000000165.jpg";
    cv::String savePath = "../data/test.jpg";

    ::google::InitGoogleLogging("caffe"); // initialize logging; skipping this call produces a warning but no error
    // Run on CPU: changed from Caffe::GPU to Caffe::CPU
    Caffe::set_mode(Caffe::CPU);
    Caffe::set_solver_rank(1); // suppress log output
    Net<float> caffe_net(prototxt_path, caffe::TEST, 0, nullptr);
    caffe_net.CopyTrainedLayersFrom(caffemodel_path);

    // Read the image
    cv::Mat img = cv::imread(image_path);
    //cv::Mat img = cv::imread("../model/21.jpg");
    CHECK(!img.empty()) << "Unable to decode image ";
    cv::Mat showImage = img.clone();

    // Preprocess the image and load it into the input blob
    Blob<float> *input_layer = caffe_net.input_blobs()[0];
    float *input_data = input_layer->mutable_cpu_data();

    static float data[3 * INPUT_H * INPUT_W];
    cv::Mat pre_img = preprocess_img(img /*,IsPadding*/);
    std::cout << "preprocess_img finished!\n";
    int i = 0;
    for (int row = 0; row < INPUT_H; ++row) {
        uchar *uc_pixel = pre_img.data + row * pre_img.step;
        for (int col = 0; col < INPUT_W; ++col) {  // OpenCV loads images as BGR: indices 0,1,2 give BGR order, 2,1,0 give RGB order
            data[i] = (float) uc_pixel[0] / 255.0;
            data[i + INPUT_H * INPUT_W] = (float) uc_pixel[1] / 255.0;
            data[i + 2 * INPUT_H * INPUT_W] = (float) uc_pixel[2] / 255.0;
            uc_pixel += 3;
            ++i;
        }
    }

#if 0
    printf("begin save txt.\n");
    FILE* fWrite = fopen("input.txt","w");
    for(int i = 0; i < 3 * INPUT_H * INPUT_W; i++)
    {
        fprintf(fWrite,"%.0f \n",data[i]*255.0);
    }
    fclose(fWrite);
    printf("end save txt.\n");
#endif

    memcpy((float *) (input_data),
           data, 3 * INPUT_H * INPUT_W * sizeof(float));

    //boost::posix_time::ptime start_time_ = boost::posix_time::microsec_clock::local_time(); // start timing
    float total_time = 0;
    // forward pass
    int nums = 1;
    for (int i = 0; i < nums; ++i) {
        boost::posix_time::ptime start_time_1 = boost::posix_time::microsec_clock::local_time();
        caffe_net.Forward();
        boost::posix_time::ptime end_time_1 = boost::posix_time::microsec_clock::local_time();
        total_time += (end_time_1 - start_time_1).total_milliseconds();
        std::cout << "[ " << i << " ] " << (end_time_1 - start_time_1).total_milliseconds() << " ms." << std::endl;
    }

    //boost::posix_time::ptime end_time_ = boost::posix_time::microsec_clock::local_time(); // stop timing

    Blob<float> *output_layer0 = caffe_net.output_blobs()[2];
    const float *output0 = output_layer0->cpu_data();
    cout << "output0 shape: " << output_layer0->shape(0) << " " << output_layer0->shape(1) << " "
         << output_layer0->shape(2) << " " << output_layer0->shape(3) << " " << output_layer0->shape(4) << endl;

    Blob<float> *output_layer1 = caffe_net.output_blobs()[0];
    const float *output1 = output_layer1->cpu_data();
    cout << "output1 shape: " << output_layer1->shape(0) << " " << output_layer1->shape(1) << " "
         << output_layer1->shape(2) << " " << output_layer1->shape(3) << " " << output_layer1->shape(4) << endl;

    Blob<float> *output_layer2 = caffe_net.output_blobs()[1];
    const float *output2 = output_layer2->cpu_data();
    cout << "output2 shape: " << output_layer2->shape(0) << " " << output_layer2->shape(1) << " "
         << output_layer2->shape(2) << " " << output_layer2->shape(3) << " " << output_layer2->shape(4) << endl;

    vector<float *> cur_output_tensors;
    cur_output_tensors.push_back(const_cast<float *>(output0));
    cur_output_tensors.push_back(const_cast<float *>(output1));
    cur_output_tensors.push_back(const_cast<float *>(output2));

    vector<Bbox> bboxes = postProcess(cur_output_tensors, CONF_THRESH, NMS_THRESH);
    printf("%zu\n", bboxes.size());  // size() returns size_t, so use %zu rather than %d

    yoloTransform(showImage.rows, showImage.cols, INPUT_H, INPUT_W, bboxes /*, IsPadding*/);  // argument order is (ih, iw, oh, ow)
    cv::Mat resultImage = renderBoundingBox(showImage, bboxes);

    string sFileName = image_path.substr(image_path.find_last_of("/") + 1);
    printf("%s\n", sFileName.c_str());
    sFileName = "img_result/" + sFileName;
    cv::imwrite(sFileName.c_str(), resultImage);
    cv::namedWindow("img", cv::WINDOW_NORMAL);
    cv::imshow("img", resultImage);
    cv::waitKey();

    std::cout << "average time : " << total_time / nums * 1.0 << " ms" << std::endl;

}
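
The packing loop in main() copies the channels in 0,1,2 order, which, per its own comment, keeps OpenCV's BGR layout. The reference YOLOv5 pipeline feeds RGB images to the network, so if this model was trained that way (an assumption; the thread does not confirm it), swapping to 2,1,0 may be worth trying when detections are missing. Below is a minimal sketch of the conversion on a raw buffer, without OpenCV; `bgrHwcToRgbChw` is a hypothetical helper name and the input is assumed to be tightly packed with no row padding.

```cpp
#include <cstddef>
#include <vector>

// Convert an interleaved 8-bit BGR image (HWC layout) into planar float RGB
// (CHW layout) scaled to [0, 1]. `src` must hold height * width * 3 bytes
// with no padding between rows.
std::vector<float> bgrHwcToRgbChw(const std::vector<unsigned char>& src,
                                  int height, int width) {
    const std::size_t plane = static_cast<std::size_t>(height) * width;
    std::vector<float> dst(3 * plane);
    for (std::size_t i = 0; i < plane; ++i) {
        const unsigned char* px = &src[3 * i];  // px[0]=B, px[1]=G, px[2]=R
        dst[i] = px[2] / 255.0f;                // R plane
        dst[i + plane] = px[1] / 255.0f;        // G plane
        dst[i + 2 * plane] = px[0] / 255.0f;    // B plane
    }
    return dst;
}
```

The returned vector has the same layout as the `data` array in the posted code, so it could be copied into the input blob in the same way.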

dengxiongshi commented 1 year ago

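Also, my Caffe is a third-party modified Caffe library that I compiled on Windows, and the Caffe model conversion was done on Windows as well. I just want to verify how well the converted Caffe model performs. I also looked at Caffe models that other people have converted and posted online. The beginning of my prototxt looks like this: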

layer {
  name: "images"
  type: "Input"
  top: "images"
  input_param {
    shape {
      dim: 1
      dim: 3
      dim: 640
      dim: 640
    }
  }
}
layer {
  name: "Conv_0"
  type: "Convolution"
  bottom: "images"
  top: "126"
  convolution_param {
    num_output: 32
    bias_term: true
    group: 1
    pad_h: 2
    pad_w: 2
    kernel_h: 6
    kernel_w: 6
    stride_h: 2
    stride_w: 2
    dilation: 1
  }
}
layer {
  name: "Relu_1"
  type: "ReLU"
  bottom: "126"
  top: "127"
}

Other people's look like this:

name: "yolov5s"
input: "blob1"
input_dim: 1
input_dim: 3
input_dim: 640
input_dim: 640
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "blob1"
  top: "conv_blob1"
  convolution_param {
    num_output: 32
    bias_term: false
    pad: 1
    kernel_size: 3
    group: 1
    stride: 2
    weight_filler {
      type: "xavier"
    }
    dilation: 1
  }
}
layer {
  name: "batch_norm1"
  type: "BatchNorm"
  bottom: "conv_blob1"
  top: "batch_norm_blob1"
  batch_norm_param {
    use_global_stats: true
    eps: 0.001
  }
}
layer {
  name: "bn_scale1"
  type: "Scale"
  bottom: "batch_norm_blob1"
  top: "batch_norm_blob1"
  scale_param {
    bias_term: true
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "batch_norm_blob1"
  top: "relu_blob1"
}

The two look very different.
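
One visible difference between the two headers is that the first has a Convolution with bias_term: true and no BatchNorm/Scale layers, while the second has a bias-free Convolution followed by BatchNorm and Scale. A plausible explanation (an assumption; nothing in this thread confirms it) is that the first model went through an export step that folds batch normalization into the preceding convolution, in which case the two forms compute the same function. The sketch below shows that folding for per-output-channel parameters; `ConvParams` and `foldBatchNorm` are hypothetical names, not part of Caffe.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Flattened convolution parameters: `weight` holds outChannels contiguous
// blocks of equal size, `bias` holds one value per output channel
// (or is empty when the layer has bias_term: false).
struct ConvParams {
    std::vector<float> weight;
    std::vector<float> bias;
};

// Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the
// convolution, so that a single Convolution with a bias gives the same output.
ConvParams foldBatchNorm(const ConvParams& conv,
                         const std::vector<float>& gamma,
                         const std::vector<float>& beta,
                         const std::vector<float>& mean,
                         const std::vector<float>& var,
                         float eps) {
    ConvParams fused;
    const std::size_t outChannels = gamma.size();
    if (outChannels == 0) {
        return fused;  // nothing to fold
    }
    const std::size_t perChannel = conv.weight.size() / outChannels;
    fused.weight.resize(conv.weight.size());
    fused.bias.resize(outChannels);
    for (std::size_t c = 0; c < outChannels; ++c) {
        const float scale = gamma[c] / std::sqrt(var[c] + eps);
        for (std::size_t k = 0; k < perChannel; ++k) {
            fused.weight[c * perChannel + k] = conv.weight[c * perChannel + k] * scale;
        }
        const float b = conv.bias.empty() ? 0.0f : conv.bias[c];
        fused.bias[c] = (b - mean[c]) * scale + beta[c];
    }
    return fused;
}
```

Under this assumption, a fused model is expected to lack BatchNorm and Scale layers, so that structural difference alone does not indicate a broken conversion.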

dengxiongshi commented 1 year ago

Hello, after some debugging it now runs, but some objects are missed. I haven't figured out where the problem is yet.