Real-time detection from screen

akinohana commented 6 years ago

After a few modifications to YoloV3, I can now process images at a very fast speed.

However, I need to process images from the screen, and capturing a 1920*1080 window with GDI BitBlt costs at least 50 ms.

Converting it into the Darknet image format costs another 40 ms.

I wonder if there is a method to capture the screen using the GPU and send it to Darknet directly?

Also, I think the image is in CPU memory, not GPU memory. Is it possible to capture the screen via the GPU and have Darknet process the image directly in GPU memory, to maximize processing speed?

FaBeyyy commented 6 years ago

If converting to the Darknet image format costs another 40 ms, then you are doing something really wrong, my dude... Use cv::Mat, not the Darknet image type.

Also, if this has something to do with using YOLO for game hacking, hit me up, I may have something for you...

akinohana commented 6 years ago

What do you have for game hacking?

For the conversion I am using the following code:


int PosB(int x, int y)
{
    return ScreenData[4 * ((y*ScreenX) + x)];
}

int PosG(int x, int y)
{
    return ScreenData[4 * ((y*ScreenX) + x) + 1];
}

int PosR(int x, int y)
{
    return ScreenData[4 * ((y*ScreenX) + x) + 2];
}

    for (int j = 0; j < ScreenY; j++)
    {
        for (int i = 0; i < ScreenX; i++)
        {
            rgbB = count + ScreenX * ScreenY * 0;
            rgbG = count + ScreenX * ScreenY * 1;
            rgbR = count + ScreenX * ScreenY * 2;
            out.data[rgbB] = PosB(i, j) / 255.;
            out.data[rgbG] = PosG(i, j) / 255.;
            out.data[rgbR] = PosR(i, j) / 255.;
            count++;
            //count  += step;
        }
    }
    rgbgr_image(out);

I think converting to cv::Mat is the same thing; it still needs to go through every pixel.
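Editor's sketch: the per-pixel work in the loop above can be done in a single pass without the three accessor calls and without the separate rgbgr_image() swap. The helper below is hypothetical (it is not part of Darknet) and assumes a tightly packed 8-bit BGRA buffer with no row padding. It writes the R, G, B planes directly, which is the plane order the loop above ends up with after rgbgr_image().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not a Darknet function.
// Converts interleaved 8-bit BGRA (4 bytes per pixel, rows tightly packed)
// into planar floats in R, G, B plane order, scaled to [0, 1].
std::vector<float> bgra_to_planar_rgb(const std::uint8_t *bgra, int width, int height)
{
    assert(bgra != nullptr && width > 0 && height > 0);

    const std::size_t plane = static_cast<std::size_t>(width) * height;
    std::vector<float> out(plane * 3);

    float *r = out.data();   // plane 0
    float *g = r + plane;    // plane 1
    float *b = g + plane;    // plane 2
    const float scale = 1.0f / 255.0f;

    // One linear walk over the source; alpha (byte 3) is ignored.
    for (std::size_t i = 0; i < plane; ++i, bgra += 4) {
        b[i] = bgra[0] * scale;
        g[i] = bgra[1] * scale;
        r[i] = bgra[2] * scale;
    }
    return out;
}
```

This still visits every pixel of the full 1920*1080 frame, so it only removes overhead; it does not reduce the amount of data touched.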

AlexeyAB commented 6 years ago

What network resolution do you use?

This function costs much less than 40 ms if you do it after resizing: https://github.com/AlexeyAB/darknet/blob/be9d971ddb9ea0520da78cfff7788eb5481f095e/src/image.c#L809

But this function is expensive: https://github.com/AlexeyAB/darknet/blob/be9d971ddb9ea0520da78cfff7788eb5481f095e/src/image.c#L1475 You can optimize it using OpenCV in C++. cv::Mat has the same format as the one you described in your code:

image resize_cv(char *your_img_src, int network_w, int network_h)
{
    // Mat from raw pointer to 4-channels image (wraps the buffer, no copy)
    cv::Mat src(cv::Size(1920, 1080), CV_8UC4, your_img_src);

    // 4 channels to 3 channels and reorder BGR <-> RGB
    cv::Mat rgb_mat;
    cv::cvtColor(src, rgb_mat, CV_BGRA2RGB);

    // resize image to the Yolo network size
    cv::Mat sized;
    cv::resize(rgb_mat, sized, cv::Size(network_w, network_h), 0, 0, cv::INTER_LINEAR);

    // very fast Mat to Yolo-image conversion
    IplImage ipl = sized;
    image out = ipl_to_image(&ipl);
    return out;
}

It can be seen from your code that the captured image has the same format as a BGRA cv::Mat, so you can easily wrap it in a cv::Mat and convert it to RGB.
Then cv::resize() should be much faster than resize_image(), because OpenCV's cv::resize() uses SSE4/AVX and multithreading.
And you should call ipl_to_image() after the image is resized; it's much faster than doing it in the other order.
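Editor's sketch of why the order matters: converting after resizing touches 320*320 = 102,400 pixels instead of 1920*1080 = 2,073,600, roughly 20 times fewer. The hypothetical helper below (not a Darknet or OpenCV function) makes that concrete without any dependencies by sampling the source with nearest-neighbour lookup and converting only the sampled pixels. Nearest-neighbour is swapped in here for the bilinear cv::resize() used above purely to keep the example self-contained, and the source buffer is assumed to be tightly packed BGRA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not a Darknet or OpenCV function.
// Downscales interleaved 8-bit BGRA to net_w x net_h with nearest-neighbour
// sampling and writes planar R, G, B floats in [0, 1] in the same pass,
// so only net_w * net_h source pixels are ever read.
std::vector<float> bgra_to_network_input(const std::uint8_t *bgra,
                                         int src_w, int src_h,
                                         int net_w, int net_h)
{
    assert(bgra != nullptr && src_w > 0 && src_h > 0 && net_w > 0 && net_h > 0);

    const std::size_t plane = static_cast<std::size_t>(net_w) * net_h;
    std::vector<float> out(plane * 3);
    const float scale = 1.0f / 255.0f;

    for (int y = 0; y < net_h; ++y) {
        // Source row that corresponds to destination row y.
        const int sy = static_cast<int>(static_cast<long long>(y) * src_h / net_h);
        const std::uint8_t *row = bgra + static_cast<std::size_t>(sy) * src_w * 4;

        for (int x = 0; x < net_w; ++x) {
            // Source column that corresponds to destination column x.
            const int sx = static_cast<int>(static_cast<long long>(x) * src_w / net_w);
            const std::uint8_t *p = row + static_cast<std::size_t>(sx) * 4;
            const std::size_t i = static_cast<std::size_t>(y) * net_w + x;

            out[i]             = p[2] * scale;  // R plane
            out[plane + i]     = p[1] * scale;  // G plane
            out[2 * plane + i] = p[0] * scale;  // B plane
        }
    }
    return out;
}
```

Calling bgra_to_network_input(screen, 1920, 1080, 320, 320) would produce a buffer with the same planar layout as image.data. Like cv::resize() to the network size, it does not preserve the aspect ratio.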

Final code

        image img = resize_cv(your_img_src, net.w, net.h);
        float *X = img.data;
        network_predict(net, X);
        free_image(img);  // ipl_to_image() allocates a new image for every frame

akinohana commented 6 years ago

I use 320x320 for now. I will try it, thanks for the help!

AlexeyAB / darknet

Real-time detection from screen #574