Modify the demo to run inference on a single image read from memory instead of camera video frames?

woshing700 commented 2 months ago

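For example, sscma-example-esp32(1.0)/examples/yolo.
My board is an esp32-s3-DevKitC-1 with no camera attached. I converted an image into an hpp file. How should I modify the demo so that it reads the data from the hpp file, instead of a video frame from the camera, and passes it to the model for inference? (I looked through the project and could not find any image preprocessing code.)
sscma-example-esp32/components/modules/algorithm/algo_yolo.cpp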

I found the code that invokes model inference in algo_yolo.cpp, but I am not sure how to change it. Here is a small part of that code:

// Get information about the memory area to use for the model's input.

input = interpreter->input(0);

camera_fb_t *frame = NULL;

uint16_t h = input->dims->data[1];
uint16_t w = input->dims->data[2];
uint16_t c = input->dims->data[3];

if (c == 1) rgb565_to_gray(input->data.uint8, frame->buf, frame->height, frame->width, h, w, ROTATION_UP);

else if (c == 3) rgb565_to_rgb888(input->data.uint8, frame->buf, frame->height, frame->width, h, w, ROTATION_UP);

I am not sure whether the rgb565_to_rgb888() function includes a resize step.

If I want to use a fixed image in place of the video frame, is it enough to replace frame with my image data?
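A note on swapping frame for image data: judging by its name, rgb565_to_rgb888() reads its source buffer as RGB565, which is two bytes per pixel, while an image dumped from OpenCV holds three bytes per pixel. The sketch below is not the repository's implementation. It only illustrates what unpacking one RGB565 pixel involves and how the two buffer sizes differ. The names `Rgb888`, `unpack_rgb565`, `rgb565_size` and `rgb888_size` are hypothetical, and the high-byte-first order is an assumption that may not match the actual frame buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One pixel with 8 bits per channel.
struct Rgb888 {
    uint8_t r, g, b;
};

// Expand one RGB565 pixel (5 bits red, 6 bits green, 5 bits blue) to RGB888.
// Assumption: the high byte comes first; swap the arguments otherwise.
inline Rgb888 unpack_rgb565(uint8_t hi, uint8_t lo) {
    const uint16_t p = static_cast<uint16_t>((hi << 8) | lo);
    const uint8_t r5 = (p >> 11) & 0x1F;
    const uint8_t g6 = (p >> 5) & 0x3F;
    const uint8_t b5 = p & 0x1F;
    // Replicate the top bits into the low bits so the full range maps to 0..255.
    return Rgb888{static_cast<uint8_t>((r5 << 3) | (r5 >> 2)),
                  static_cast<uint8_t>((g6 << 2) | (g6 >> 4)),
                  static_cast<uint8_t>((b5 << 3) | (b5 >> 2))};
}

// Buffer size in bytes for a width x height frame in each format.
inline size_t rgb565_size(size_t width, size_t height) {
    assert(width > 0 && height > 0);
    return width * height * 2;
}

inline size_t rgb888_size(size_t width, size_t height) {
    assert(width > 0 && height > 0);
    return width * height * 3;
}
```

If the function does read two bytes per pixel, handing it an RGB888 array unchanged would make it interpret the bytes with the wrong stride and channel layout, which could explain low-confidence detections.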

woshing700 commented 2 months ago

I followed the approach above, but the inference results are unsatisfactory. I am not sure whether I am loading the data incorrectly.

```
{"class": "0", "x": 100, "y": 65, "w": 6, "h": 5, "confidence": 19},
{"class": "0", "x": 31, "y": 36, "w": 10, "h": 8, "confidence": 30},
{"class": "0", "x": 17, "y": 66, "w": 8, "h": 4, "confidence": 12},
```

Here is the code I used to generate the hpp file:

```python
import argparse

import cv2
import numpy

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Model generator tool')
    parser.add_argument('-i', '--input', help='path to image')
    parser.add_argument('-o', '--output', help='path to image.hpp')
    args = parser.parse_args()

    if args.input is None or args.output is None:
        parser.print_help()
        quit()

    # cv2.imread returns an H x W x C uint8 array in BGR channel order.
    image = cv2.imread(args.input)
    h, w, c = image.shape

    with open(args.output, 'w') as file:
        file.write('#pragma once\n'
                   '#include <stdint.h>\n\n'
                   f'#define IMAGE_HEIGHT {h}\n'
                   f'#define IMAGE_WIDTH {w}\n'
                   f'#define IMAGE_CHANNEL {c}\n\n'
                   'const static uint8_t image_data[] = {\n')

        # Flatten to 1-D and write 32 values per line.
        image = numpy.reshape(image, (-1,))
        for i, element in enumerate(image[:-1], 1):
            if i == 1:
                file.write('    ')

            file.write(f'{element}, ')

            if i % 32 == 0:
                file.write('\n    ')
        # The last value is written without a trailing comma.
        file.write(f'{image[-1]}')
        file.write('};\n')
```

LynnL4 commented 2 months ago

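Hi, rgb565_to_rgb888 does include the resize step. You could check whether your image data has been normalized. Our default models all take RGB888 input in the range (-128, 127), so you need to subtract 128 before feeding the data to the model.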
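The "subtract 128" step can be sketched as follows. This is a minimal illustration that assumes the model's input tensor is int8 and the source pixels are uint8 in the range 0..255. The function name `shift_to_int8` is hypothetical and is not part of the repository, and the thread does not show whether the demo already performs this shift elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Shift uint8 pixel values (0..255) into the int8 range (-128..127)
// by subtracting 128 from every byte.
inline void shift_to_int8(const uint8_t* src, int8_t* dst, size_t count) {
    assert(src != nullptr && dst != nullptr);
    for (size_t i = 0; i < count; ++i) {
        // Widen to int first so the subtraction cannot wrap around.
        dst[i] = static_cast<int8_t>(static_cast<int>(src[i]) - 128);
    }
}
```

In this sketch, `count` would be height x width x channels of the model input, and `dst` would be the int8 view of the input tensor.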

woshing700 commented 2 months ago

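Hi, I adjusted part of the code:

When reading with OpenCV: image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) // convert BGR to RGB
Because the data I read is already RGB888, I rewrote the rgb565_to_rgb888 function so that it only resizes the image and no longer converts the color format.
In the visualized results, the model mostly detects only the upper body of the person, and the highest confidence is only 57. (Is my image too different from the dataset the model was trained on, or is the model's accuracy simply low?)
I would like to see how the model is supposed to perform, and would really appreciate it if you could share a sample detection image for this model!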
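A resize-only replacement for RGB888 data, as described above, could look roughly like the following. This is a sketch under stated assumptions, not the poster's rewritten function: it uses nearest-neighbour sampling, assumes interleaved R, G, B bytes, ignores rotation, and stretches the image without preserving aspect ratio. The name `resize_rgb888_nearest` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Nearest-neighbour resize of an interleaved RGB888 image (3 bytes per pixel).
// src holds src_w x src_h pixels; dst must have room for dst_w x dst_h pixels.
inline void resize_rgb888_nearest(const uint8_t* src, int src_w, int src_h,
                                  uint8_t* dst, int dst_w, int dst_h) {
    assert(src != nullptr && dst != nullptr);
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    for (int y = 0; y < dst_h; ++y) {
        const int sy = y * src_h / dst_h;  // source row for this output row
        for (int x = 0; x < dst_w; ++x) {
            const int sx = x * src_w / dst_w;  // source column for this output column
            const uint8_t* p = src + (static_cast<size_t>(sy) * src_w + sx) * 3;
            uint8_t* q = dst + (static_cast<size_t>(y) * dst_w + x) * 3;
            q[0] = p[0];  // R
            q[1] = p[1];  // G
            q[2] = p[2];  // B
        }
    }
}
```

Whether the model was trained on stretched or letterboxed images is not stated in this thread, so a mismatch in resize strategy is one possible factor to check alongside the value range.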

LynnL4 commented 2 months ago

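Hi, are you testing with a Seeed Xiao ESP32S3? If so, you can quickly deploy the model from the following pages and see how it performs. Also, did you train the model yourself, and if so, by what method? https://seeed-studio.github.io/SenseCraft-Web-Toolkit/#/setup/process https://sensecraftma.seeed.cc/deploy/overview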

woshing700 commented 2 months ago

Hi! My board is an esp32-s3-DevKitC-1, not a Xiao ESP32S3, so I cannot deploy from the web page. As for the model, I am using the official demo, "yolo": https://github.com/Seeed-Studio/sscma-example-esp32/tree/1.0.0/examples/yolo

Seeed-Studio / sscma-example-esp32

Modify the demo to run inference on a single image read from memory instead of camera video frames? #22
