CodeWithKyrian / transformers-php

Transformers PHP is a toolkit for PHP developers to add machine learning magic to their projects easily.
https://codewithkyrian.github.io/transformers-php/
Apache License 2.0

How to Use the YOLO Model #54

Closed: k99k5 closed this issue 2 months ago

k99k5 commented 2 months ago

Your question

Hey there! Is there a way to invoke the YOLO model, similar to how it's done at https://huggingface.co/spaces/Xenova/yolov9-web?

k99k5 commented 2 months ago

@CodeWithKyrian I've written an example. Would you be interested in adding it to the documentation?

<?php

use Codewithkyrian\Transformers\FeatureExtractors\ImageFeatureExtractor;
use Codewithkyrian\Transformers\Models\Auto\AutoModel;
use Codewithkyrian\Transformers\Processors\AutoProcessor;
use Codewithkyrian\Transformers\Transformers;
use Codewithkyrian\Transformers\Utils\Image;
use Codewithkyrian\Transformers\Utils\ImageDriver;

require_once __DIR__ . '/vendor/autoload.php'; // Composer autoloader

Transformers::setup()
    ->setCacheDir(__DIR__ . '/.transformers-cache')
    ->setImageDriver(ImageDriver::IMAGICK)
    ->apply();

$processor = AutoProcessor::fromPretrained('Xenova/yolov9-c_all');
$processor->featureExtractor = new ImageFeatureExtractor([
    'size' => [
        'shortest_edge' => 256
    ]
]);

$model = AutoModel::fromPretrained('Xenova/yolov9-c_all');
foreach (glob('*.png') as $file) {
    $inputs = $processor(Image::read($file));
    list('outputs' => $outputs) = $model($inputs);

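    // Size of the resized model input; reversed here to mirror the JS demo
    $sizes = array_reverse($inputs['reshaped_input_sizes'][0]);
    $boxes = array_map(function ($args) use ($model, $sizes): ?array {
        // Each output row is [xmin, ymin, xmax, ymax, score, classId]
        list($xmin, $ymin, $xmax, $ymax, $score, $id) = $args;
        list($w, $h) = $sizes;
        if ($score < 0.25) return null; // skip low-confidence detections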
        return [
            'left' => $xmin / $w,
            'top' => $ymin / $h,
            'width' => ($xmax - $xmin) / $w,
            'height' => ($ymax - $ymin) / $h,
            'score' => $score,
            'label' => $model->config['id2label'][$id] ?? 'unknown',
        ];
    }, $outputs->toArray());
    $boxes = array_filter($boxes, fn ($box) => !is_null($box));
    $boxes = array_values($boxes);
    var_dump($boxes);
}

Output:

Unknown model class for model type yolov9. Using base class PreTrainedModel.
array(1) {
  [0]=>
  array(6) {
    ["left"]=>
    float(0.02566184997558594)
    ["top"]=>
    float(0.08840230305989584)
    ["width"]=>
    float(0.7026679992675782)
    ["height"]=>
    float(1.2425599597749255)
    ["score"]=>
    float(0.5910435914993286)
    ["label"]=>
    string(10) "teddy bear"
  }
}
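The filtering step above maps every row to an array or null and then strips the nulls. A slightly more direct sketch filters by score first and then names the fields. `detectionsFromRows()` is a hypothetical helper, not part of TransformersPHP, and it assumes each row of `$outputs->toArray()` is `[xmin, ymin, xmax, ymax, score, classId]`, as in the code above:

```php
<?php

/**
 * Hypothetical helper (not part of TransformersPHP): turns raw YOLOv9 rows
 * [xmin, ymin, xmax, ymax, score, classId] into labelled boxes, keeping
 * only detections whose score is at least $threshold.
 *
 * @param array<int, array<int, float|int>> $rows     e.g. $outputs->toArray()
 * @param array<int, string>                $id2label e.g. $model->config['id2label']
 */
function detectionsFromRows(array $rows, array $id2label, float $threshold = 0.25): array
{
    // Drop low-confidence rows first (index 4 holds the score).
    $kept = array_filter($rows, fn (array $row): bool => $row[4] >= $threshold);

    // Name each field. Class ids may come back as floats from the tensor,
    // so cast before the label lookup; unknown ids fall back to 'unknown'.
    return array_values(array_map(fn (array $row): array => [
        'xmin'  => (float) $row[0],
        'ymin'  => (float) $row[1],
        'xmax'  => (float) $row[2],
        'ymax'  => (float) $row[3],
        'score' => (float) $row[4],
        'label' => $id2label[(int) $row[5]] ?? 'unknown',
    ], $kept));
}
```

Inside the loop, `detectionsFromRows($outputs->toArray(), $model->config['id2label'])` would then give a reindexed list of pixel-space boxes that can be converted to whatever format you need.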
CodeWithKyrian commented 2 months ago

First of all, you don’t need to create a new ImageFeatureExtractor just to change the default shortest_edge. It looks like you found that the size property of ImageFeatureExtractor is protected, which prevented direct modification, so you created a new instance to replace the one built from the preprocessor_config.json file. The downside of this approach is that you lose all the other settings from the original file, which might affect the quality of the inference results.

To modify the settings while preserving the original configuration, you can either edit the local preprocessor_config.json file directly or pass the configuration array like this at runtime:

$processor = AutoProcessor::fromPretrained('Xenova/yolov9-c_all', [
  "do_normalize" => false,
  "do_pad" => false,
  "do_rescale" => true,
  "do_resize" => true,
  "feature_extractor_type" => "ImageFeatureExtractor",
  "resample" => 2,
  "rescale_factor" => 0.00392156862745098,
  "size" => [
    "shortest_edge" => 256, // It's 224 in the default config
  ],
  "size_divisibility" => 32
]);

This way, all other configurations are preserved.
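If you'd rather not copy every key by hand, one option is to load the original file and override only the values you want to change. A minimal sketch: `loadPreprocessorConfig()` and `overridePreprocessorConfig()` are hypothetical helpers, and `$configPath` stands for wherever the model's preprocessor_config.json lives on your machine (its exact location inside the cache directory is not covered here). It also assumes, as in the call above, that `fromPretrained()` accepts the full array as its second argument:

```php
<?php

/** Hypothetical helper: reads a preprocessor_config.json file into an array. */
function loadPreprocessorConfig(string $path): array
{
    $json = file_get_contents($path);
    if ($json === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $config = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($config)) {
        throw new UnexpectedValueException("$path does not contain a JSON object");
    }
    return $config;
}

/**
 * Hypothetical helper: returns a copy of $config with only the keys in
 * $overrides replaced. Nested arrays such as 'size' are merged key by key.
 */
function overridePreprocessorConfig(array $config, array $overrides): array
{
    return array_replace_recursive($config, $overrides);
}

// Usage (sketch; $configPath is a placeholder):
// $config = overridePreprocessorConfig(
//     loadPreprocessorConfig($configPath),
//     ['size' => ['shortest_edge' => 256]]
// );
// $processor = AutoProcessor::fromPretrained('Xenova/yolov9-c_all', $config);
```

Because array_replace_recursive() merges nested arrays key by key, overriding size.shortest_edge leaves do_rescale, rescale_factor, size_divisibility and the rest untouched.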

In hindsight, it would have been more convenient to make the FeatureExtractor properties public, allowing adjustments without a custom config array or changes to the preprocessor_config.json file. Maybe in a future update.

Regarding the "Unknown model class for model type yolov9." warning: this occurs because yolov9 has not been added to the list of supported models yet. Nevertheless, since its architecture is quite similar to yolos (which is supported), inference should still run correctly. You can safely ignore this warning until the next update.

Lastly, to fast-track the official support for yolov9, please open a new issue and select “Feature Request.” This will help prioritize the addition of yolov9 to the list of supported models.

k99k5 commented 2 months ago

Thanks for the response; I'll give it a try.

CodeWithKyrian commented 2 months ago

That's not all though.

CodeWithKyrian commented 2 months ago

The output from your earlier example might not be exactly the values you need, depending on how you plan to use them. Xenova's example renders the boxes with CSS, so it makes sense there to use an absolutely positioned box with a border and set its left, top, width, and height as percentages. If you plan to return the values from your PHP script to be rendered with CSS, that's fine.
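For that CSS route, turning the fractional left/top/width/height values from the first example into an inline style takes only a few lines. A sketch with a hypothetical `cssStyleForBox()` helper, assuming each box already holds fractions of the input size (as in the output above):

```php
<?php

/**
 * Hypothetical helper: builds an inline CSS style for an absolutely
 * positioned overlay from a box whose left/top/width/height are fractions
 * (0..1) of the image size.
 */
function cssStyleForBox(array $box): string
{
    // CSS wants percentages, so scale each fraction by 100.
    $pct = fn (float $fraction): string => round($fraction * 100, 2) . '%';

    return sprintf(
        'position: absolute; left: %s; top: %s; width: %s; height: %s;',
        $pct($box['left']),
        $pct($box['top']),
        $pct($box['width']),
        $pct($box['height'])
    );
}
```

Each detection could then be emitted as a bordered element carrying that style inside a relatively positioned container that wraps the image.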

However, if you plan to draw the detection boxes in PHP and save the image, that format is less convenient. The package's Image class has utilities for drawing rectangles and text, and its rectangle method takes corner coordinates (xMin, yMin, xMax, yMax), the same form the model outputs. PHP's imagerectangle takes the same corner coordinates. Here’s an example using the Image utility of TransformersPHP:

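<?php

use Codewithkyrian\Transformers\Models\Auto\AutoModel;
use Codewithkyrian\Transformers\Processors\AutoProcessor;
use Codewithkyrian\Transformers\Utils\Image;

require_once __DIR__ . '/vendor/autoload.php'; // adjust to your Composer autoloader path

$processor = AutoProcessor::fromPretrained('Xenova/yolov9-c_all');
$model = AutoModel::fromPretrained('Xenova/yolov9-c_all');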

$image = Image::read(__DIR__.'/../images/cats.jpg');
$inputs = $processor($image);
['outputs' => $outputs] = $model($inputs);

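// Size of the resized model input, already in [width, height] order
[$w, $h] = $inputs['reshaped_input_sizes'][0];

$boxes = array_map(function ($args) use ($model): ?array {
    // Each output row is [xmin, ymin, xmax, ymax, score, classId]
    [$xmin, $ymin, $xmax, $ymax, $score, $id] = $args;

    if ($score < 0.25) return null; // skip low-confidence detections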

    return [
        'xmin' => $xmin,
        'ymin' => $ymin,
        'xmax' => $xmax,
        'ymax' => $ymax,
        'score' => $score,
        'label' => $model->config['id2label'][$id] ?? 'unknown',
    ];
}, $outputs->toArray());

$boxes = array_filter($boxes); // drop the nulls left by low-confidence rows

foreach ($boxes as $box) {
    $image->drawRectangle(
        xMin: (int)$box['xmin'],
        yMin: (int)$box['ymin'],
        xMax: (int)$box['xmax'],
        yMax: (int)$box['ymax'],
        color: '0099FF',
        thickness: 2
    );
    $image->drawText(
        text: $box['label'],
        xPos: (int)$box['xmin'],
        yPos: (int)max($box['ymin'] - 5, 0),
        fontFile: '/Users/Kyrian/Library/Fonts/JosefinSans-Bold.ttf', // any TTF font on your system
        fontSize: 14,
        color: '0099FF'
    );
}

$image->save(__DIR__.'/../images/cats-detection.jpg');
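One thing worth checking when drawing on the original image: the first example divides the coordinates by reshaped_input_sizes, which suggests they are expressed in pixels of the resized model input rather than of the original image. If the processor resized your image, the boxes may need scaling before drawing. A sketch with a hypothetical `scaleBoxToImage()` helper, assuming both sizes are given as [width, height] (for the original image, PHP's getimagesize() returns the width at index 0 and the height at index 1):

```php
<?php

/**
 * Hypothetical helper: rescales a box from the model input's pixel space
 * ($inputSize = [width, height]) to the original image's pixel space
 * ($originalSize = [width, height]). Other keys (score, label) are kept.
 * Coordinates are rounded to ints for drawRectangle() or imagerectangle().
 */
function scaleBoxToImage(array $box, array $inputSize, array $originalSize): array
{
    [$inW, $inH] = $inputSize;
    [$origW, $origH] = $originalSize;
    $sx = $origW / $inW; // horizontal scale factor
    $sy = $origH / $inH; // vertical scale factor

    return array_merge($box, [
        'xmin' => (int) round($box['xmin'] * $sx),
        'ymin' => (int) round($box['ymin'] * $sy),
        'xmax' => (int) round($box['xmax'] * $sx),
        'ymax' => (int) round($box['ymax'] * $sy),
    ]);
}
```

If the image was not resized, both sizes are equal, the factors are 1, and the boxes pass through unchanged.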

Also, I'm not sure why the JS version reverses the reshaped_input_sizes array, but that isn't necessary here, since the values are already in the right order: [width, height].
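Applied to the fractional output of the first example, that means destructuring reshaped_input_sizes directly. A sketch with a hypothetical `normalizeBox()` helper that takes a box in pixel corner coordinates and the input size in [width, height] order. For a box inside the image, every resulting value should fall between 0 and 1; the height of about 1.24 in the earlier output may be a symptom of the two dimensions being swapped by array_reverse():

```php
<?php

/**
 * Hypothetical helper: converts a pixel-space box into fractions of the
 * model input size. $inputSize is used as-is, in [width, height] order,
 * with no array_reverse().
 */
function normalizeBox(array $box, array $inputSize): array
{
    [$w, $h] = $inputSize;

    return [
        'left'   => $box['xmin'] / $w,
        'top'    => $box['ymin'] / $h,
        'width'  => ($box['xmax'] - $box['xmin']) / $w,
        'height' => ($box['ymax'] - $box['ymin']) / $h,
        'score'  => $box['score'],
        'label'  => $box['label'],
    ];
}
```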

I hope this helps.