CodeWithKyrian / transformers-php

Transformers PHP is a toolkit for PHP developers to add machine learning magic to their projects easily.
https://codewithkyrian.github.io/transformers-php/
Apache License 2.0

Problems Trying to use with Laravel on AWS #50

Status: Open. coogle opened this issue 3 months ago.

coogle commented 3 months ago

System Info

Ubuntu 22.02 on AWS, running on an m5.4xlarge instance. The code runs inside a Laravel application (specifically, a mixin for Illuminate\Support\Str) and is being tested via artisan tinker.

PHP Version

8.2


Description

I am trying some simple usage with Laravel on AWS via artisan tinker, and I am running into a problem where the first execution works but the second execution hangs. Here is the simple code I'm executing (technically, I'm calling it through a mixin on Str and running it in artisan tinker, like so):

$ ./artisan tinker

> \Illuminate\Support\Str::namedEntities('What kind of campgrounds are near New York City?');
= [
    [
      "entity_group" => "LOC",
      "score" => 0.99945451815923,
      "word" => "New York City",
      "start" => null,
      "end" => null,
    ],
  ]
>

where namedEntities() just executes the following:

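use Codewithkyrian\Transformers\Generation\AggregationStrategy;
use Codewithkyrian\Transformers\Pipelines\Task;
use function Codewithkyrian\Transformers\Pipelines\pipeline;

// Presumably registered as a Str mixin, e.g. \Illuminate\Support\Str::mixin(new PredictNamedEntities());
// each public method returns the closure that becomes the Str macro.
class PredictNamedEntities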
{
    public function namedEntities(): callable
    {
        return function (
            string $input,
            AggregationStrategy $aggregationStrategy = AggregationStrategy::MAX,
            string $model = 'Xenova/bert-base-NER',
        ): array {

            $ner = pipeline(
                Task::Ner,
                $model,
            );

            return array_values(
                $ner(
                    $input,
                    aggregationStrategy: $aggregationStrategy
                )
            );
        };
    }
}

I put print statements between the pipeline() call and the $ner() call to see where it was freezing, and the hang happens inside $ner(). As I said, this only happens on the second execution; the first execution works exactly as expected.

I have tried several instance types (I was actually trying to benchmark what instance sizes I might need to use this library in production), and the behavior is the same across all of them, including an m5.4xlarge with 16 CPUs and 64 GB of RAM. Every one of them freezes like this.

I also run this exact kind of command from artisan tinker very often on my MacBook M1 and it works fine, so the problem seems specific to AWS, or perhaps to Intel architecture versus Apple Silicon.

On the surface this feels like an FFI issue, especially since the very first call works perfectly (and fast) but the second call stalls. Perhaps you have a different view?

Reproduction

  1. Create an instance in AWS
  2. Run named entity recognition via artisan tinker
  3. Get a result back
  4. Run the exact same line of code a second time
  5. It hangs

coogle commented 3 months ago

After more testing:

This seems to happen only with artisan tinker on AWS. With a plain PHP script, I don't experience the issue:

<?php

require "./vendor/autoload.php";

use Codewithkyrian\Transformers\Generation\AggregationStrategy;
use function Codewithkyrian\Transformers\Pipelines\pipeline;
use Codewithkyrian\Transformers\Pipelines\Task;

function test(
    string $input,
    AggregationStrategy $aggregationStrategy = AggregationStrategy::MAX,
    string $model = 'Xenova/bert-base-NER',
) {
    $ner = pipeline(
        Task::Ner,
        $model,
    );

    return array_values(
        $ner(
            $input,
            aggregationStrategy: $aggregationStrategy
        )
    );
}

var_dump(test("Show me campgrounds near New York City"));
var_dump(test("Show me campgrounds near Chicago, IL"));

coogle commented 3 months ago

I can also confirm that this isn't caused by some quirk of the mixin. As I said, my best guess is that there is some kind of contention between the way artisan tinker provides an interactive PHP prompt and straight code execution with the plain php CLI, in some combination of AWS / Intel / Apple Silicon / Ubuntu.

Hopefully you have some insight into how your implementation might behave differently across execution environments.

CodeWithKyrian commented 2 months ago

Can you try again with the latest release and report back with your observations?

coogle commented 2 months ago

I can, but I've moved on to other things at this point. I suggest we leave this bug open until I can come back to it (feel free to poke me here if it's been too long).

CodeWithKyrian commented 2 months ago

Alright, that works!