CodeWithKyrian / transformers-php

Transformers PHP is a toolkit for PHP developers to add machine learning magic to their projects easily.
https://codewithkyrian.github.io/transformers-php/
Apache License 2.0
281 stars, 16 forks

What are the objectives of this library? #34

Open k00ni opened 1 month ago

k00ni commented 1 month ago

Your question

What are the objectives of this library?

It's stated in the README.md: "Because TransformersPHP is designed to be functionally equivalent to the Python library, it's super easy to learn from existing Python or Javascript code." Does that mean you aim for 99% coverage of the Hugging Face Python library? In other words, are you re-implementing the whole Hugging Face Python library in PHP?

Context (optional)

I recently started using Hugging Face and was surprised that the PHP support is so bad. Luckily, I found your library (I saw your comment in the Hugging Face forum).

I am wondering why there isn't at least a basic Python wrapper in PHP. My local tests show that it's quite possible to generate Python code on the fly, at least for basic function calls. Of course, custom Python code would be hard to support, but for common functionality it's quite doable.

The following script requires a working PHP 8 and Python 3 environment (with datasets, evaluate, torch, transformers[sentencepiece] installed via pip). It's very basic: it reads a PDF file using a third-party library and uses some Python code to generate a summary of the PDF content.

<?php

use Smalot\PdfParser\Parser;

require __DIR__.'/vendor/autoload.php';

// get all text from a given PDF file
$parser = new Parser();
$document = $parser->parseFile('test.pdf');
$text = $document->getText();

// encode the text as a JSON string literal; with these flags it is also a valid
// Python string literal, so quotes or backslashes in the PDF cannot break the script
$textLiteral = json_encode(
    $text,
    JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_INVALID_UTF8_SUBSTITUTE | JSON_THROW_ON_ERROR
);

// represents Python code (each entry is a line)
$pythonCode = [
    'import json',
    'from transformers import pipeline',
    'summarizer = pipeline("summarization", "sshleifer/distilbart-cnn-12-6")',
    'result = summarizer('.$textLiteral.')',
    // print JSON rather than the Python repr so json_decode() can parse it
    'print(json.dumps(result))',
];

$pythonFile = __DIR__.'/generated_file.py';

// write custom Python code to file for later execution (overwrites an existing file)
file_put_contents($pythonFile, implode(PHP_EOL, $pythonCode));

// execute the Python file and capture its stdout; stderr is discarded
$output = shell_exec('python3 '.escapeshellarg($pythonFile).' 2> /dev/null');

var_dump($output);
// outputs [{"summary_text": "...

// get result as array
$resultAsArray = json_decode($output, true);

// process the result ...
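
A variation on the script above, as a minimal sketch: rather than writing new Python source for every call, a fixed command can receive the input on stdin and answer with JSON on stdout. The helper name `runJsonCommand` and the script name mentioned in the comments are hypothetical. To keep the sketch runnable without Python, a small PHP child process stands in for the Python script here.

```php
<?php

declare(strict_types=1);

/**
 * Runs a command, writes $input to its stdin and decodes its stdout as JSON.
 * Hypothetical helper for illustration; not part of any library.
 */
function runJsonCommand(array $command, string $input): array
{
    $nullDevice = PHP_OS_FAMILY === 'Windows' ? 'NUL' : '/dev/null';
    $descriptors = [
        0 => ['pipe', 'r'],              // child reads its input from this pipe
        1 => ['pipe', 'w'],              // child writes its JSON result here
        2 => ['file', $nullDevice, 'w'], // discard warnings and progress output
    ];

    // passing the command as an array (PHP 7.4+) bypasses the shell, so no escaping is needed
    $process = proc_open($command, $descriptors, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start the command.');
    }

    fwrite($pipes[0], $input);
    fclose($pipes[0]); // signals end of input to the child

    $stdout = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    $exitCode = proc_close($process);
    if ($exitCode !== 0) {
        throw new RuntimeException("Command failed with exit code {$exitCode}.");
    }

    return json_decode($stdout, true, 512, JSON_THROW_ON_ERROR);
}

// Stand-in child process: it reports the length of whatever it receives on stdin.
// With Python, the command could be ['python3', 'summarize.py'] (hypothetical script name).
$child = 'echo json_encode(["length" => strlen(file_get_contents("php://stdin"))]);';
$result = runJsonCommand([PHP_BINARY, '-r', $child], 'Some text extracted from a PDF');

var_dump($result['length']);
```

This assumes the child reads all of its input before it starts writing output. A child that streams output while still reading would need non-blocking pipes to avoid a deadlock.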

CodeWithKyrian commented 1 month ago

Hey @k00ni,

Thanks for reaching out and bringing up your thoughts on TransformersPHP. I appreciate the opportunity to shed some light on the objectives and rationale behind the library.

First off, you're absolutely right about your first statement: the primary aim of TransformersPHP is indeed to mirror the functionality of the Hugging Face Python library as closely as possible. However, the approach isn't about wrapping the Python implementation in PHP. The goal is to execute these tasks natively within PHP itself.

TransformersPHP aims to provide native support for ML tasks within the PHP ecosystem. This doesn't imply that every single operation happens purely in PHP: I leverage FFI (Foreign Function Interface) to interact with C libraries for certain tensor operations and for running the actual model (just like Python does). But the user experience, the interface, and the feel of using TransformersPHP are native to PHP.
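
To illustrate the FFI mechanism mentioned above, here is a minimal sketch. It is not TransformersPHP's internal code: it only binds `sqrt` from the system C math library to show how PHP can hand a computation to native code. The library file name `libm.so.6` is an assumption that holds on typical glibc-based Linux systems and needs adjusting elsewhere, and the `ffi` extension must be enabled.

```php
<?php

declare(strict_types=1);

// Assumed shared-library name of the C math library (glibc-based Linux);
// other platforms use a different file name.
$library = 'libm.so.6';

// Declare the C signature we want to call and bind it to the library.
$libm = FFI::cdef('double sqrt(double x);', $library);

// Euclidean norm of a small vector: the sum runs in PHP, the square root in C.
$vector = [3.0, 4.0];
$sumOfSquares = array_sum(array_map(fn (float $v): float => $v * $v, $vector));
$norm = $libm->sqrt($sumOfSquares);

var_dump($norm);
```

The calling code stays ordinary PHP; only the declared function crosses into C.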

Your example script demonstrates a workaround: generating Python code on the fly for basic function calls and then executing it. While wrapping Python functionality in PHP might seem like a shortcut, it comes with its own limitations and complexities. Control over crucial aspects like the cache directory, direct model usage, tokenization, post-processing, state management, and even performance optimization becomes a challenge. For simpler tasks like using pipelines, the approach might suffice, but for more customized usage the intricacies multiply, making it less viable.

Moreover, relying on a Python environment introduces additional overhead and complexity, which might not resonate well with many PHP developers. Setting up and maintaining a Python environment is a task not everyone is keen on (I'm number 1 on the list😅), especially when they're deeply entrenched in the PHP ecosystem. TransformersPHP, on the other hand, offers a seamless experience. With just a composer install, you're set to use the library within the PHP environment we're all familiar with.

I hope this sheds some light on the philosophy behind TransformersPHP. Your feedback and questions are always welcome as they help shape the direction of the project. If you have any further queries or suggestions, feel free to reach out anytime.

Cheers!