DeepLcom / deepl-php

Official PHP library for the DeepL language translation API.
MIT License
202 stars 23 forks source link

Allow to configure connect timeout separately #22

Closed VincentLanglet closed 1 year ago

VincentLanglet commented 1 year ago

Hi @daniel-jones-deepl,

Currently, Deepl only allow one option: TranslatorOptions::TIMEOUT. This one is used for both

But we might accept a CURLOPT_TIMEOUT_MS of 30s while we want a low CURLOPT_CONNECTTIMEOUT. In general it doesn't make a lot of sens to have more than a few seconds for the connect timeout.

So it should be better to split both option

To keep the old behavior, I kept the TranslatorOptions::TIMEOUT option which is used as a default value for both CONNECT_TIMEOUT and EXEC_TIMEOUT

Also, the doc was saying

timeout: the number of seconds used as connection timeout for each HTTP request retry. The default value is 10.0 (10 seconds).

So I changed it to connect_timeout. (Even if timeout will still work for the BC compatibility).

I'm open to any feedback :)

VincentLanglet commented 1 year ago

Friendly ping @daniel-jones-deepl, @JanEbbing. Does someone have time to take a look ? :)

JanEbbing commented 1 year ago

Hi @VincentLanglet , sorry, Daniel is out at the moment. For context, we get several requests to configure certain HTTP parameters/behaviours - if we add them all as a member variable of TranslatorOption, I feel things are going to get unwieldy. Hence, I'd rather expose the HttpClient.

I see 2 broad ways to achieve this.

  1. Allow users to configure the $curlOptions variable in HttpClient (we would ignore/overwrite some values, e.g. URL, or anything that is set explicitly as an option, like proxy).
  2. Exposing the entire HttpClient class and allowing users to pass their own HttpClient into Translator. We would still provide the current client as the default and for users to subclass to change certain behaviour (e.g. when to retry, changing the backoff algorithm, etc).

My asks:

VincentLanglet commented 1 year ago

Hi @VincentLanglet , sorry, Daniel is out at the moment. For context, we get several requests to configure certain HTTP parameters/behaviours - if we add them all as a member variable of TranslatorOption, I feel things are going to get unwieldy. Hence, I'd rather expose the HttpClient.

Yes, I understand. I tried to introduce as few changes as possible:

I see 2 broad ways to achieve this.

  1. Allow users to configure the $curlOptions variable in HttpClient (we would ignore/overwrite some values, e.g. URL, or anything that is set explicitly as an option, like proxy).
  2. Exposing the entire HttpClient class and allowing users to pass their own HttpClient into Translator. We would still provide the current client as the default and for users to subclass to change certain behaviour (e.g. when to retry, changing the backoff algorithm, etc).

My asks:

  • Do both of these approaches cover your use case for this PR?
  • Do you foresee anything you would want to do that would not be possible with 1) or 2) or both? (e.g. changing the backoff algorithm isn't possible with 1, but I don't think that's a big concern - maybe something like "I want to have multiple translator objects with a common HTTP Client so I can rate-limit how many outgoing requests I make" is something users would run into?)
  • Any comments/reasons you'd prefer one over the other?

For context, I currently use the following code (with this PR)

$client = new Translator($apiKey, [
     TranslatorOptions::DEFAULT_MAX_RETRIES => 0,
     TranslatorOptions::CONNECT_TIMEOUT => 5,
     TranslatorOptions::EXEC_TIMEOUT => 30,
]);
$translations = $client->translateText($payload, $languageFrom, $languageTo, []);
$languages = $client->getTargetLanguages();

For the "better" solution, I would say it depends on the amount of time/work/changes you want to take @JanEbbing.

Allowing to pass the HttpClient to the Translator seems to be the more flexible solution (with a default http implementation) ; but to me it's also the solution which might require the more change in the Deepl codebase.

Even if the HttpClient is now an entry-point, it still could be helpful to allow people to pass some curl options to the HttpClient, in order to not having to rewrite a whole HttpClient if they want to change the timeout. So to me the solutions 1 and 2 are not incompatible. You could have a

class HttpClientFactory
{
     public function static create($authKey, $options)
     {
          // Logic which are inside Translator::construct
     }
}

And people could write either

new Translator(HttpClientFactory::create($authKey, $options));
new Translator($customHttpClient);

Still, this could require a way to override curlOptions with the factory or construct of the HttpClient if I just want to keep all the logic from Deepl except one specific curl options (like the connect timeout).

Something like

class HttpClient
{
     public function __construct($url, ..., $defaultCurlOptions) {
         ...
         $this->defaultCurlOptions = $defaultCurlOptions;
     }

     public function sendRequestWithBackOff(...) {
         ...
         while ($backoff->getNumRetries() <= $this->maxRetries) {
              $response = $this->sendRequest(
                    $method,
                    $url,
                    $backoff->getTimeUntilDeadline(),
                    $headers,
                    $params,
                    $file,
                    $outFile
                );
                ...
         }
     }

     private function sendRequest(..., $timeUntilDeadline) {
         $curlOptions = $this->defaultCurlOptions;
         if (isset($curlOptions[\CURLOPT_CONNECTTIMEOUT_MS]) {
             $curlOptions[\CURLOPT_CONNECTTIMEOUT_MS] = max($curlOptions[\CURLOPT_CONNECTTIMEOUT_MS], $timeUntilDeadline * 1000));
         } else {
             $curlOptions[\CURLOPT_CONNECTTIMEOUT] = max($curlOptions[\CURLOPT_CONNECTTIMEOUT], $timeUntilDeadline));
         }
         if (isset($curlOptions[\CURLOPT_TIMEOUT_MS]) {
             $curlOptions[\CURLOPT_TIMEOUT_MS] = max($curlOptions[\CURLOPT_TIMEOUT_MS], $timeUntilDeadline * 1000));
         } else {
             $curlOptions[\CURLOPT_TIMEOUT] = max($curlOptions[\CURLOPT_TIMEOUT], $timeUntilDeadline));
         }
     }
}

Does it answer all your questions ? Do you plan to implements such change alone/with your team or should/can I help ?

VincentLanglet commented 1 year ago

Hi @JanEbbing, is there anything to do on my side ?

I would have expected a short answer to know

JanEbbing commented 1 year ago

Hi Vincent, sorry for the lack of transparency here. We'll ultimately move all development to GitHub, then it's easier to see what we're working on. For your questions, I'm working on allowing users to configure which HTTP Client is used, using either a PSR-18 compliant one or the symfony client. So, nothing expected from your side.

VincentLanglet commented 1 year ago

Hi Vincent, sorry for the lack of transparency here. We'll ultimately move all development to GitHub, then it's easier to see what we're working on. For your questions, I'm working on allowing users to configure which HTTP Client is used, using either a PSR-18 compliant one or the symfony client. So, nothing expected from your side.

Thanks for your quick answer, I'm impatient to try this feature (and stop using a temporary fork). Feel free to ping me if you need review/beta-testing before an official release :)

JanEbbing commented 1 year ago

Hi @VincentLanglet , I added support for custom clients in a new branch. We plan to merge this in the next version, please let me know what you think.

VincentLanglet commented 1 year ago

Hi @VincentLanglet , I added support for custom clients in a new branch. We plan to merge this in the next version, please let me know what you think.

Hi, this seems an ok implementation. It should indeed solve our issue, allowing us to configure timeout by ourself.

The only things I see which might be improve (but I'm not certain how), would be to remove the hard dependency to guzzlehttp/psr7. https://github.com/DeepLcom/deepl-php/commit/338c12362728dbc0ab24c7f0a6a1011f9b7c164b#diff-d2ab9925cad7eac58e0ff4cc0d251a937ecf49e4b6bf57f8b95aab76648a9d34R15

I saw @nicolas-grekas promoting https://github.com/php-http/discovery which does an auto-discovery of the right psr implementation. The goal would be to use

but I'm not sure how easy it is to use this ; and this can still be done in another PR if needed.

JanEbbing commented 1 year ago

Thanks, that seems like a great way to avoid a dependency on a PSR-7 implementation!

daniel-jones-deepl commented 1 year ago

Another idea: We could also require a PSR-7/PSR-17 HTTP factory to be passed in with the PSR-18 HTTP client. This would mean no direct dependencies(*), but it would make the library interface a little harder to use.

*: edited-to-add: actually there would need to be a psr/http-factory dependency.

nicolas-grekas commented 1 year ago

I would definitely recommend relying on php-http/discovery and using its Psr18Client. It's an adapter that knows how to use whatever PSR-17+PSR-18 implementations are available.

As far as timeout is concerned, this could be considered out of scope from an SDK pov (aka let users that care do the timeout configuration as part of their wiring) or we could also discuss a way to make Psr18Client configurable somehow.

You might want to check https://github.com/helpscout/helpscout-api-php/pull/311 for inspiration.

JanEbbing commented 1 year ago

@nicolas-grekas Thanks for the code example. We use POST requests with multipart/form-data. Based on the docs I would use this, but using it in the code the StreamFactoryDiscovery::find() is marked as deprecated and to be removed in 2.0 (actually the entire class).

Is there any way around using that?

nicolas-grekas commented 1 year ago

Instead of this discovery class, you should use https://github.com/php-http/multipart-stream-builder and use the Psr18Client as a stream factory (it does implement what's needed):

$client = new Psr18Client();
$builder = new MultipartStreamBuilder($client);
JanEbbing commented 1 year ago

Thanks! I implemented that change and removed the guzzle dependency on the branch.

VincentLanglet commented 1 year ago

Thanks! I implemented that change and removed the guzzle dependency on the branch.

Hi @JanEbbing, I tried your branch (with the recent changes) on my project and it works fine. Thanks :)

VincentLanglet commented 1 year ago

Hi @JanEbbing, what are the next step for your branch ? Is it ready to be reviewed/merged/released ? :)

JanEbbing commented 1 year ago

Yes, this (with some other fixes) will be released today.

VincentLanglet commented 1 year ago

Thanks