aws / aws-sdk-php

Official repository of the AWS SDK for PHP (@awsforphp)
http://aws.amazon.com/sdkforphp
Apache License 2.0
6.04k stars 1.23k forks source link

Intermittently getting EndpointDiscoveryException's thrown when attempting to write records to timestream. #2940

Open brysn opened 5 months ago

brysn commented 5 months ago

Describe the bug

I have a webhook endpoint that consumes messages from a 3rd party and writes records into AWS Timestream. It does so using the TimestreamWrite/TimestreamWriteClient::writeRecords method. There are long periods of time in which it works, but intermittently it will start throwing exceptions. When this starts to happen, it seems that ~1/3 of requests fail with this exception, while 2/3 of the requests continue to work.

Here are the exception details:

Property Value
class Aws\Exception\AwsException
awsErrorCode EndpointDiscoveryException
awsErrorMessage The endpoint required for this service is currently unable to be retrieved, and your request can not be fulfilled unless you manually specify an endpoint.
previous exception Error executing \"DescribeEndpoints\" on \"https://ingest.timestream.us-west-2.amazonaws.com"; AWS HTTP error: cURL error 6: getaddrinfo() thread failed to start (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://ingest.timestream.us-west-2.amazonaws.com

Expected Behavior

According to https://curl.se/libcurl/c/libcurl-errors.html the curl error means: Could not resolve host. The given remote host was not resolved.

I expected the records to be written to timestream.

Current Behavior

It appears that in some intermittent cases that the endpoint discovery call doesn't work and the hostname https://ingest.timestream.us-west-2.amazonaws.com cannot be resolved.

Reproduction Steps

<?php

use Aws\Credentials\Credentials;
use Aws\Exception\AwsException;
use Aws\Result;
use Aws\TimestreamWrite\Exception\TimestreamWriteException;
use Aws\TimestreamWrite\TimestreamWriteClient;

$awsKey = '{{ INSERT AWS KEY }}';
$awsSecret = '{{ INSERT AWS SECRET }}';
$awsTimestreamDatabase = '{{ INSERT TIMESTREAM DB }}';
$awsTimestreamTable = '{{ INSERT TIMESTREAM TABLE }}';
$awsTimestreamRecords = []; //INSERT RECORDS TO BE INSERTED

$client = new TimestreamWriteClient([
    'version' => 'latest',
    'region' => 'us-west-2',
    'credentials' => new Credentials($awsKey, $awsSecret),
]);

// This is what throws the exception intermittently
$response = $client->writeRecords([
    'DatabaseName' => $awsTimestreamDatabase,
    'Records' => $awsTimestreamRecords,
    'TableName' => $awsTimestreamTable,
]);

Possible Solution

I'm guessing there must be some issue with the curl setup here as I doubt that endpoint on aws is having that many issues.

Additional Information/Context

This happens to hundreds, if not thousands of times per day. It does seem to occur when there are more requests being sent at the same time. For example, overnight there are much fewer requests and everything seems to work fine. But during the day when it is getting flooded with requests then about 1/3 of them fail with this exception.

SDK version used

3.311.0

Environment details (Version of PHP (php -v)? OS name and version, etc.)

PHP 8.3.7, Ubuntu 22.04

yenfryherrerafeliz commented 5 months ago

Hi @brysn, sorry to hear about your issues. This issue is hard to reproduce seems it happens sporadically. I tried to do it locally but everything works fine at my end. Can you please confirm if is there any proxy in the middle?, or any other network limitations done by a firewall conf maybe?.

Thanks!

github-actions[bot] commented 4 months ago

This issue has not recieved a response in 1 week. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.

brysn commented 4 months ago

Hi @yenfryherrerafeliz , thanks for looking into it. The application is on Heroku; there is no firewall, proxy in the middle, or any other network limitations.

It seems like it has to do with the endpoint discovery caching. Could the endpoint be changing on AWS before the cache expires for the endpoint?

RanVaknin commented 3 months ago

Hi @brysn ,

AFAIK the SDK does not cache any endpoints. The SDK uses Guzzle, and PHP's standard curl client to make network calls. DNS resolution happens at the OS level and is not something that the SDK has visibility / control over.

If you want to root cause this, you might want to use a network diagnostics tool like Wireshark to inspect what kind of networking events are happening your machine. As a temporary measure, you might want to lower / completely disable the TTL on your the DNS cache to see if it solves your issue.

This doesn't seem directly related to the SDK, but we will keep the issue open to see if this helps and maybe provide some other possible guidance.

Thanks, Ran~

brysn commented 3 months ago

@RanVaknin

Thanks for looking into this. The issue is with endpoint discovery and the SDK does indeed cache endpoints: Example

It's unlikely that AWS itself has an endpoint discovery bug. It's more likely that it's an issue with the SDK's endpoint discovery caching.

brysn commented 2 months ago

@RanVaknin since this is a bug with the SDK, can this be looked into further?

brysn commented 1 month ago

@RanVaknin @yenfryherrerafeliz Was there more information I needed to provide or why was this marked as "guidance" when it's a bug with the sdk caching?