aws / aws-sdk-php

Official repository of the AWS SDK for PHP (@awsforphp)
http://aws.amazon.com/sdkforphp
Apache License 2.0
6.04k stars 1.23k forks source link

BedrockAgentClient::startIngestionJob results in "Invalid timestamp value passed to DateTimeResult::fromTimestamp" #2967

Closed Saeven closed 3 months ago

Saeven commented 4 months ago

Describe the bug

BedrockAgentClient::startIngestionJob results in "Invalid timestamp value passed to DateTimeResult::fromTimestamp"

Expected Behavior

That the SDK not crash with a date parsing exception

Current Behavior

#0 /vendor/aws/aws-sdk-php/src/Api/Parser/JsonParser.php(60): Aws\Api\DateTimeResult::fromTimestamp('2024-07-25T20:5...', 'iso8601')
#1 /vendor/aws/aws-sdk-php/src/Api/Parser/JsonParser.php(27): Aws\Api\Parser\JsonParser->parse(Object(Aws\Api\TimestampShape), '2024-07-25T20:5...')
#2 /vendor/aws/aws-sdk-php/src/Api/Parser/JsonParser.php(27): Aws\Api\Parser\JsonParser->parse(Object(Aws\Api\StructureShape), Array)
#3 /vendor/aws/aws-sdk-php/src/Api/Parser/RestJsonParser.php(34): Aws\Api\Parser\JsonParser->parse(Object(Aws\Api\StructureShape), Array)
#4 /vendor/aws/aws-sdk-php/src/Api/Parser/AbstractRestParser.php(63): Aws\Api\Parser\RestJsonParser->payload(Object(GuzzleHttp\Psr7\Response), Object(Aws\Api\StructureShape), Array)
#5 /vendor/aws/aws-sdk-php/src/WrappedHttpHandler.php(126): Aws\Api\Parser\AbstractRestParser->__invoke(Object(Aws\Command), Object(GuzzleHttp\Psr7\Response))
#6 /vendor/aws/aws-sdk-php/src/WrappedHttpHandler.php(93): Aws\WrappedHttpHandler->parseResponse(Object(Aws\Command), Object(GuzzleHttp\Psr7\Request), Object(GuzzleHttp\Psr7\Response), Array)
#7 /vendor/guzzlehttp/promises/src/Promise.php(204): Aws\WrappedHttpHandler->Aws\{closure}(Object(GuzzleHttp\Psr7\Response))
#8 /vendor/guzzlehttp/promises/src/Promise.php(153): GuzzleHttp\Promise\Promise::callHandler(1, Object(GuzzleHttp\Psr7\Response), NULL)
#9 /vendor/guzzlehttp/promises/src/TaskQueue.php(48): GuzzleHttp\Promise\Promise::GuzzleHttp\Promise\{closure}()
#10 /vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php(118): GuzzleHttp\Promise\TaskQueue->run()
#11 /vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php(145): GuzzleHttp\Handler\CurlMultiHandler->tick()
#12 /vendor/guzzlehttp/promises/src/Promise.php(248): GuzzleHttp\Handler\CurlMultiHandler->execute(true)
#13 /vendor/guzzlehttp/promises/src/Promise.php(224): GuzzleHttp\Promise\Promise->invokeWaitFn()
#14 /vendor/guzzlehttp/promises/src/Promise.php(269): GuzzleHttp\Promise\Promise->waitIfPending()
#15 /vendor/guzzlehttp/promises/src/Promise.php(226): GuzzleHttp\Promise\Promise->invokeWaitList()
#16 /vendor/guzzlehttp/promises/src/Promise.php(269): GuzzleHttp\Promise\Promise->waitIfPending()
#17 /vendor/guzzlehttp/promises/src/Promise.php(226): GuzzleHttp\Promise\Promise->invokeWaitList()
#18 /vendor/guzzlehttp/promises/src/Promise.php(62): GuzzleHttp\Promise\Promise->waitIfPending()

Reproduction Steps

$bedrockAgentClient->startIngestionJob([
    'dataSourceId' => $dataSourceId,
    'knowledgeBaseId' => $knowledgebaseId,
]);

Possible Solution

Looks like a bad format of the date stamp is being pushed through your Date parser.

Additional Information/Context

While invoking StartIngestionJob - the command succeeds, but an exception is thrown with the stack trace noted above.

Examining key differences with breakpoints at the affected date parser, we see that successful calls return:

Screenshot 2024-07-25 at 4 54 24 PM

Whereas this failed call returns:

Screenshot 2024-07-25 at 4 55 24 PM

(I've tested other calls, S3, Polly, etc. which all succeed without exceptions).

It's important to note that the call actually triggers the process on the service side. The problem lies in the composer package, which throws an exception in the codebase when the response is being parsed.

SDK version used

3.316.8

Environment details (Version of PHP (php -v)? OS name and version, etc.)

PHP 8.3.9 (cli) (built: Jul 2 2024 14:10:14) (NTS) Copyright (c) The PHP Group Zend Engine v4.3.9, Copyright (c) Zend Technologies with Zend OPcache v8.3.9, Copyright (c), by Zend Technologies

Saeven commented 4 months ago

This issue also affects getIngestionJob - I imagine it may affect all BedrockClientAgent tasks.

Saeven commented 4 months ago

@yenfryherrerafeliz Thanks for investigating/fixing! Let me know if you need any extra data!

stobrien89 commented 4 months ago

Hi all,

Apologies for the delay here. I suspect this may be an issue with the timestamp values Bedrock is sending, but I've tested both startIngestionJob and getIngestionJob and there were no issues parsing the iso8601 timestamp values sent by the service. I even manually changed the $value on responses to 2024-07-25T20:55:08.816730762Z (like @Saeven posted) and they were parsed successfully.

For those still affected, would it be possible to post debug logs from a call with a response parsing error? To enable debug logging, set debug to true in the client constructor and please be sure to obscure any sensitive information (such as account numbers) before posting. Thanks!

Saeven commented 4 months ago

@stobrien89 No sweat - I know that sometimes these things aren't easy to debug. The crux of it is.

  1. The Service returns this JSON
    {
    "ingestionJob": {
    "dataSourceId": "OBFUSCATED",
    "ingestionJobId": "OBFUSCATED",
    "knowledgeBaseId": "OBFUSCATED",
    "startedAt": "2024-07-31T18:53:00.544453478Z",
    "statistics": {
      "numberOfDocumentsDeleted": 0,
      "numberOfDocumentsFailed": 0,
      "numberOfDocumentsScanned": 0,
      "numberOfMetadataDocumentsModified": 0,
      "numberOfMetadataDocumentsScanned": 0,
      "numberOfModifiedDocumentsIndexed": 0,
      "numberOfNewDocumentsIndexed": 0
    },
    "status": "STARTING",
    "updatedAt": "2024-07-31T18:53:00.544453478Z"
    }
    }

This bubbles down to this small stack:

Screenshot 2024-07-31 at 3 10 20 PM

Which in the end, attempts to call this (which reproduces the problem in one line):

new \Aws\Api\DateTimeResult('2024-07-31T19:05:47.623768151Z');

Calling this directly yields a slightly more informational error, which is otherwise masked by the way the parser is constructed, (specifically in this block):

        try {
            if ($expectedFormat == 'iso8601') {
                try {
                    return self::fromISO8601($timestamp);
                } catch (Exception $exception) {
                    return self::fromEpoch($timestamp); // exception gets trapped, and a less informational one gets thrown here
                }
            } else if ($expectedFormat == 'unixTimestamp') {

Looks like the true exception message is "The timezone could not be found in the database".

Reproducible here: https://3v4l.org/sr3Bv#v7.4.33

I apologize a bit for the confusion about environment version; our prod box is running a diff environment (7.4). I didn't think too much of it since composer install succeeds on both environments (req >= 7.2.5).

Playing with 3v4l, it looks like parsing this DateFormat fails on 8.0.9: https://3v4l.org/sr3Bv#v8.0.9

But succeeds with 8.0.10: https://3v4l.org/sr3Bv#v8.0.10

The changelog for 8.0.10, https://www.php.net/ChangeLog-8.php#8.0.10, notes a compensation for ((DateTimeZone accepting invalid UTC timezones).

Now interestingly - if you reduce to 6 decimal positions, it will parse on all versions:

Valid: https://3v4l.org/sjeLa#v7.4.33

echo (new \DateTime('2024-07-31T19:16:58.543693Z'))->format('Y-m-d H:i:s');

Invalid:

echo (new \DateTime('2024-07-31T19:16:58.623768151Z'))->format('Y-m-d H:i:s');

So you'd perhaps want to modify the service to output only 6 decimal positions, or, compensate in the parser, or, perhaps make the service return a regular old linux TS to address the issue.

Hope this helps! Alex

stobrien89 commented 4 months ago

Thanks @Saeven,

That's extremely helpful. Seems like we might want to do a version compare and reduce decimal positions on the affected ranges, but I'll ask around to see if there's broad enough impact to warrant a change at the service level. This seems to be PHP-specific, in which case we'll likely get a fix out tomorrow or Friday, but I'll keep you posted.

Saeven commented 4 months ago

My pleasure. From years of working in Java, the standard there is six decimal places at most. The RFC doesn't place a limit on the precision, but until calling this service, I've never seen one with more than six. I suppose that the folks at PHP just follow a standard even though the RFC would contend that what you're doing is OK.

That's a hell of a precision though! 🤣. With clock drift and the time it took to crunch those electrons, I wouldn't trust those tail digits too much!

Thanks for the fix, look forward to it!

stobrien89 commented 4 months ago

@Saeven,

Just posted #2974, which should resolve this for anywhere from 7 to 9 fractional seconds. It seems like most languages have native support for nanosecond precision and this is the first time (as far as I know) we're seeing this coming from an AWS service. Not sure if we'll ever see picoseconds, but that's a bridge we'll cross when/if we get to it 😃 .

Saeven commented 4 months ago

ROFL! Hopefully we're retired by then! 🍻

Thanks for the patch, eagerly look forward to the composer package increment!

github-actions[bot] commented 3 months ago

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

stobrien89 commented 3 months ago

Any time! Thanks for your patience. This should be live in today's release around 2:15PM EDT