aws / aws-sdk-php

Official repository of the AWS SDK for PHP (@awsforphp)
http://aws.amazon.com/sdkforphp
Apache License 2.0

SerializationException #1510

Closed r3oath closed 6 years ago

r3oath commented 6 years ago

I'm trying to use the BatchDetectSentiment API, but I regularly get errors when making requests (some requests go through fine). There seems to be no logical correlation between what I send and when the error occurs. In the BatchDetectSentiment example, I am simply sending plain-text sentences. I've even gone as far as removing all non-alphanumeric (a-zA-Z0-9) characters to rule out problems with emojis (the sentences being processed come from the Facebook API). I'm also sending no more than 5 documents per batch request and limiting each document to a maximum of 5000 bytes.

Any ideas? See the exception message below:

Error executing "BatchDetectSentiment" on "https://comprehend.eu-west-1.amazonaws.com"; AWS HTTP error: Client error: `POST https://comprehend.eu-west-1.amazonaws.com` resulted in a `400 Bad Request` response:
{"__type":"SerializationException","Message":"Start of structure or map found where not expected."}
 SerializationException (client): Start of structure or map found where not expected. - {"__type":"SerializationException","Message":"Start of structure or map found where not expected."}

kstich commented 6 years ago

Can you please post the code you're using to generate the call to ->batchDetectSentiment()?

r3oath commented 6 years ago

There are a few moving parts, but it basically boils down to:

$scores = $this->getSentimentScores(
    $this->client->batchDetectSentiment([
        'LanguageCode' => $this->language,
        'TextList' => $this->processComments($comments),
    ])
);

$this->language returns 'en'. $this->processComments($comments) returns an array of strings, e.g.:

["I really love that shirt", "Foo bar is a good company"]

r3oath commented 6 years ago

I can shed a little more light on this issue. I swapped out the batchDetectSentiment call for detectSentiment, passing the same comment data (albeit one comment at a time), and I'm not getting the described error.

So it seems to happen with batched requests only. Judging by the response status code, and given that I am passing all data as required and the strings are normalised (detectSentiment works correctly with all of them), the issue is more than likely happening deeper in the SDK, or perhaps even in Guzzle.

kstich commented 6 years ago

I was able to successfully call the ->batchDetectSentiment() operation as follows:

$client->batchDetectSentiment([
    'LanguageCode' => 'en',
    'TextList' => ["I really love that shirt", "Foo bar is a good company"],
]);

Please make sure the structure for your operation has the correct input data. You can also use the 'debug' flag to help in this process.

r3oath commented 6 years ago

The input I'm passing is per the API specifications: a list of strings. Here's an example list which fails for me, and the results of running the request with debug enabled.

List of strings:

array(4) {
  [5]=>
  string(47) "Speed limits have nothing to do with  tiredness"
  [6]=>
  string(61) "False They generally happen because of draconian speed limits"
  [7]=>
  string(38) "False some drivers start driving tired"
  [8]=>
  string(26) "They forget to take breaks"
}

Debug results:

-> Entering step init, name 'idempotency_auto_fill'
---------------------------------------------------

  command was set to array(3) {
    ["instance"]=>
    string(32) "[CUT]"
    ["name"]=>
    string(20) "BatchDetectSentiment"
    ["params"]=>
    array(3) {
      ["LanguageCode"]=>
      string(2) "en"
      ["TextList"]=>
      array(4) {
        [5]=>
        string(47) "Speed limits have nothing to do with  tiredness"
        [6]=>
        string(61) "False They generally happen because of draconian speed limits"
        [7]=>
        string(38) "False some drivers start driving tired"
        [8]=>
        string(26) "They forget to take breaks"
      }
      ["@http"]=>
      array(1) {
        ["debug"]=>
        resource(493) of type (stream)
      }
    }
  }

  request was set to array(0) {
  }

-> Entering step validate, name 'validation'
--------------------------------------------

  no changes

-> Entering step build, name 'builder'
--------------------------------------

  request.instance was set to [CUT]
  request.method was set to POST
  request.headers was set to array(4) {
    ["X-Amz-Security-Token"]=>
    string(7) "[TOKEN]"
    ["Host"]=>
    array(1) {
      [0]=>
      string(34) "comprehend.eu-west-1.amazonaws.com"
    }
    ["X-Amz-Target"]=>
    array(1) {
      [0]=>
      string(40) "Comprehend_20171127.BatchDetectSentiment"
    }
    ["Content-Type"]=>
    array(1) {
      [0]=>
      string(26) "application/x-amz-json-1.1"
    }
  }

  request.body was set to {"LanguageCode":"en","TextList":{"5":"Speed limits have nothing to do with  tiredness","6":"False They generally happen because of draconian speed limits","7":"False some drivers start driving tired","8":"They forget to take breaks"}}
  request.scheme was set to https

-> Entering step build, name ''
-------------------------------

  request.instance changed from [CUT] to [CUT]
  request.headers.User-Agent was set to array(1) {
    [0]=>
    string(19) "aws-sdk-php/3.52.23"
  }

-> Entering step sign, name 'invocation-id'
-------------------------------------------

  request.instance changed from [CUT] to [CUT]
  request.headers.aws-sdk-invocation-id was set to array(1) {
    [0]=>
    string(32) "[CUT]"
  }

-> Entering step sign, name 'retry'
-----------------------------------

  request.instance changed from [CUT] to [CUT]
  request.headers.aws-sdk-retry was set to array(1) {
    [0]=>
    string(3) "0/0"
  }

-> Entering step sign, name 'signer'
------------------------------------

  request.instance changed from [CUT] to [CUT]
  request.headers.X-Amz-Date was set to array(1) {
    [0]=>
    string(16) "20180315T053842Z"
  }

  request.headers.Authorization was set to array(1) {
    [0]=>
    string(247) "AWS4-HMAC-SHA256 Credential=[KEY]/20180315/eu-west-1/comprehend/aws4_request, SignedHeaders=aws-sdk-invocation-id;aws-sdk-retry;host;x-amz-date;x-amz-target, Signature=[SIGNATURE]
  }

* Rebuilt URL to: https://comprehend.eu-west-1.amazonaws.com/
* Found bundle for host comprehend.eu-west-1.amazonaws.com: 0x7fb2ebe102a0 [can pipeline]
* Re-using existing connection! (#0) with host comprehend.eu-west-1.amazonaws.com
* Connected to comprehend.eu-west-1.amazonaws.com (54.194.137.78) port 443 (#0)
> POST / HTTP/1.1
Host: comprehend.eu-west-1.amazonaws.com
X-Amz-Target: Comprehend_20171127.BatchDetectSentiment
Content-Type: application/x-amz-json-1.1
aws-sdk-invocation-id: [CUT]
aws-sdk-retry: 0/0
X-Amz-Date: 20180315T053842Z
Authorization: AWS4-HMAC-SHA256 Credential=[KEY]/20180315/eu-west-1/comprehend/aws4_request, SignedHeaders=aws-sdk-invocation-id;aws-sdk-retry;host;x-amz-date;x-amz-target, Signature=[SIGNATURE]
User-Agent: aws-sdk-php/3.52.23 GuzzleHttp/6.2.1 curl/7.54.0 PHP/7.1.13
Content-Length: 234

* upload completely sent off: 234 out of 234 bytes
< HTTP/1.1 400 Bad Request
< Date: Thu, 15 Mar 2018 05:38:42 GMT
< Content-Type: application/x-amz-json-1.1
< Content-Length: 99
< Connection: keep-alive
< x-amzn-RequestId: [CUT]
< 
* Connection #0 to host comprehend.eu-west-1.amazonaws.com left intact

<- Leaving step sign, name 'signer'
-----------------------------------

  error was set to array(13) {
    ["instance"]=>
    string(32) "[CUT]"
    ["class"]=>
    string(44) "Aws\Comprehend\Exception\ComprehendException"
    ["message"]=>
    string(497) "Error executing "BatchDetectSentiment" on "https://comprehend.eu-west-1.amazonaws.com"; AWS HTTP error: Client error: `POST https://comprehend.eu-west-1.amazonaws.com` resulted in a `400 Bad Request` response:
  {"__type":"SerializationException","Message":"Start of structure or map found where not expected."}
   SerializationException (client): Start of structure or map found where not expected. - {"__type":"SerializationException","Message":"Start of structure or map found where not expected."}"
    ["file"]=>
    string(109) "[CUT]/vendor/aws/aws-sdk-php/src/WrappedHttpHandler.php"
    ["line"]=>
    int(191)
    ["trace"]=>
    string(7414) "#0 [CUT]/vendor/aws/aws-sdk-php/src/WrappedHttpHandler.php(100): Aws\WrappedHttpHandler->parseError(Array, Object(GuzzleHttp\Psr7\Request), Object(Aws\Command), Array)
  #1 [CUT]/vendor/guzzlehttp/promises/src/Promise.php(203): Aws\WrappedHttpHandler->Aws\{closure}(Array)
  #2 [CUT]/vendor/guzzlehttp/promises/src/Promise.php(174): GuzzleHttp\Promise\Promise::callHandler(2, Array, Array)
  #3 [CUT]/vendor/guzzlehttp/promises/src/RejectedPromise.php(40): GuzzleHttp\Promise\Promise::GuzzleHttp\Promise\{closure}(Array)
  #4 [CUT]/vendor/guzzlehttp/promises/src/TaskQueue.php(47): GuzzleHttp\Promise\RejectedPromise::GuzzleHttp\Promise\{closure}()
  #5 [CUT]/vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php(96): GuzzleHttp\Promise\TaskQueue->run()
  #6 [CUT]/vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php(123): GuzzleHttp\Handler\CurlMultiHandler->tick()
  #7 [CUT]/vendor/guzzlehttp/promises/src/Promise.php(246): GuzzleHttp\Handler\CurlMultiHandler->execute(true)
  #8 [CUT]/vendor/guzzlehttp/promises/src/Promise.php(223): GuzzleHttp\Promise\Promise->invokeWaitFn()
  #9 [CUT]/vendor/guzzlehttp/promises/src/Promise.php(267): GuzzleHttp\Promise\Promise->waitIfPending()
  #10 [CUT]/vendor/guzzlehttp/promises/src/Promise.php(225): GuzzleHttp\Promise\Promise->invokeWaitList()
  #11 [CUT]/vendor/guzzlehttp/promises/src/Promise.php(267): GuzzleHttp\Promise\Promise->waitIfPending()
  #12 [CUT]/vendor/guzzlehttp/promises/src/Promise.php(225): GuzzleHttp\Promise\Promise->invokeWaitList()
  #13 [CUT]/vendor/guzzlehttp/promises/src/Promise.php(62): GuzzleHttp\Promise\Promise->waitIfPending()
  #14 [CUT]/vendor/aws/aws-sdk-php/src/AwsClientTrait.php(58): GuzzleHttp\Promise\Promise->wait()
  #15 [CUT]/vendor/aws/aws-sdk-php/src/AwsClientTrait.php(77): Aws\AwsClient->execute(Object(Aws\Command))
  #16 [CUT]/app/Services/Aws.php(53): Aws\AwsClient->__call('batchDetectSent...', Array)
  #17 [internal function]: App\Services\Aws->App\Services\{closure}(Array, 1)
  #18 [CUT]/vendor/laravel/framework/src/Illuminate/Support/Collection.php(861): array_map(Object(Closure), Array, Array)
  #19 [CUT]/app/Services/Aws.php(56): Illuminate\Support\Collection->map(Object(Closure))
  #20 [CUT]/app/Console/Commands/CalculateSentimentCommand.php(82): App\Services\Aws->getAggregatedSentimentFor(Object(App\Objects\FacebookPost), Object(App\Console\Commands\CalculateSentimentCommand))
  #21 [CUT]/app/Console/Commands/CalculateSentimentCommand.php(49): App\Console\Commands\CalculateSentimentCommand->packageSentimentFor(Object(App\Objects\FacebookPost))
  #22 [internal function]: App\Console\Commands\CalculateSentimentCommand->App\Console\Commands\{closure}(Object(App\Objects\FacebookPost), 2)
  #23 [CUT]/vendor/laravel/framework/src/Illuminate/Support/Collection.php(861): array_map(Object(Closure), Array, Array)
  #24 [CUT]/app/Console/Commands/CalculateSentimentCommand.php(50): Illuminate\Support\Collection->map(Object(Closure))
  #25 [internal function]: App\Console\Commands\CalculateSentimentCommand->handle()
  #26 [CUT]/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(29): call_user_func_array(Array, Array)
  #27 [CUT]/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(87): Illuminate\Container\BoundMethod::Illuminate\Container\{closure}()
  #28 [CUT]/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php(31): Illuminate\Container\BoundMethod::callBoundMethod(Object(Illuminate\Foundation\Application), Array, Object(Closure))
  #29 [CUT]/vendor/laravel/framework/src/Illuminate/Container/Container.php(549): Illuminate\Container\BoundMethod::call(Object(Illuminate\Foundation\Application), Array, Array, NULL)
  #30 [CUT]/vendor/laravel/framework/src/Illuminate/Console/Command.php(183): Illuminate\Container\Container->call(Array)
  #31 [CUT]/vendor/symfony/console/Command/Command.php(252): Illuminate\Console\Command->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
  #32 [CUT]/vendor/laravel/framework/src/Illuminate/Console/Command.php(170): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
  #33 [CUT]/vendor/symfony/console/Application.php(946): Illuminate\Console\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
  #34 [CUT]/vendor/symfony/console/Application.php(248): Symfony\Component\Console\Application->doRunCommand(Object(App\Console\Commands\CalculateSentimentCommand), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
  #35 [CUT]/vendor/symfony/console/Application.php(148): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
  #36 [CUT]/vendor/laravel/framework/src/Illuminate/Console/Application.php(88): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
  #37 [CUT]/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php(121): Illuminate\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
  #38 [CUT]/artisan(37): Illuminate\Foundation\Console\Kernel->handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
  #39 {main}"
    ["type"]=>
    string(6) "client"
    ["code"]=>
    string(22) "SerializationException"
    ["requestId"]=>
    string(36) "[CUT]"
    ["statusCode"]=>
    int(400)
    ["result"]=>
    NULL
    ["request"]=>
    array(5) {
      ["instance"]=>
      string(32) "[CUT]"
      ["method"]=>
      string(4) "POST"
      ["headers"]=>
      array(9) {
        ["X-Amz-Security-Token"]=>
        string(7) "[TOKEN]"
        ["Host"]=>
        array(1) {
          [0]=>
          string(34) "comprehend.eu-west-1.amazonaws.com"
        }
        ["X-Amz-Target"]=>
        array(1) {
          [0]=>
          string(40) "Comprehend_20171127.BatchDetectSentiment"
        }
        ["Content-Type"]=>
        array(1) {
          [0]=>
          string(26) "application/x-amz-json-1.1"
        }
        ["User-Agent"]=>
        array(1) {
          [0]=>
          string(19) "aws-sdk-php/3.52.23"
        }
        ["aws-sdk-invocation-id"]=>
        array(1) {
          [0]=>
          string(32) "[CUT]"
        }
        ["aws-sdk-retry"]=>
        array(1) {
          [0]=>
          string(3) "0/0"
        }
        ["X-Amz-Date"]=>
        array(1) {
          [0]=>
          string(16) "20180315T053842Z"
        }
        ["Authorization"]=>
        array(1) {
          [0]=>
          string(247) "AWS4-HMAC-SHA256 Credential=[KEY]/20180315/eu-west-1/comprehend/aws4_request, SignedHeaders=aws-sdk-invocation-id;aws-sdk-retry;host;x-amz-date;x-amz-target, Signature=[SIGNATURE]
        }
      }
      ["body"]=>
      string(234) "{"LanguageCode":"en","TextList":{"5":"Speed limits have nothing to do with  tiredness","6":"False They generally happen because of draconian speed limits","7":"False some drivers start driving tired","8":"They forget to take breaks"}}"
      ["scheme"]=>
      string(5) "https"
    }
    ["response"]=>
    array(4) {
      ["instance"]=>
      string(32) "[CUT]"
      ["statusCode"]=>
      int(400)
      ["headers"]=>
      array(6) {
        ["X-Amz-Security-Token"]=>
        string(7) "[TOKEN]"
        ["Date"]=>
        array(1) {
          [0]=>
          string(29) "Thu, 15 Mar 2018 05:38:42 GMT"
        }
        ["Content-Type"]=>
        array(1) {
          [0]=>
          string(26) "application/x-amz-json-1.1"
        }
        ["Content-Length"]=>
        array(1) {
          [0]=>
          string(2) "99"
        }
        ["Connection"]=>
        array(1) {
          [0]=>
          string(10) "keep-alive"
        }
        ["x-amzn-RequestId"]=>
        array(1) {
          [0]=>
          string(36) "[CUT]"
        }
      }
      ["body"]=>
      string(99) "{"__type":"SerializationException","Message":"Start of structure or map found where not expected."}"
    }
  }

  Inclusive step time: 0.3991219997406

<- Leaving step sign, name 'retry'
----------------------------------

  no changes
  Inclusive step time: 0.39935398101807

<- Leaving step sign, name 'invocation-id'
------------------------------------------

  no changes
  Inclusive step time: 0.39951109886169

<- Leaving step build, name ''
------------------------------

  no changes
  Inclusive step time: 0.3996479511261

<- Leaving step build, name 'builder'
-------------------------------------

  no changes
  Inclusive step time: 0.39979195594788

<- Leaving step validate, name 'validation'
-------------------------------------------

  no changes
  Inclusive step time: 0.39995503425598

<- Leaving step init, name 'idempotency_auto_fill'
--------------------------------------------------

  no changes
  Inclusive step time: 0.40009808540344

In WrappedHttpHandler.php line 191:

  Error executing "BatchDetectSentiment" on "https://comprehend.eu-west-1.amazonaws.com"; AWS HTTP error: Client error: `POST https://comprehend.eu-west-1.amazonaws.com` resulted in a `400 Bad Request` response:
  {"__type":"SerializationException","Message":"Start of structure or map found where not expected."}
   SerializationException (client): Start of structure or map found where not expected. - {"__type":"SerializationException","Message":"Start of structure or map found where not expected."}

In RequestException.php line 113:

  Client error: `POST https://comprehend.eu-west-1.amazonaws.com` resulted in a `400 Bad Request` response:
  {"__type":"SerializationException","Message":"Start of structure or map found where not expected."}

r3oath commented 6 years ago

Upon further testing, the issue seems to stem from the fact that each batch request in my application operates on one chunk of a larger comments array. Hence, as seen in the example above, some arrays aren't indexed starting from zero, and the request body ends up with TextList serialized as a JSON object ({"5": ..., "6": ...}) rather than a JSON array.
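
For anyone reproducing this: the thread does not show how the chunks were actually built, so the following is only a sketch that assumes PHP's built-in array_chunk and made-up comment strings. It shows how a chunking step that preserves keys yields a second chunk keyed 5..8, like the failing list above, while the default behaviour reindexes each chunk from zero.

```php
<?php
// Hypothetical comment data; plain list with keys 0..8.
$comments = ['c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8'];

// preserve_keys = true keeps the original keys inside every chunk.
$preserved = array_chunk($comments, 5, true);

// The default (preserve_keys = false) reindexes every chunk from zero.
$reindexed = array_chunk($comments, 5);

print_r(array_keys($preserved[1])); // keys 5, 6, 7, 8
print_r(array_keys($reindexed[1])); // keys 0, 1, 2, 3
```

Other chunking helpers may behave differently, so it is worth dumping the keys of each chunk before sending it.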

Perhaps this detail should be noted in the documentation, or array_values should be called on the SDK side to ensure the indexing matches what the API is expecting.
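
The request body in the debug output is consistent with how PHP's json_encode treats arrays whose keys are not 0, 1, 2, ...: it emits a JSON object instead of a JSON array. The sketch below uses two strings and keys from the failing example and plain json_encode only; it does not call the SDK, and whether the SDK serializes the body exactly this way is an assumption based on the logged request.body.

```php
<?php
// Keys and strings copied from the failing example in this thread.
$textList = [
    7 => 'False some drivers start driving tired',
    8 => 'They forget to take breaks',
];

// Non-sequential keys: json_encode emits a JSON object (a map).
$asObject = json_encode(['TextList' => $textList]);

// array_values() reindexes from zero, so json_encode emits a JSON array.
$asList = json_encode(['TextList' => array_values($textList)]);

echo $asObject, PHP_EOL;
echo $asList, PHP_EOL;
```

In the original call this would amount to passing array_values($this->processComments($comments)) as 'TextList'.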

kstich commented 6 years ago

The use of array_values should be left up to the implementer in these cases. If we have this by default in the SDK, it means we'll be changing the structure of the data that you generated which may lead to other, more hidden, issues down the line.

The documentation also shows that this is an unindexed array.

$result = $client->batchDetectSentiment([
    // Key               Value
    'LanguageCode' => '<string>', // REQUIRED
    'TextList' => ['<string>', ...], // No keys
]);

// Shows sub-elements with indexes are represented differently.
...
[
    // Key      Value
    'Index' => <integer>,
    'Sentiment' => 'POSITIVE|NEGATIVE|NEUTRAL|MIXED',
    'SentimentScore' => [
        // Key     Value
        'Mixed' => <float>,
        'Negative' => <float>,
        'Neutral' => <float>,
        'Positive' => <float>,
    ],
],
...

r3oath commented 6 years ago

Every array declared in PHP is associative under the hood, e.g. ["foo", "bar"] is the same as [0 => "foo", 1 => "bar"].

So if the API intrinsically requires sequential, zero-based arrays, then either the documentation needs to make that apparent, the error messaging needs to improve, or the SDK should call array_values. If the order of strings is critical, then even an exception or warning from the SDK that the array passed isn't sequential and zero-based would go a long way toward helping your end users debug bizarre situations like the one I faced.

At the end of the day, the fact that you can pass an array with a non-zero index and cause a SerializationException on the AWS side suggests this issue deserves a little attention.
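
As an illustration of the kind of check described here, this is a minimal application-side sketch. Both function names are hypothetical and are not part of the AWS SDK; the code only uses core PHP functions and runs on PHP 7.0 and later (on PHP 8.1+ the built-in array_is_list could replace the first helper).

```php
<?php
/**
 * Hypothetical helper (not part of the AWS SDK): returns true when the
 * keys are exactly 0, 1, 2, ... in that order. An empty array counts as a list.
 */
function isSequentialList(array $items): bool
{
    return array_keys($items) === array_keys(array_values($items));
}

/**
 * Hypothetical guard: fails fast locally instead of letting the service
 * answer with a SerializationException.
 */
function assertSequentialList(array $items, string $name): array
{
    if (!isSequentialList($items)) {
        throw new InvalidArgumentException(
            "$name must be a zero-based sequential array; call array_values() first."
        );
    }
    return $items;
}
```

A caller could wrap the list before the request, for example 'TextList' => assertSequentialList($texts, 'TextList'), where $texts stands for whatever list the application has built.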

KonstantinKolodnitsky commented 3 days ago

You saved my day! I've been battling this problem for days. Applying array_values fixed the problem. Thank you a lot!