apache / openwhisk

Apache OpenWhisk is an open source serverless cloud platform
https://openwhisk.apache.org/
Apache License 2.0
6.42k stars 1.16k forks source link

Webaction runs twice within one activation, and reported as failure when it did not actually fail #5481

Open mretallack opened 2 months ago

mretallack commented 2 months ago

Environment details:

Steps to reproduce the issue:

  1. Have an incoming HTTP POST from AWS IoT (using HTTP as a Action Rule)
  2. Have a webaction (visible on internet), using python:3.9 action runner
  3. Monitor the failures of the web action.

Provide the expected results and outputs:

Expected output

2024-05-02 08:07:27 aa9313bdf52a490b9313bdf52ac90b6b python:3 warm  170ms      success         dataspace/iot_action:0.0.76:

    "logs": [
        "2024-05-02T08:07:27.203249047Z stdout: Message type is None",
        "2024-05-02T08:07:27.203353817Z stdout: Authenticated with deployment test",
        "2024-05-02T08:07:27.203359561Z stdout: Processing IoT message iot/statusResponse",
        "2024-05-02T08:07:27.203362535Z stdout: Message: {\"location\": {\"latitude\": \"x\", \"locationDescription\": \"\", \"longitude\": \"x\"}, \"tenantName\": \"LoadTest1\"}",
        "2024-05-02T08:07:27.203366872Z stdout: Publishing statusResponse to topic example.fct.iot_test_statusresponse.1",
        "2024-05-02T08:07:27.203369844Z stdout: New keys ['\"1454419940501\"']",
        "2024-05-02T08:07:27.203372379Z stdout: Finished processing",
        "2024-05-02T08:07:27.20337515Z  stdout: returning: {\"headers\": {\"Content-Type\": \"application/json\", \"Connection\": \"close\"}, \"statusCode\": 200, \"body\": \"\\\"OK\\\"\"}"
    ],

Provide the actual results and outputs:

    "logs": [
        "2024-05-02T08:03:35.441164892Z stdout: Message type is None",
        "2024-05-02T08:03:35.441233404Z stdout: Authenticated with deployment test",
        "2024-05-02T08:03:35.44123762Z  stdout: Processing IoT message iot/statusResponse",
        "2024-05-02T08:03:35.441241006Z stdout: Message: {\"location\": {\"latitude\": \"X\", \"locationDescription\": \"Engineering Lab\", \"longitude\": \"X\"}, \"tenantName\": \"LoadTest1\"}",
        "2024-05-02T08:03:35.441244653Z stdout: Publishing statusResponse to topic topic example.fct.iot_test_statusresponse.1",
        "2024-05-02T08:03:35.441247605Z stdout: New keys ['\"1454419850988\"']",
        "2024-05-02T08:03:35.441250359Z stdout: Finished processing",
        "2024-05-02T08:03:35.441253345Z stdout: returning: {\"headers\": {\"Content-Type\": \"application/json\", \"Connection\": \"close\"}, \"statusCode\": 200, \"body\": \"\\\"OK\\\"\"}",
        "2024-05-02T08:05:10.601380606Z stdout: Message type is None",
        "2024-05-02T08:05:10.601398036Z stdout: Authenticated with deployment test",
        "2024-05-02T08:05:10.601401152Z stdout: Processing Stratos message iot/status",
        "2024-05-02T08:05:10.601404588Z stdout: Message: {\"commonName\": \"1454419850988\", .......,
        "2024-05-02T08:05:10.6014228Z   stdout: Publishing status to topic example.fct.iot_test_status.1",
        "2024-05-02T08:05:10.601426211Z stdout: New keys ['\"1454419850988\"']",
        "2024-05-02T08:05:10.601428478Z stdout: Finished processing",
        "2024-05-02T08:05:10.601431725Z stdout: returning: {\"headers\": {\"Content-Type\": \"application/json\", \"Connection\": \"close\"}, \"statusCode\": 200, \"body\": \"\\\"OK\\\"\"}",
        "2024-05-02T08:05:10.604225532Z stderr: The action did not initialize or run as expected. Log data might be missing."
    ],

Additional information you deem important:

It appears that the same activation is capturing the output from two runs. And therefore the second (or first) run is assumed to of failed.

I have tried adding a sleep into the while loop (thinking it was a timeing issue), but still does it·

https://github.com/apache/openwhisk-runtime-python/blob/1389697e3251ef08a930ed95629379551a8d3c9a/core/python3Action/lib/launcher.py#L54

rabbah commented 2 months ago

You might see this if the webaction returned a 50x status code - nginx may be automatically retrying the request and hence running the action again. You can confirm this by checking the nginx logs for the action in question.

mretallack commented 2 months ago

Thanks for the suggestion,

I will retry it a few times over the next day, but this is what I see on the access log in nginx:

10.0.0.65 - - [02/May/2024:10:31:17 +0000] [#tid_8daa7a35d7e1b39c3114d4c94d99ae12] POST /api/v1/web/dataspace/iot_package/iot_action HTTP/1.1 200 4 - AHC/2.1 172.20.41.68:8080
10.0.1.70 - - [02/May/2024:10:31:18 +0000] [#tid_63cd895864243ab89c8ed39fec47d83f] POST /api/v1/web/dataspace/iot_package/iot_action HTTP/1.1 200 4 - AHC/2.1 172.20.41.68:8080

From reading this it looks like both are returning 200.

I am wondering if AWS do something strange with there HTTP client and send multiple requests over the same connection and dont wait for a reply (only a theory)

I will keep looking at the logs until somthing else comes up :)