bakdata / aws-lambda-r-runtime

Serverless execution of R code on AWS Lambda
https://medium.com/bakdata/running-r-on-aws-lambda-9d40643551a6
MIT License
143 stars 52 forks

Event source length limited by the way it is passed to the runtime #29

Open ed-sparkes opened 5 years ago

ed-sparkes commented 5 years ago

There seems to be some hard limit on the event source that this runtime will process.

I am providing a large event source, a JSON structure with one field containing an array of 20,000 observations (each observation has 3 fields).

I am receiving the following runtime error:

START RequestId: da723778-9f4e-4198-a889-567b0ca8d790 Version: $LATEST
/opt/bootstrap: line 13: /opt/R/bin/Rscript: Argument list too long
END RequestId: da723778-9f4e-4198-a889-567b0ca8d790
REPORT RequestId: da723778-9f4e-4198-a889-567b0ca8d790 Duration: 112.00 ms Billed Duration: 200 ms Memory Size: 1028 MB Max Memory Used: 285 MB
RequestId: da723778-9f4e-4198-a889-567b0ca8d790 Error: Runtime exited with error: exit status 1
Runtime.ExitError

Event source JSON data is attached as .txt (GitHub would not allow me to upload a .json file).

The R function is not particularly relevant, as execution never gets that far, but it is included below anyway. It depends on the PlackettLuce package, which I have loaded in a separate layer. This should be reproducible with a very basic Lambda function, though, since the error occurs at runtime invocation, before the handler is even called.

library(PlackettLuce)

handler <- function(cjs) {
    # Use functions from the PlackettLuce package to do the work
    cjranks <- rankings(cjs, id = 1, item = 2, rank = 3)  # put into the bespoke format for PlackettLuce
    con <- connectivity(cjranks)  # check connectivity of the design (note that any individual object
                                  # winning/losing all comparisons may be deemed an issue when it isn't really)
    mod <- PlackettLuce(cjranks)  # fit the Plackett-Luce model

    # More detailed connectivity (how many other objects are in each object's cluster)
    connect.oth <- sapply(1:length(con$membership), function(i) con$csize[con$membership[i]] - 1)

    # Read the coefficients and standard errors
    estmeas <- coef(mod, ref = NULL)
    estse <- sqrt(diag(vcov(mod, ref = NULL)))
    measures.data.frame <- data.frame(
        id = names(estmeas),
        measure = estmeas,
        se = estse,
        connectivity.no = con$no,
        connect.oth = connect.oth
    )

    # Return the output
    measures.data.frame
}

testdata.txt

This looks relevant: https://stackoverflow.com/questions/24658534/usr-bin-rscript-argument-list-too-long
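For what it's worth, the kernel limit that the Stack Overflow thread points at can be demonstrated directly in a shell: on Linux, execve() rejects any single argument longer than MAX_ARG_STRLEN (roughly 128 KiB), which appears to be exactly what happens when the bootstrap passes the whole event JSON to Rscript as one argument. A minimal sketch (the 200 KB payload is dummy data):

```shell
# Build a single ~200 KB string, larger than MAX_ARG_STRLEN (~128 KiB on Linux).
payload=$(head -c 200000 /dev/zero | tr '\0' 'x')

# Passing it as one argv entry to an external binary fails with E2BIG,
# i.e. "Argument list too long" -- the same error the runtime reports.
/bin/echo "$payload" >/dev/null 2>&1 || echo "argv: Argument list too long"

# Passing the same payload via stdin works regardless of size.
printf '%s' "$payload" | wc -c
```

So any fix presumably has to hand the event to Rscript via stdin, a file, or an environment-independent channel rather than argv.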

DaveParr commented 5 years ago

I also experienced this. FWIW, the way I got around it was to restructure the JSON so the data is passed 'field-wise', i.e. as 3 fields of 20,000 values in your case, rather than as 20,000 records of 3 fields each, though I appreciate you may not have control of the upstream data structure.
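A quick illustration of why the field-wise layout helps: record-wise JSON repeats every key name once per observation, while field-wise JSON states each key only once, so the same data serialises to a noticeably shorter string. A tiny hypothetical payload:

```shell
# Two encodings of the same 2 observations x 3 fields:
record_wise='[{"id":1,"item":2,"rank":3},{"id":2,"item":1,"rank":3}]'
field_wise='{"id":[1,2],"item":[2,1],"rank":[3,3]}'

# The field-wise form is already shorter at 2 rows; the gap grows
# linearly with the number of observations.
printf '%s' "$record_wise" | wc -c
printf '%s' "$field_wise"  | wc -c
```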

philipp94831 commented 5 years ago

Hi @ed-sparkes, I think this issue is resolved in the most recent version of the runtime. Can you check that? I was able to reproduce the error and fixed it for my case.

DaveParr commented 5 years ago

I've started coming across this the other way around now that I am using the SAM CLI. sam local invoke errors like so:

2019-08-06 10:47:50 arn:aws:lambda:eu-west-2:131329294410:layer:r-runtime-3_6_0:12 is already cached. Skipping download
2019-08-06 10:47:50 Requested to skip pulling images ...

2019-08-06 10:47:50 Mounting /[myfunction] as /var/task:ro inside runtime container
standard_init_linux.go:211: exec user process caused "argument list too long"

I believe arn:aws:lambda:eu-west-2:131329294410:layer:r-runtime-3_6_0:12 is the correct layer?

I did some more digging and found the following: I was passing 2 very long JSON arrays, each representing a column of data. These were 4319 rows each, effectively representing a 4319*2 data frame, as the body argument of a mocked API Gateway call.

It looks like this, except with longer date_time and value vectors:

{
  "body": {
    "date_time": [
      "2018-07-03 00:00:00",
      "2018-07-03 00:30:00",
      "2018-07-03 01:00:00",
      "2018-07-03 01:30:00",
      "2018-07-03 02:00:00"
    ],
    "value": [
      83.455,
      82.075,
      89.96,
      74.585,
      64.43
    ]
  },
  "resource": "/{proxy+}",
  "path": "/path/to/resource",
  "httpMethod": "POST",
  "isBase64Encoded": true,
  "queryStringParameters": {
    "foo": "bar"
  },
  "pathParameters": {
    "proxy": "/path/to/resource"
  },
  "stageVariables": {
    "baz": "qux"
  },
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, sdch",
    "Accept-Language": "en-US,en;q=0.8",
    "Cache-Control": "max-age=0",
    "CloudFront-Forwarded-Proto": "https",
    "CloudFront-Is-Desktop-Viewer": "true",
    "CloudFront-Is-Mobile-Viewer": "false",
    "CloudFront-Is-SmartTV-Viewer": "false",
    "CloudFront-Is-Tablet-Viewer": "false",
    "CloudFront-Viewer-Country": "US",
    "Host": "1234567890.execute-api.eu-west-2.amazonaws.com",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Custom User Agent String",
    "Via": "1.1 08f323deadbeefa7af34d5feb414ce27.cloudfront.net (CloudFront)",
    "X-Amz-Cf-Id": "cDehVQoZnx43VYQb9j2-nvCh-9z396Uhbp027Y2JvkCPNLmGJHqlaA==",
    "X-Forwarded-For": "127.0.0.1, 127.0.0.2",
    "X-Forwarded-Port": "443",
    "X-Forwarded-Proto": "https"
  },
  "requestContext": {
    "accountId": "123456789012",
    "resourceId": "123456",
    "stage": "prod",
    "requestId": "c6af9ac6-7b61-11e6-9a41-93e8deadbeef",
    "requestTime": "09/Apr/2015:12:34:56 +0000",
    "requestTimeEpoch": 1428582896000,
    "identity": {
      "cognitoIdentityPoolId": null,
      "accountId": null,
      "cognitoIdentityId": null,
      "caller": null,
      "accessKey": null,
      "sourceIp": "127.0.0.1",
      "cognitoAuthenticationType": null,
      "cognitoAuthenticationProvider": null,
      "userArn": null,
      "userAgent": "Custom User Agent String",
      "user": null
    },
    "path": "/prod/path/to/resource",
    "resourcePath": "/{proxy+}",
    "httpMethod": "POST",
    "apiId": "1234567890",
    "protocol": "HTTP/1.1"
  }
}

The above will run the following lambda function:

lambda_handler <- function(body,...) {

  str(body)

  return(body)
}

If I removed 1 column from the longer version, the error didn't appear and the function ran correctly. If I removed one third of each column, i.e. 1439 records from each, the execution also passed locally.

If I deploy the code and then run it on the AWS infrastructure, it runs fine in all cases.

So it looks like what I've hit is potentially upstream of your layer, and to do with the Docker image of the Amazon AMI?
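One way to sanity-check that hypothesis would be to compare the kernel's argument-size limit on the host against the one inside the container that SAM spins up. A sketch (the image name and entrypoint override are assumptions about what sam local invoke uses):

```shell
# Total budget for argv + environment on this machine (POSIX guarantees
# at least 4096 bytes; Linux typically reports far more).
getconf ARG_MAX

# For comparison inside the local Lambda container (image name is a guess):
#   docker run --rm --entrypoint sh lambci/lambda:provided -c 'getconf ARG_MAX'
```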