aws / apprunner-roadmap

This is the public roadmap for AWS App Runner.
https://aws.amazon.com/apprunner/

Configurable request timeout #104

Open suzukieng opened 2 years ago

suzukieng commented 2 years ago

Tell us about your request

Currently AppRunner enforces a 30-second timeout on HTTP requests. If the timeout is hit, an HTTP 503 response is returned.

For some use cases, especially large file uploads, this is a severe limitation.

Describe alternatives you've considered

We first refactored our file upload to use S3 multi-part uploads to break the upload request into several shorter requests. But then we ran into the problem that the S3 complete_multipart_upload call can take a long time (several minutes) to complete. We are now considering having the user upload directly to S3, and then notifying the AppRunner service of the completed upload. If that avenue is unsuccessful, we will move to EC2, but that would be a shame as AppRunner has worked well for us so far.
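To illustrate the multipart idea described above, here is a minimal sketch of how a file could be split into parts so that each part goes up in its own short request. The function name `plan_parts` and the 8 MiB default are assumptions for illustration only; the 5 MiB minimum part size and the 10,000-part cap are the documented S3 multipart limits. It does not address the slow complete_multipart_upload call mentioned above.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum part size (applies to every part except the last)
MAX_PARTS = 10_000                # S3 maximum number of parts in one multipart upload


def plan_parts(file_size: int, part_size: int = 8 * 1024 * 1024):
    """Return (part_number, offset, length) tuples that cover the whole file.

    Hypothetical helper: each tuple describes one part that could be sent
    in a separate, short HTTP request.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    # Grow the part size if the file would otherwise need more than 10,000 parts.
    part_size = max(part_size, math.ceil(file_size / MAX_PARTS))
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts
```

For a 20 MiB file with the default part size, this yields two 8 MiB parts followed by one 4 MiB part.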

Additional context

AWS support case ID (9433361761)

Attachments

The behavior is easy to trigger by calling any route that takes longer than 30 seconds (in this case, a dummy "sleep" route):

> GET /sleep?seconds=31 HTTP/1.1
> Host: xxxx.eu-west-1.awsapprunner.com
> User-Agent: curl/7.64.1
> Accept: */*
> 
< HTTP/1.1 503 Service Unavailable
< content-length: 95
< content-type: text/plain
< date: Mon, 03 Jan 2022 07:39:44 GMT
< server: envoy
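For anyone who wants to reproduce this, a dummy "sleep" route like the one requested above could look roughly like the sketch below. It uses only the Python standard library; the handler name, the `make_server` helper, the port, and the 600-second cap are assumptions and not the original poster's code.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional
from urllib.parse import parse_qs, urlparse

MAX_SLEEP_SECONDS = 600  # arbitrary safety cap for this sketch


def parse_sleep_seconds(path: str) -> Optional[float]:
    """Return the requested sleep time, or None if this is not a valid /sleep call."""
    url = urlparse(path)
    if url.path != "/sleep":
        return None
    try:
        seconds = float(parse_qs(url.query).get("seconds", ["0"])[0])
    except ValueError:
        return None
    # Rejects negatives, NaN and anything above the cap.
    return seconds if 0 <= seconds <= MAX_SLEEP_SECONDS else None


class SleepHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        seconds = parse_sleep_seconds(self.path)
        if seconds is None:
            self.send_error(400, "expected /sleep?seconds=<number>")
            return
        time.sleep(seconds)  # simulate a slow request
        body = f"slept {seconds:g} seconds\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return ThreadingHTTPServer(("0.0.0.0", port), SleepHandler)
```

Running `make_server().serve_forever()` locally and requesting `/sleep?seconds=31` should return after 31 seconds; behind a proxy with a 30-second limit, the request would be cut off first, as in the trace above.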

haversnail commented 2 years ago

In addition to 👍'ing, it seems that this issue is the root cause of other high-demand issues in this repo — referencing them here in an effort to consolidate:

#13

#23

#86

suzukieng commented 2 years ago

This feature request has been open for over a year now (referring to #13). No official comment, no timeline, nothing. Come on.

yimipeng commented 1 year ago

WIP

amitgupta85 commented 1 year ago

Thanks for your patience. What would be a reasonable maximum request timeout value that you would like to get supported in App Runner?

jvisker commented 1 year ago

Thanks for your patience. What would be a reasonable maximum request timeout value that you would like to get supported in App Runner?

Ideally it could go as long as an ALB can. However, most HTTP calls I work with are 2 minutes or less (usually much shorter).

haversnail commented 1 year ago

Thanks for your patience. What would be a reasonable maximum request timeout value that you would like to get supported in App Runner?

Is this something that could still be configurable? Similar to what @jvisker said, ideally this would match the min/max/default timeout of ALB or CloudFront or similar (1/60/30 respectively, IIRC).

For WebSockets and Server-Sent Events, you would also want the connection to remain open as long as data is being sent across that connection — in my first pass using AppRunner and SSE, I noticed that the connection timeout was not being reset even when data was in fact being sent within the timeout period. (Happy to open another issue for that if need be, though I was hoping it would resolve itself once this, #13 and/or #23 are resolved).

houston3 commented 1 year ago

+1 for a configurable timeout. The maximum value should be at least 5 minutes.

amitgupta85 commented 1 year ago

Can you tell us more about your use case?

houston3 commented 1 year ago

At the moment we use Fargate for legacy applications that can't use Lambda due to Lambda/API Gateway limits (timeout, request size, etc.). We'd like to move them to App Runner. The applications deal with, e.g., large file uploads, complex DB queries and slow connections.

snnles commented 1 year ago

Hello everyone, thank you for your feedback and patience on this issue. We have increased the request read timeout in App Runner from 30 seconds to 120 seconds.

We will keep this issue open to continue our conversation as we work through increasing the timeout limits and making them configurable for you. Appreciate all your feedback!

Documentation link: https://docs.aws.amazon.com/apprunner/latest/dg/develop.html#develop.considerations

dlo commented 1 year ago

@snnles This is great, but we're still seeing consistent 30s timeouts on our App Runner instances. Is this a slow rollout, or is there something we need to do to explicitly configure this?

amitgupta85 commented 1 year ago

Have you done any deployments since this feature launch? This requires a new deployment on your service after the feature launch.

bnicholl commented 1 year ago

For my use case, I plan on using App Runner as the front end of a web app that loads a rather large CSV file from user input and continuously sends data in batches to SageMaker for ML inference. When the SageMaker endpoint completes inference on a given batch, the App Runner service collects more data from the CSV and sends it to SageMaker. Because of this, I'd need App Runner to reset the request read timeout every time a SageMaker batch completes and the App Runner service begins running again. Does App Runner currently do this? If not, is this capability in the works?

amitgupta85 commented 1 year ago

App Runner provides vCPU during request processing. Each request has a maximum timeout of 2 minutes (https://docs.aws.amazon.com/apprunner/latest/dg/develop.html). If your batch processing can finish within the request timeout, it can be done within a single App Runner request. Otherwise, it should be orchestrated as multiple requests to the App Runner service.
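One way to read "orchestrated as multiple requests" is a cursor-based loop: each request processes rows until a time budget is used up, then hands back a cursor for the next request. The sketch below is only an illustration under that assumption. `process_chunk`, `run_to_completion`, and the 100-second default budget (chosen to leave headroom under the 2-minute limit) are hypothetical names and values, and a plain function call stands in for each HTTP request.

```python
import time
from typing import Callable, Optional, Sequence


def process_chunk(rows: Sequence[str], cursor: int,
                  handle_row: Callable[[str], None],
                  budget_seconds: float = 100.0,
                  clock: Callable[[], float] = time.monotonic) -> Optional[int]:
    """Process rows from `cursor` until the time budget is spent.

    Returns the cursor for the next request, or None when every row is done.
    At least one row is handled per call, so the loop always makes progress.
    """
    deadline = clock() + budget_seconds
    while cursor < len(rows):
        handle_row(rows[cursor])
        cursor += 1
        if clock() >= deadline:
            break
    return cursor if cursor < len(rows) else None


def run_to_completion(rows: Sequence[str],
                      handle_row: Callable[[str], None],
                      **kwargs) -> int:
    """Client-side loop; each process_chunk call stands in for one HTTP request."""
    requests_made = 0
    cursor: Optional[int] = 0
    while cursor is not None:
        cursor = process_chunk(rows, cursor, handle_row, **kwargs)
        requests_made += 1
    return requests_made
```

In a real service the cursor would travel in the request and response bodies, and a single row that takes longer than the budget would still exceed it.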

patrickwerz commented 1 year ago

Are there any plans to increase this limit or make it configurable?

wvanrensburg commented 1 year ago

Same here... I have an API that accepts large file uploads with processing time, and this sometimes takes over 2 minutes. Any update here on making it configurable?

nebbles commented 1 year ago

Cross-posting my comment from the Support web sockets issue, as I think it's relevant to the same underlying problem about request timeout.

App Runner currently supports 2 mins of maximum request timeout

Is this the case even with data actively being transmitted? Heroku's approach [1] is:

"An application has an initial 30 second window to respond with a single byte back to the client. However, each byte transmitted thereafter (either received from the client or sent by your application) resets a rolling 55 second window. If no data is sent during the 55 second window, the connection will be terminated."

This is great since provided everything is healthy, the client and server can confirm with keepalive pings, keeping the connection open.

If this could be delivered, it would offer us the most flexibility to build to our use cases whether that be websockets or long processing requests, etc.

Footnotes

  1. devcenter.heroku.com/articles/request-timeout#long-polling-and-streaming-responses
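To make the rolling-window behavior in the Heroku quote concrete, here is a small model of it. This is a sketch of the quoted rule only, not of how Heroku or App Runner implement timeouts. The class name is made up, time is passed in explicitly as seconds, and for simplicity any traffic (the first response byte or any later byte in either direction) is treated the same way.

```python
class RollingIdleTimeout:
    """Model of a rolling-window timeout; 30 s and 55 s come from the quote above."""

    def __init__(self, start: float,
                 initial_window: float = 30.0,
                 rolling_window: float = 55.0) -> None:
        self.rolling_window = rolling_window
        # The application must produce its first byte before this point.
        self.deadline = start + initial_window

    def expired(self, now: float) -> bool:
        """True once the current window has elapsed without traffic."""
        return now >= self.deadline

    def on_bytes(self, now: float) -> bool:
        """Record traffic at `now`.

        Returns False if the connection had already timed out; otherwise
        restarts the rolling window and returns True.
        """
        if self.expired(now):
            return False
        self.deadline = now + self.rolling_window
        return True
```

With keepalive pings every 50 seconds, a connection modeled this way would stay open indefinitely, whereas a fixed 2-minute cap would end it regardless of traffic.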

bnicholl commented 12 months ago

If this could be implemented in one's own code, would a potential workaround be to write a while loop with a try/except clause that continuously checks whether the given process has completed? Basically, as long as the Python code is running, the server won't time out?
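Going by the earlier reply that each request has a maximum timeout of 2 minutes, a loop inside the request handler would probably not extend that limit. A commonly used alternative is to return a job ID right away and let the client poll for the result in separate short requests. The sketch below shows that pattern with an in-memory store; `JobStore` and its methods are hypothetical. Since the same reply says vCPU is provided during request processing, a background thread may not make progress between requests on App Runner, so the actual work might have to run elsewhere.

```python
import threading
import uuid
from typing import Any, Callable, Dict


class JobStore:
    """In-memory job registry; a real service would need shared, durable storage."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.Lock()

    def submit(self, work: Callable[[], Any]) -> str:
        """Start `work` in the background and return a job ID immediately."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "running", "result": None}
        threading.Thread(target=self._run, args=(job_id, work), daemon=True).start()
        return job_id

    def _run(self, job_id: str, work: Callable[[], Any]) -> None:
        try:
            outcome = {"status": "done", "result": work()}
        except Exception as exc:  # record the failure instead of losing it
            outcome = {"status": "failed", "result": repr(exc)}
        with self._lock:
            self._jobs[job_id] = outcome

    def status(self, job_id: str) -> Dict[str, Any]:
        """What a polling endpoint would return for this job."""
        with self._lock:
            return dict(self._jobs.get(job_id, {"status": "unknown", "result": None}))
```

A "start" endpoint would call `submit` and return the ID; a "status" endpoint would call `status` until it reports "done" or "failed".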

kpconnell commented 11 months ago

Just for reference, Google Cloud Run, which is the main competitor here, allows 60 minutes, Azure Container Instances has no limit, and even AWS Lambda allows 15 minutes.

haversnail commented 10 months ago

Cross-posting my comment from the Support web sockets issue, as I think it's relevant to the same underlying problem about request timeout.

App Runner currently supports 2 mins of maximum request timeout

Is this the case even with data actively being transmitted? Heroku's approach [1] is: "An application has an initial 30 second window to respond with a single byte back to the client. However, each byte transmitted thereafter (either received from the client or sent by your application) resets a rolling 55 second window. If no data is sent during the 55 second window, the connection will be terminated." This is great since provided everything is healthy, the client and server can confirm with keepalive pings, keeping the connection open.

If this could be delivered, it would offer us the most flexibility to build to our use cases whether that be websockets or long processing requests, etc.

Footnotes

  1. devcenter.heroku.com/articles/request-timeout#long-polling-and-streaming-responses

Heavy +1. It seems this approach would resolve not only this issue, but would help enable #13 and #189 as well.

sakirsensoy commented 6 months ago

Are there any plans to increase this limit or make it configurable? @snnles @amitgupta85

kpconnell commented 6 months ago

The idea that all HTTP calls are short-lived is a bit of an architectural ivory tower. I understand the need to protect sockets on load balancers from unhealthy clients and services, but at a minimum, the ability to send along a header for certain known long-running processes, or the above approach of having traffic reset the window, seems completely reasonable and necessary.

hossameldeen commented 4 months ago

Can you tell us more about your use case?

Running a query that takes even more than the updated timeout, 120 seconds, on a Metabase instance that is deployed on AWS App Runner.

jl4nz commented 4 months ago

Can you tell us more about your use case?

Data import/processing via a front-end system request. Some third-party systems work by having the user wait for a time that depends on their input size. When this input is considerable, requests do take longer than 2 minutes (waiting for the database, backend, etc.). For example, a system like this is REDCap.

I also understand we need limits, but could this match the ALB's 4000-second limit as well?

Aanhane commented 2 weeks ago

Similar issue here: bulk updates via a UI, or file imports, that run just over 2 minutes start failing.

wesmontgomery commented 1 week ago

I agree, the request timeout is way too short. We have some use cases where we need to run long queries or perform some synchronous tasks in one web request. 900 seconds would be ideal, 300 seconds at minimum.

@backnol-aws @snnles @jsheld @lazarben @scuw19 @amitgupta85 @akshayram-wolverine