aws / apprunner-roadmap

This is the public roadmap for AWS App Runner.
https://aws.amazon.com/apprunner/

🐛 Networking problem on AppRunner #130

Open sitole opened 2 years ago

sitole commented 2 years ago

Hello, a few months ago I reported some DNS resolver issues on App Runner. According to AWS Support, that issue has since been fixed.

For about a week now, we have been receiving unexpected HTTP response codes from our websites and services deployed on App Runner, mostly 503 and 502. I contacted paid AWS Support, but after 5 days there is still no fix or explanation.

At first I thought the problem might be on our side, but then we started seeing HTTP errors from more and more of our services on App Runner. For example, we use it for Next.js with SSR. Today I discovered that SSR pages are broken, while static pages served through Next.js work fine, so I think this is another networking problem.

Is anyone else seeing the same issues? We are using App Runner in Ireland (eu-west-1).
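To check whether the 502/503 responses are intermittent, and whether SSR routes fail more often than static ones, it can help to send repeated requests and tally the status codes. The following is a minimal sketch using only the Python standard library. The request count (50) and the idea of comparing one SSR URL with one static URL are assumptions for illustration, not something App Runner or AWS Support prescribes.

```python
import sys
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of one GET request, or 0 on a network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError, but the status code is still available.
        return err.code
    except OSError:
        # DNS failures, connection resets and timeouts all derive from OSError.
        return 0


def tally(codes):
    """Count status codes and return (counts, share of 5xx responses)."""
    counts = Counter(codes)
    total = sum(counts.values())
    errors = sum(n for code, n in counts.items() if 500 <= code < 600)
    return counts, (errors / total if total else 0.0)


if __name__ == "__main__":
    # Usage: python probe.py URL [URL ...]  (e.g. one SSR page and one static page)
    for url in sys.argv[1:]:
        counts, rate = tally(fetch_status(url) for _ in range(50))
        print(f"{url}: {dict(sorted(counts.items()))} 5xx rate={rate:.1%}")
```

If the SSR URL shows a noticeably higher 5xx rate than the static URL from the same service, that points toward the outbound calls made during rendering rather than the inbound path.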

spesnova commented 2 years ago

@Sitole

Are you using Streaming SSR introduced in React 18?

sitole commented 2 years ago

@spesnova No, we are not using the new React 18.

spesnova commented 2 years ago

@Sitole

OK, thanks. My first assumption was that you were using Streaming SSR, which requires HTTP streaming. As far as I know, App Runner doesn't support HTTP streaming yet, so that could have been the cause. But that's not your case this time.

rsharrott commented 1 year ago

We saw the same problem when connecting an App Runner instance in the default VPC to MongoDB Atlas: frequent connection resets that caused MongoDB connections to time out.

After moving our service off AppRunner to another hosting service, the connection issues disappeared entirely.

amitgupta85 commented 1 year ago

Hi @rsharrott, thank you for reaching out. We are looking into the issue you experienced and would like to ask a few questions to understand it better.

Can you confirm that you were using the App Runner default VPC (public egress) and had not set up an App Runner VPC Connector for the connection between your App Runner service and MongoDB?

If you were using a VPC Connector, were the VPC’s subnets public or private?

Lastly, were you seeing the 5xx response codes when sending requests to your App Runner service, or only when your service sent requests to your database?

rsharrott commented 1 year ago

Hi @amitgupta85, we were using the App Runner default VPC (public egress). We did not set up a VPC Connector.

The issues occurred in background tasks and showed up as timeouts, or simply as the Node.js MongoDB driver disconnecting and reconnecting. Sometimes they appeared as Buffering errors, sometimes as ServerSelection errors.

After investigating with MongoDB, they provided the following info: _By reviewing FTDC (Full Time Diagnostic Data Capture), there is no indication of any resource pressure except for an occasional high Tcp RetransSegs showing the total number of TCP segments/s that have retransmitted data.

In general, this indicates packet loss, for example due to network congestion or other network issues._
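For readers who want to watch the same signal themselves: on Linux, the kernel exposes cumulative TCP counters, including `OutSegs` and `RetransSegs`, in `/proc/net/snmp` as a pair of `Tcp:` lines (a header line followed by a value line). Sampling twice and dividing the deltas gives a rough retransmission rate. This is a minimal sketch under the assumption that you can read that file on the client host; whether it is readable, and what it reflects, inside an App Runner container is not confirmed here. The helper names are hypothetical.

```python
from __future__ import annotations

import sys
import time


def parse_tcp_counters(snmp_text: str) -> dict[str, int]:
    """Parse the 'Tcp:' header/value line pair from /proc/net/snmp text."""
    tcp_lines = [line.split()[1:] for line in snmp_text.splitlines() if line.startswith("Tcp:")]
    header, values = tcp_lines[0], tcp_lines[1]
    return {name: int(value) for name, value in zip(header, values)}


def read_tcp_counters(path: str = "/proc/net/snmp") -> dict[str, int]:
    with open(path) as f:
        return parse_tcp_counters(f.read())


def retransmit_ratio(before: dict[str, int], after: dict[str, int]) -> float:
    """Retransmitted segments divided by sent segments over the sampling interval."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python retrans.py SECONDS  (Linux only)
    interval = float(sys.argv[1])
    first = read_tcp_counters()
    time.sleep(interval)
    second = read_tcp_counters()
    print(f"retransmit ratio over {interval}s: {retransmit_ratio(first, second):.2%}")
```

A ratio that spikes during the periods when the driver reports timeouts would be consistent with the packet loss described above.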

The inbound clients occasionally terminated their connections to Atlas abruptly, so MongoDB saw those connections end with "Connection reset by peer". They suggested VPC peering.

If you provide a direct email or other contact method, I can send more sensitive details (IP addresses, etc.) if needed.

We were not seeing 500 errors inbound to this service, only outbound to the DB.
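Until the network path itself is fixed (for example with VPC peering, as MongoDB suggested), transient "Connection reset by peer" failures on outbound calls are often mitigated with retries using exponential backoff and jitter. MongoDB drivers already reconnect and can retry some operations on their own, so this matters mostly for custom outbound calls. The following is a minimal, generic sketch; `call_with_retry` and its parameters are hypothetical, and it only hides packet loss rather than fixing it.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op(), retrying on connection resets/timeouts with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return op()
        except (ConnectionResetError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff capped at max_delay, with "full jitter".
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Passing a custom `sleep` makes the helper easy to test without real delays; in production the default `time.sleep` is used.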

ThomasFahrner-Amazon commented 1 year ago

Hi @rsharrott, thanks for providing those details. Can we reach out to you and open a line of communication through the email listed in your GitHub profile?

rsharrott commented 1 year ago

@ThomasFahrner-Amazon Sure, feel free to reach out.