darklang / dark

Darklang main repo, including language, backend, and infra
https://darklang.com

Switch BwdServer to Cloud Run #4663

Closed pbiggar closed 10 months ago

pbiggar commented 1 year ago

optimizations:

pbiggar commented 1 year ago

Performance testing

Requests go to BwdServer, which is an ASP.NET/F#/.NET 6 server. Each request fetches a "hello world" program from the DB, then executes it in an interpreter, saving metrics about the execution in the DB as JSON.

"response" is the HTTP response, which includes times up to "execute_handler". "execute_handler" is our interpreter. "custom_domain", "getTLID", "get oplists cache", "get oplist" and "get secrets" are SQL selects. "pusher" is an HTTP call to pusher.com. "traceResultHook" are SQL updates.

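To make the span layout above concrete, here is a minimal sketch of how such timings could be recorded. It is not BwdServer's actual telemetry code (which is F#): the `span` helper, the in-memory `spans` list, and the relative order of the post-response steps are assumptions for illustration only. The step bodies are placeholders.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory recorder; a real server would export to a tracing backend.
spans = []

@contextmanager
def span(name):
    """Record the wall-clock duration of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, (time.perf_counter() - start) * 1000.0))

# Span names taken from the description above.
PRE_RESPONSE_STEPS = [
    "custom_domain",      # SQL select
    "getTLID",            # SQL select
    "get oplists cache",  # SQL select
    "get oplist",         # SQL select
    "get secrets",        # SQL select
    "execute_handler",    # the interpreter
]
# Assumption: these run after the response has been produced; their order is a guess.
POST_RESPONSE_STEPS = ["pusher", "traceResultHook"]

def handle_request():
    # "response" wraps every step up to and including execute_handler.
    with span("response"):
        for step in PRE_RESPONSE_STEPS:
            with span(step):
                pass  # placeholder for the real work
    for step in POST_RESPONSE_STEPS:
        with span(step):
            pass  # placeholder for the real work

handle_request()
for name, ms in spans:
    print(f"{name}: {ms:.3f} ms")
```

Because inner spans close before the outer one, "response" is recorded after "execute_handler" and its duration is at least the sum of the steps it wraps.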
Cloud Run deployment: (avg latency 1800ms)

K8s deployment: (avg latency 500ms)

pbiggar commented 1 year ago

Testing:

pbiggar commented 1 year ago

Pushed CPU to 4:

response: 11/17/13/16
custom domain: 2/4/2/2
getTLID: 2/2/1/2
get oplist cache: 2/3/1/2
get oplist: 1/2/1/2
get secrets: 1/3/2/2
execute_handler: 3/3/4/4
traceResultHook: 3/4/5/4
traceResultHook: 5/6/8/25
pusher: 20/21/19/21
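A quick way to read these numbers is to average each span and check how much of "response" is explained by the steps it is described as covering. The sketch below assumes the four slash-separated values are four separate requests and that all values share one unit (the source states neither); the function names are made up for this example.

```python
from statistics import mean

# Values copied from the comment above; a list of pairs because
# traceResultHook appears twice.
SAMPLES = [
    ("response", [11, 17, 13, 16]),
    ("custom domain", [2, 4, 2, 2]),
    ("getTLID", [2, 2, 1, 2]),
    ("get oplist cache", [2, 3, 1, 2]),
    ("get oplist", [1, 2, 1, 2]),
    ("get secrets", [1, 3, 2, 2]),
    ("execute_handler", [3, 3, 4, 4]),
    ("traceResultHook", [3, 4, 5, 4]),
    ("traceResultHook", [5, 6, 8, 25]),
    ("pusher", [20, 21, 19, 21]),
]

# Steps that "response" covers, per the earlier description of the spans.
PRE_RESPONSE = {
    "custom domain", "getTLID", "get oplist cache",
    "get oplist", "get secrets", "execute_handler",
}

def averages(samples):
    """Mean of each span across the sampled requests."""
    return [(name, mean(values)) for name, values in samples]

def pre_response_totals(samples):
    """Per-request sum of the steps that happen before the response."""
    columns = [values for name, values in samples if name in PRE_RESPONSE]
    return [sum(column) for column in zip(*columns)]

def unaccounted(samples):
    """Per-request part of "response" not covered by the listed steps."""
    response = next(values for name, values in samples if name == "response")
    return [r - t for r, t in zip(response, pre_response_totals(samples))]

for name, avg in averages(SAMPLES):
    print(f"{name}: {avg}")
print("pre-response totals:", pre_response_totals(SAMPLES))
print("unaccounted:", unaccounted(SAMPLES))
```

Under those assumptions, the listed steps add up to 11/17/11/14 against a "response" of 11/17/13/16, so they account for nearly all of it, and "pusher" (about 20 on average) is the largest single span.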

pbiggar commented 1 year ago

I speculate that the extra second of latency is due to not using a Global Load Balancer, and will probably be fixed once we do that. Unfortunately, that exposes it to the world, so I will need to solve either the metadata issue or find some other way to prevent users from running code over here until I solve that.

pbiggar commented 1 year ago

I chatted with a few people about this on Twitter, and got some good leads:

So the fact that things should be fast is a good reason to dig in a little bit. Initial thoughts:

pbiggar commented 1 year ago

I've been looking into networking around Cloud Run and GCP in general.

Cloud Run does not run in our VPC, and so does not need to be firewalled off from our resources.

Cloud SQL also does not run in our VPC, but there's work needed to route to it from Cloud Run.

pbiggar commented 1 year ago

Including some traces here because they're interesting

GKE

Basically every trace on GKE looks like this

Screenshot 2023-01-04 at 9 17 41 AM

These are all different traces running on Cloud Run

Pretty slow everything

Screenshot 2023-01-04 at 9 20 16 AM

Extremely slow execution (I saw numerous traces like this - note that DB isn't great here either)

Screenshot 2023-01-04 at 9 20 28 AM

Multiple exciting slowdowns

Screenshot 2023-01-04 at 9 20 38 AM