Closed Sytten closed 3 years ago
I don't think such service warnings are a fit for this repo. I did not follow this outage very closely. However, serverless platforms need to just work, and the entire point is to not worry about what's inside the black box. If you are impacted by what's in the black box, we're likely already working on preventing this from happening again. Hope that helps.
I don't agree with this vision, considering the level of detail already contained in the FAQ (an ABI contract running on gVisor is as important a contract as a network architecture). Cloud Run is also not only used by beginners who don't care how the thing runs. But it's your doc, your choice :) It would do GCP good to be a bit more transparent about how stuff is built/run.
I have worked on Cloud Run for over a year now and I am hearing about RSG for the first time here. I don't think those details were supposed to be shared with customers in the first place, as our container contract is documented, as you said in your comment. Problems happening anywhere else are bugs, and we conduct work and post-mortems to prevent them from happening again.
I would also like to remind you that this is a community-maintained repository. So if you are looking for implementation details here, this is not the right place.
We had an outage two weeks ago on Cloud Run, and that is how I learned of the existence of RSG. We experienced latency spikes of up to 20-30s between our service and our database. We were able to limit the problem by reducing the concurrency level to our DB connection pool size and scaling the service. I think this would be valuable information to add to the FAQ, to slowly demystify the black box. AWS is usually pretty open about the components their services are built from, but I really had to push just to get a root cause for our issue.
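To illustrate the mitigation described above (keeping per-instance concurrency at or below the DB connection pool size), here is a minimal simulation sketch. It does not talk to Cloud Run or a real database: `FakePool`, `simulate`, and the timing values are hypothetical names and numbers made up for illustration, and the pool is just a semaphore that makes requests wait for a free connection.

```python
import asyncio


class FakePool:
    """Hypothetical stand-in for a DB connection pool with a fixed size."""

    def __init__(self, size: int) -> None:
        self._slots = asyncio.Semaphore(size)

    async def query(self, duration: float) -> None:
        # A request must hold one of the pool's connections while it runs.
        async with self._slots:
            await asyncio.sleep(duration)  # pretend this is the query


async def handle_request(pool: FakePool, query_time: float) -> float:
    """Run one request and return its total latency, including pool wait."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    await pool.query(query_time)
    return loop.time() - start


async def _worst_latency(concurrency: int, pool_size: int, query_time: float) -> float:
    pool = FakePool(pool_size)
    latencies = await asyncio.gather(
        *(handle_request(pool, query_time) for _ in range(concurrency))
    )
    return max(latencies)


def simulate(concurrency: int, pool_size: int, query_time: float = 0.02) -> float:
    """Worst-case latency when `concurrency` requests share one instance."""
    return asyncio.run(_worst_latency(concurrency, pool_size, query_time))


if __name__ == "__main__":
    # Concurrency far above the pool size: requests queue for connections.
    print("concurrency=20, pool=5:", simulate(20, 5))
    # Concurrency matched to the pool size: no request waits for a connection.
    print("concurrency=5,  pool=5:", simulate(5, 5))
```

In this toy model the worst-case latency grows with the ratio of concurrent requests to pool connections, which is the queuing effect that capping concurrency at the pool size avoids. The extra load then has to be absorbed by more instances, hence the need to scale the service as well.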