cpan-testers / cpantesters-project

A meta-project for tracking CPAN Testers project goals

503 Errors from CPANTesters website #10

Open preaction opened 8 years ago

preaction commented 8 years ago

When we added Fastly for caching and better performance, we started getting intermittent "503 backend read error" responses. We need to verify that Fastly caching is actually working, figure out why the 503 errors are happening, and reduce them.

preaction commented 8 years ago

In the Fastly control panel (http://fastly.com), I've raised the "First byte" timeout to 60,000 ms (60 s). This should drastically reduce the number of 503s we get. We should check back with people in a few days to see whether they've seen any more 503s.

ghost commented 8 years ago

@preaction This is still an issue. Just got the following:

Error 503 backend read error

backend read error
Guru Mediation:

Details: cache-dfw1823-DFW 1466788517 1422049808
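
When users paste a Guru Mediation page like this, the "Details" line can help correlate the failure with our own server logs. Below is a minimal sketch for decoding it. It assumes the first field names the Fastly cache node and POP, the second is a Unix timestamp, and the third is an opaque per-request identifier; none of that is confirmed in this thread, and `parse_guru_details` and `GuruDetails` are hypothetical names.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class GuruDetails(NamedTuple):
    cache_node: str      # e.g. "cache-dfw1823-DFW": cache server plus POP (assumed)
    timestamp: datetime  # decoded from the Unix-epoch field (assumed)
    request_id: str      # opaque identifier (assumed); kept as a string


def parse_guru_details(line: str) -> GuruDetails:
    """Parse a Fastly 'Details:' line such as
    'Details: cache-dfw1823-DFW 1466788517 1422049808'."""
    text = line.strip()
    if text.startswith("Details:"):
        text = text[len("Details:"):]
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    node, epoch, request_id = fields
    # Interpret the middle field as seconds since the Unix epoch, in UTC.
    when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
    return GuruDetails(node, when, request_id)
```

Under that assumption, the report above decodes to 2016-06-24 17:15:17 UTC, which is the window to search in the backend and web server logs.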

preaction commented 7 years ago

@wchristian just reported this again. This time view-report.cgi seemed to be getting hammered (as usual). @mst has volunteered to try something interesting; failing that, he also suggested using Plack::Handler::WrapCGI with its execute flag to run view-report.cgi as a pre-forking CGI script, so we don't have to load a bunch of modules on every request. If we restrict this to view-report.cgi for now, we can see how much it helps before deciding whether to roll it out to the rest of the CGI scripts.

During the current errors I also noticed that we were spending 40-90% of CPU time in iowait. iostat shows a lot of writing to disk. I need to track down where the writes are coming from and see whether any can be reduced. We also have a lot of free memory, which worries me: it doesn't seem to be used for disk I/O caching. We write many different log files, which we might be able to reduce by switching to syslog. I also enabled the data release (#9), which will certainly increase the amount we write to disk at various times. It might be necessary to move the website to another machine, separate from the database and backend processes...
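
iostat reports writes per device, not per process. One way to see which processes are responsible is to sample the `write_bytes` counter in Linux's `/proc/<pid>/io` twice and compare. The sketch below assumes a Linux host with per-task I/O accounting enabled, and it can only see other users' processes when run as root. The function names are hypothetical; tools like iotop do the same job more thoroughly.

```python
import os
from typing import Dict, List, Tuple


def parse_write_bytes(io_text: str) -> int:
    """Extract the write_bytes counter from the text of /proc/<pid>/io."""
    for line in io_text.splitlines():
        key, _, value = line.partition(":")
        # Exact match, so "cancelled_write_bytes" is not picked up by mistake.
        if key.strip() == "write_bytes":
            return int(value)
    raise ValueError("no write_bytes field in /proc/<pid>/io text")


def snapshot() -> Dict[int, int]:
    """Map pid -> cumulative bytes the process caused to be written to storage."""
    counts: Dict[int, int] = {}
    for entry in os.listdir("/proc"):  # Linux only
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/io") as fh:
                counts[int(entry)] = parse_write_bytes(fh.read())
        except (OSError, ValueError):
            # The process exited, or we lack permission to read its counters.
            continue
    return counts


def top_writers(before: Dict[int, int], after: Dict[int, int],
                n: int = 10) -> List[Tuple[int, int]]:
    """Return up to n (pid, bytes written between snapshots), largest first.

    Only pids present in both snapshots are compared."""
    deltas = [
        (pid, after[pid] - before[pid])
        for pid in after
        if pid in before and after[pid] > before[pid]
    ]
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:n]


def process_name(pid: int) -> str:
    """Best-effort command name for a pid, read from /proc/<pid>/comm."""
    try:
        with open(f"/proc/{pid}/comm") as fh:
            return fh.read().strip()
    except OSError:
        return "?"
```

To use it, take `before = snapshot()` during a busy period, wait a minute, then pass `before` and a fresh `snapshot()` to `top_writers` and look up each pid with `process_name`. Log writers and the data release job should stand out if they are responsible for the load.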