ryanpitts opened this issue 6 years ago
Hey Ryan, here are a couple thoughts:
1 & 2. AHHHHH I didn't realize that was a request I could make.
noting that Heroku has increased the app's boot timeout to 120 seconds for now
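For context, Heroku's boot timeout is the window a web dyno has to bind to the port in $PORT after launch. It defaults to 60 seconds, and exceeding it raises an R10 error. Below is a minimal sketch of timing how long a Node process takes to start listening. It assumes a plain node:http server, and `listenAndTime` and `exceedsBootTimeout` are hypothetical helpers, not part of etherpad-lite or the Heroku wrapper.

```typescript
import * as http from "node:http";

// Heroku's default boot timeout for web dynos, in seconds. If the process has
// not bound to $PORT by then, Heroku reports an R10 error and stops it.
const DEFAULT_BOOT_TIMEOUT_SEC = 60;

// True if a boot that took `bootMs` milliseconds would exceed the timeout.
function exceedsBootTimeout(
  bootMs: number,
  timeoutSec: number = DEFAULT_BOOT_TIMEOUT_SEC,
): boolean {
  return bootMs > timeoutSec * 1000;
}

// Hypothetical helper: bind a trivial server to `port` and resolve with the
// time since this Node process started. process.uptime() returns seconds,
// so it is converted to milliseconds.
function listenAndTime(
  port: number,
): Promise<{ server: http.Server; bootMs: number }> {
  const server = http.createServer((_req, res) => {
    res.end("ok");
  });
  return new Promise((resolve, reject) => {
    server.once("error", reject);
    server.listen(port, () => {
      resolve({ server, bootMs: process.uptime() * 1000 });
    });
  });
}
```

A launch script could log `bootMs` after binding to `Number(process.env.PORT)`. It could also warn when `exceedsBootTimeout(bootMs, 120)` is true, which shows how close each daily restart comes to the raised limit.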
Okay! So, to some extent, having waited a little bit has made this all easier.
Here are some observations about etherpad-lite (and we can get into the heroku wrapper as well).
etherpad-lite uses npm in a slightly non-standard way. The root of the etherpad-lite repo is used as the primary working space, and the place where one can access the bin commands and all of that stuff. However, for npm's purposes the root of the project is actually the src directory. That's where etherpad-lite puts its node_modules directory and package.json, and then it symlinks node_modules from the src directory into the repo root.
Why does this matter? Well, when it comes to mismatches between local dev and deployment to Heroku, I had initially assumed that restoring to a fresh/clean startup just required clearing out the node_modules directory in the project root. NOPE. Gotta kill it in the src directory.
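Because of this layout, a truly clean reinstall has to target src/node_modules and not just the entry at the repo root. The sketch below reports which node_modules entries exist and what kind each one is. `nodeModulesToClear` is a hypothetical helper that assumes exactly the src/ plus root-symlink layout described above. It only lists paths and deletes nothing.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface ModulesEntry {
  path: string;
  kind: "directory" | "symlink";
}

// Hypothetical helper: list node_modules entries to remove for a cold start,
// assuming the real install lives in <repo>/src/node_modules and the repo
// root holds a symlink to it. Nothing is deleted here.
function nodeModulesToClear(repoRoot: string): ModulesEntry[] {
  const found: ModulesEntry[] = [];
  const real = path.join(repoRoot, "src", "node_modules");
  const link = path.join(repoRoot, "node_modules");

  for (const p of [real, link]) {
    let st: fs.Stats;
    try {
      // lstat inspects the entry itself, so a symlink is reported as a
      // symlink instead of being followed to its target directory.
      st = fs.lstatSync(p);
    } catch {
      continue; // entry does not exist
    }
    if (st.isSymbolicLink()) {
      found.push({ path: p, kind: "symlink" });
    } else if (st.isDirectory()) {
      found.push({ path: p, kind: "directory" });
    }
  }
  return found;
}
```

Removing only the root entry deletes the symlink. The installed packages in src/node_modules stay untouched, which is exactly the trap described above.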
The heroku wrapper is pegged to the 0.10.x series of node (and the accompanying version of npm). That's probably been fine (and I'm impressed with how easy it is to get etherpad-lite up and running across node versions), but the 0.10.x versions of node are now quite out of date and out of LTS.
Additionally, the 5.x series of npm now caches the results of dependency checking and resolution in a package-lock.json file, which substantially improves npm install speed.
A number of changes have been made to a lot of different npm packages since node 0.10.x, in particular a couple of security changes, because some of node's core classes were deemed potential vectors for exploits. As a consequence, there's been a lot of stuff to upgrade.
Mostly because of npm dependency checking and resolving what to install from a cold boot.
Well, mostly we should try and make npm do less work.
My immediate thought was to try upgrading to npm 5.x, commit a package-lock.json file, and see if that helps.
So this is where waiting things out a little helped. The main etherpad-lite repository just cut a new release 4 days ago: https://github.com/ether/etherpad-lite/commit/32027134cbe4e37ced89091bf05e9fd07980ca12
I've merged that into this repository, added the package-lock.json, and pushed it here and to staging.
Yeah. The other part of deployment was updating the heroku wrapper. The heroku wrapper is the thing that dictates which node version runs, and consequently which version of npm is run by default. (You can run npm 5.x with older versions of node, but, like, what the heck, let's try and upgrade all the way.)
It's entirely unclear to me what the dependencies listed in the heroku wrapper's package.json are about, and I deleted them to seemingly no effect. I also bumped the node version up to 8.x, which ships with npm 5.x by default.
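Heroku's Node.js buildpack picks the Node version from the engines.node field of package.json, which is why the wrapper controls it. Below is a minimal sketch of a check that the running Node matches the declared major version. The `PackageJson` type, both helper names, and the sample `wrapperPkg` object are hypothetical, and the wrapper's real package.json is not reproduced here. Only simple specs like "8.x" or "8.11.1" are handled. Full semver ranges would need a proper semver parser.

```typescript
// Minimal shape of the package.json fields this check looks at.
interface PackageJson {
  engines?: { node?: string; npm?: string };
  dependencies?: Record<string, string>;
}

// Hypothetical helper: pull the major version out of a simple spec such as
// "8.x" or "8.11.1". Returns null for missing or unsupported specs.
function requestedNodeMajor(pkg: PackageJson): number | null {
  const spec = pkg.engines?.node;
  if (!spec) return null;
  const m = /^(\d+)(\.|$)/.exec(spec.trim());
  return m ? Number(m[1]) : null;
}

// Compare the declared major against the Node actually running
// (process.versions.node looks like "8.17.0").
function runningNodeMatches(
  pkg: PackageJson,
  running: string = process.versions.node,
): boolean {
  const want = requestedNodeMajor(pkg);
  if (want === null) return false;
  return Number(running.split(".")[0]) === want;
}

// Illustrative only: roughly what "bumped up to 8.x" would look like.
const wrapperPkg: PackageJson = { engines: { node: "8.x" } };
```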
The one snafu is that the database driver etherpad-lite relies upon, ueberdb, hasn't cut a release for the aforementioned security changes in node. Under the hood, node changed the way its Buffer class works, and that breaks the ability of ueberdb's released version to connect to postgres databases.
This is particularly peculiar, because they've merged changes to handle this into their master branch. They just haven't cut a release in a number of months.
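The thread doesn't say exactly which Buffer change broke ueberdb's postgres connections. For illustration, this sketch assumes it is one well-known change from that period: the deprecation of the `new Buffer(...)` constructor (DEP0005) in favour of the explicit `Buffer.from()` and `Buffer.alloc()` factories. The helper names are hypothetical. The sketch shows the replacement pattern, not ueberdb's actual fix.

```typescript
// Deprecated pattern (DEP0005): new Buffer(text, "utf8") and new Buffer(size).
// Before Node 8, new Buffer(size) returned uninitialized memory, one of the
// safety concerns behind the deprecation.

// Replaces new Buffer(text, "utf8"): encode a string explicitly.
function encodeUtf8(text: string): Buffer {
  return Buffer.from(text, "utf8");
}

// Replaces new Buffer(size): allocate a zero-filled buffer of `size` bytes.
function zeroed(size: number): Buffer {
  return Buffer.alloc(size);
}
```

With the factories, the intent is explicit at each call site: one function encodes data, the other allocates memory. Code that still calls the old constructor gets a deprecation warning on newer Node versions.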
Pegging this etherpad-lite repository to ueberdb's master branch fixes the issue (and is currently deployed to staging).
Yeah, mostly, I think! The app starts up fast on staging and skips most of the dependency checking.
Well, first things first, we should probably peg to a specific commit SHA for ueberdb. I need to check/read up on what the npm syntax for that is.
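npm can install a dependency straight from GitHub and pin it to a commit. Its docs list specifiers of the form `github:<owner>/<repo>#<commit-ish>` and `git+https://github.com/<owner>/<repo>.git#<commit-ish>`. `githubCommitSpec` is a hypothetical helper that builds either form. It rejects anything that doesn't look like a hex SHA, so a moving branch such as master can't slip in, even though npm itself would accept a branch name. The resulting string would go in the dependencies section of src/package.json. This thread doesn't give the actual commit to pin.

```typescript
// Hypothetical helper: build an npm dependency specifier that pins a GitHub
// repo to one exact commit. Branch names are rejected on purpose.
function githubCommitSpec(
  owner: string,
  repo: string,
  sha: string,
  longForm = false,
): string {
  // Accept abbreviated (7+) or full (40) hex commit hashes only.
  if (!/^[0-9a-f]{7,40}$/i.test(sha)) {
    throw new Error(`not a commit SHA: ${sha}`);
  }
  return longForm
    ? `git+https://github.com/${owner}/${repo}.git#${sha}`
    : `github:${owner}/${repo}#${sha}`;
}
```

After changing the specifier, running npm install regenerates package-lock.json with the resolved commit, and that file is what should be committed and pushed.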
Mostly we're going to need to keep on top of further changes going forward. It is vastly preferable not to have to peg to a GitHub repository for ueberdb. Additionally, when further changes come down the pike, it'll be important to update the package-lock.json and push that out to the app.
Checking in with @ryanpitts to see if we can close this ticket! :)
ooh yes, we should close it, but we should also migrate the awesome notes you wrote up somewhere
Looks like there's some action towards getting a package released on the ueberdb side of things too! https://github.com/Pita/ueberDB/issues/101
btw, a new version of ueberDB2 was released a few months ago, so we should bump off of the github repo to the NPM version: https://www.npmjs.com/package/ueberdb2
sweet! I'm going to make a calendar reminder to follow up here after SRCCON though :)
noting that we're pegged to the latest ueberdb2 now https://github.com/OpenNews/etherpad-lite/commit/645d3569a5e9c4759d7c216f3baee7cf30c49c96
I think this issue could be closed now? @knowtheory what do you think?
Close!
Last recommendation I have now is that ueberdb2 has finally started cutting releases again, so we can stop pegging to their SHAs and just point to the main released version (which appears to be 0.4.0), which the main etherpad-lite repo points to as well.
I would specify that version, give it a push to the staging server to test 4realz and then if that works push it to prod.
yep, we're pointing to ueberdb2 0.4.0 as well https://github.com/OpenNews/etherpad-lite/blob/master/src/package.json#L58
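Moving back to a registry release is easy to verify with a quick audit. The sketch below lists any dependencies in a package.json that still point at a git repo rather than a published version. `gitPinnedDeps` and `isGitSpec` are hypothetical helpers. They recognise npm's git and GitHub specifier forms, including the bare `owner/repo#commit-ish` shorthand, and ignore local paths and plain version ranges. They are a heuristic, not a full parser of every npm specifier.

```typescript
type Deps = Record<string, string>;

// Prefixes npm uses for git-hosted dependencies.
const GIT_PREFIXES = ["git+", "git://", "github:", "git@"];

// Heuristic: does this dependency value point at a git repo?
function isGitSpec(spec: string): boolean {
  if (GIT_PREFIXES.some((p) => spec.startsWith(p))) return true;
  // Local paths like "../foo" or "./foo" are not git specs.
  if (/^[.~/]/.test(spec)) return false;
  // npm's bare GitHub shorthand: "owner/repo" or "owner/repo#commit-ish".
  return /^[\w.-]+\/[\w.-]+(#.+)?$/.test(spec);
}

// Names of dependencies still pinned to git, sorted for stable output.
function gitPinnedDeps(deps: Deps): string[] {
  return Object.entries(deps)
    .filter(([, spec]) => isGitSpec(spec))
    .map(([name]) => name)
    .sort();
}
```

Run against the dependencies object from src/package.json, an empty result means everything, ueberdb2 included, now resolves from the npm registry.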
TL;DR: On Heroku's daily restart, this app sometimes crashes because it doesn't start up quickly enough.
Quick background on how OpenNews etherpad works
Overview: We run an etherpad-lite instance on Heroku, which is publicly available at https://etherpad.opennews.org. The instance uses a Postgres db and a Standard 1X dyno (512MB RAM, 1x CPU share). In testing via etherpad-load-test, these resources are more than enough to handle our normal traffic.
Deployment details: We use SSL for our etherpad-lite instance, which means that to run on Heroku, we need our own forked version of etherpad-lite that includes heroku-ssl-redirect. That means we also use a forked version of etherpad-lite-heroku, which pulls in our version of the etherpad software as a submodule. The etherpad-lite-heroku wrapper is what actually gets deployed to Heroku, where it runs a launch script that does some config and starts the etherpad service.
The reboot problem
Heroku restarts your app dynos once a day for maintenance, which is normally a fine thing. However, etherpad-lite occasionally takes too long to start back up, resulting in:
During the etherpad-lite startup process, it checks in with npm on a whole list of dependencies. A number of them appear to be outdated, which I think might be the root of the problem here. The software seems to work fine once it's actually running, but sometimes the reboot itself takes long enough that Heroku throws a timeout and the app crashes again.
The short-term fix is manually restarting our Heroku instance; usually this only requires 1 or 2 restarts, but occasionally takes 10 or so. The most problematic times are mid-morning and midday (when there's more overall web traffic, which is what makes me think those dependency checks are the problem). The long-term fix, of course, involves updating the etherpad-lite software.
Some logs
I've been able to fork and modify these etherpad apps, get them running on Heroku, and do a certain amount of troubleshooting, but I'm about at my limit of feeling comfortable ripping into node software. Here are a few logs that hopefully tell some tales: