PublicMapping / districtbuilder

DistrictBuilder is web-based, open source software for collaborative redistricting.
https://districtbuilder.org
Apache License 2.0

Explore options to improve application startup time #1117

Closed: maurizi closed this issue 2 years ago

maurizi commented 2 years ago

My testing for #960 revealed that deserialization itself is not a significant cause of our slow startup time when loading TopoJSON layers. This suggests that, even in production, the time is predominantly spent loading data from S3.

We should explore options for loading the assets from disk instead, as a way to improve startup times (caching, a centralized EBS volume, storing the assets in the Docker container?), and try to estimate how much time we would expect to save.

BryanQuigley commented 2 years ago

Speed from S3 was pretty reasonable, but it varied a lot at different times, from 5 MiB/s to 20 MiB/s.
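To put that range in perspective, download time scales inversely with throughput, so a 4x swing in S3 speed is a 4x swing in startup time. A back-of-the-envelope sketch; the 1.5 GiB file size is a hypothetical placeholder, not the measured size of any real asset:

```python
# Rough download-time estimate for one asset at a constant S3 throughput.
# ASSUMPTION: HYPOTHETICAL_SIZE_MIB is illustrative only, not a measured
# size of any DistrictBuilder file.

def download_seconds(size_mib: float, throughput_mib_s: float) -> float:
    """Seconds needed to transfer size_mib at throughput_mib_s."""
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")
    return size_mib / throughput_mib_s

HYPOTHETICAL_SIZE_MIB = 1536.0  # 1.5 GiB

for rate in (5.0, 20.0):  # observed S3 throughput range, in MiB/s
    secs = download_seconds(HYPOTHETICAL_SIZE_MIB, rate)
    print(f"{rate:>4.0f} MiB/s -> {secs:6.1f} s")
```

For that hypothetical 1.5 GiB file, this gives 307.2 s at 5 MiB/s and 76.8 s at 20 MiB/s.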

I'm thinking three things could explain this, or help speed it up:

maurizi commented 2 years ago

We actually don't load input.geojson on the server, or anything in the tile directory.

The files we're actually concerned with for server load are the .buf files: topo.buf is the primary and very large one, plus a dozen or so much smaller .buf files.
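Because only the .buf files matter for server load, a cache or pre-fetch step can skip everything else. A minimal sketch, assuming keys shaped like `region/<ST>/<file>` and a tile directory named `tiles`; the layout, the `tile_dir` parameter, and the `demographics.buf` name are all hypothetical:

```python
from pathlib import PurePosixPath
from typing import Iterable, List

def server_assets(keys: Iterable[str], tile_dir: str = "tiles") -> List[str]:
    """Keep only .buf keys that are not inside the (hypothetical) tile directory."""
    selected = []
    for key in keys:
        path = PurePosixPath(key)
        if tile_dir in path.parts:
            continue  # per the thread, nothing in the tile directory is loaded server-side
        if path.suffix == ".buf":
            selected.append(key)  # input.geojson and other non-.buf files are skipped
    return selected

example = [
    "region/TX/topo.buf",
    "region/TX/demographics.buf",  # hypothetical small .buf file
    "region/TX/input.geojson",
    "region/TX/tiles/0/0/0.pbf",
]
print(server_assets(example))
```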

ddohler commented 2 years ago

Some other factors that I think are likely to come into play are instance networking speed and potentially EBS volume networking speed. The instances we use appear to have burstable IO for both EBS and networking (that's what "up to" means in this table: https://aws.amazon.com/ec2/instance-types/r5/), and EBS bandwidth and network bandwidth are rated separately.

Benchmarking results will be interesting to see. If the app is bandwidth-limited, we might be accidentally paying the networking cost twice by loading S3 -> EBS before loading into memory. On the other hand, if the limiting factor is how much bandwidth S3 is willing to push to a single client (20 MiB/s isn't close to 10 Gbps or even 4.75 Gbps), then we might gain some speed by downloading with multiple threads.
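For scale: 10 Gbps is 1.25 GB/s, about 1192 MiB/s, and 4.75 Gbps is about 566 MiB/s, so 20 MiB/s is under 2% of the rated network ceiling. S3 supports HTTP `Range` requests, so one way to download "in multiple threads" is to split the object into byte ranges and fetch them concurrently. The sketch below injects a hypothetical `fetch_range(start, end)` callable so it runs without AWS; with boto3 it could wrap `s3.get_object(Bucket=..., Key=..., Range=f"bytes={start}-{end}")["Body"].read()`, with the object size taken from `head_object`'s `ContentLength`:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# fetch_range(start, end) must return bytes start..end inclusive,
# matching HTTP Range semantics ("bytes=start-end").
FetchRange = Callable[[int, int], bytes]

def byte_ranges(size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split [0, size) into inclusive (start, end) pairs of at most part_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [(start, min(start + part_size, size) - 1)
            for start in range(0, size, part_size)]

def parallel_download(size: int, fetch_range: FetchRange,
                      part_size: int = 8 * 1024 * 1024,
                      workers: int = 8) -> bytes:
    """Fetch all byte ranges concurrently and reassemble them in order."""
    ranges = byte_ranges(size, part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the parts concatenate correctly.
        parts = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(parts)

# Usage with an in-memory stand-in for S3:
data = bytes(range(256)) * 1000
assert parallel_download(len(data), lambda s, e: data[s:e + 1], part_size=10_000) == data
```

If the download were done in Python, boto3's managed `download_file` already performs concurrent ranged GETs, tunable through `boto3.s3.transfer.TransferConfig` (for example `max_concurrency` and `multipart_chunksize`). Whether more parallelism actually helps depends on where the bottleneck is, which is exactly what the benchmarks would show.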

maurizi commented 2 years ago

I did some investigative work as part of #1128, which revealed that downloading from S3 is not merely a large portion of application startup time - it is nearly all of it.

Even for our largest region, TX, loading the file from disk and deserializing it took only 12 seconds (!!!).

We should definitely implement file caching.

I'm thinking we want something like the following:

If we switch back to Fargate at some point in the future (which we should consider after implementing the various performance improvements we have in place), we'd have to replace EBS with EFS, but conceptually everything would remain the same.
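A minimal read-through disk cache sketch, assuming a cache directory on whatever volume is mounted (EBS, or EFS under Fargate) and a hypothetical `download(dest_path)` callable that writes the S3 object to disk. The file is written to a temporary name and then renamed, so a crash or concurrent reader never sees a half-written file:

```python
import os
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical: writes the S3 object for an asset to dest_path
# (e.g. a wrapper around boto3's download_file).
Downloader = Callable[[Path], None]

def cached_asset(cache_dir: Path, name: str, download: Downloader) -> Path:
    """Return a local path for `name`, downloading it only on a cache miss."""
    target = cache_dir / name
    if target.exists():
        return target  # cache hit: no S3 round trip
    target.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory so os.replace stays on one filesystem
    # (atomic on POSIX).
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        download(tmp)
        os.replace(tmp, target)
    except BaseException:
        tmp.unlink(missing_ok=True)  # don't leave partial files behind
        raise
    return target
```

This sketch never invalidates entries. One option, purely an assumption here, is to include a data version in `name` so that updated region data lands at a new path instead of being served stale.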

maurizi commented 2 years ago

As Derek pointed out, my tests were done locally on a fast NVMe SSD, so we may not be able to expect 12-second load times from EBS: https://github.com/PublicMapping/districtbuilder/pull/1128#discussion_r803746281

It's worth benchmarking EBS performance to see what we can expect.
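A simple way to benchmark is to time a sequential read of a large file on the volume under test. A minimal sketch; the 16 MiB chunk size is an arbitrary choice. Two caveats: the OS page cache can make repeat runs look far faster than a cold read, and EBS volumes created from snapshots load blocks lazily, so the first read of each block can be much slower than later ones.

```python
import time
from pathlib import Path

def read_throughput_mib_s(path: Path, chunk_size: int = 16 * 1024 * 1024) -> float:
    """Sequentially read `path` and return the observed throughput in MiB/s."""
    total = 0
    start = time.perf_counter()
    # Unbuffered reads so Python's own buffering doesn't skew the numbers.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # EOF
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return (total / (1024 * 1024)) / elapsed if elapsed > 0 else float("inf")
```

Running it on the same file from local NVMe, from an EBS volume, and right after a fresh snapshot restore would give the comparison needed to decide between the options.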

Another option to consider is using local NVMe storage and pre-loading the data into the AMI. That won't give us instant load times (the docs say it can take up to 5 minutes to copy the AMI image to the machine), but it would still likely be substantially faster than loading from S3, and could perhaps perform better than EBS on a cache miss.

ddohler commented 2 years ago

I'm hopeful that an EBS volume will be fast enough. My recollection is that instances with local instance storage don't come with batteries included: you have to mount and format the drives yourself, which would add a lot of extra moving parts to AMI creation. But those drives are very fast once you get them up and connected to everything else.

maurizi commented 2 years ago

Knowing the performance of using EBS for our data caching will help us decide how to move forward on reducing application startup times.

Closing this; we'll reassess load times after implementing #1138 (which will end up using instance storage, which is EBS-backed).