Open anthonyfok opened 3 years ago
[Edited] See https://github.com/OpenDRR/opendrr-api/pull/88#issuecomment-828615173 for a more complete benchmark (March 19 vs April 27)
Benchmark (in progress, to be edited)
Before:
Duration | Command |
---|---|
2s | git clone https://github.com/OpenDRR/model-factory.git --depth 1 |
4m58s | [Download] git clone https://github.com/OpenDRR/boundaries.git --depth 1 |
3m08s | [Import] ogr2ogr run on the 9 .gpkg files from git clone of OpenDRR/boundaries |
... | ... |
After:
Duration | Command |
---|---|
2s | git clone https://github.com/OpenDRR/model-factory.git --depth 1 |
43s to 1m20s | wget https://opendrr.eccp.ca/file/OpenDRR/opendrr-boundaries.dump |
... | ... |
Goals include:
Reduce download time, build time, disk usage...
Increase robustness / resilience (e.g. recovering from interrupted download)
... (to be continued)
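One possible way to approach the robustness goal is a small retry wrapper around a resumable download. This is only a sketch under assumptions: the `retry` helper and its arguments are hypothetical names that do not exist in the current scripts, and it assumes the server supports resuming so that `wget --continue` can pick up a partial file.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-run COMMAND until it succeeds or MAX_ATTEMPTS is
# reached, doubling the delay between attempts.
retry() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Example call (commented out so the sketch has no network side effects).
# --continue resumes a partially downloaded file; --tries=1 leaves the
# retry policy to the wrapper above.
# retry 5 2 wget --continue --tries=1 \
#   https://opendrr.eccp.ca/file/OpenDRR/opendrr-boundaries.dump
```

Keeping the retry policy in one shell function means the same wrapper could also guard other flaky steps, not just downloads.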
[x] add_data: Preliminary reorganization (PR #68)
  - Use `jq` to simplify JSON parsing (see https://cameronnokes.com/blog/working-with-json-in-bash-using-jq/)
  - Add `apt-get install -y jq` to python/Dockerfile
[ ] Make download more fault tolerant and maybe faster
  - `xargs -P`? And/or use a pre-generated tarball to group hundreds of CSV files in one go?
[ ] add_data.sh - flexible data loading (OpenDRR/model-factory#53)
[ ] Delay pygeoapi (or even Elasticsearch and Kibana) start (Issue #93)
[x] Shellcheck (PR #89)
[ ] Move repetitive calls into functions (second round)
[ ] Benchmark and profiling
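For the `xargs -P` idea in the download item above, a minimal sketch might look like the following. The `parallel_fetch` helper and the `urls.txt` file are hypothetical names used for illustration only; `-P` is supported by GNU and BSD `xargs` but is not part of POSIX.

```shell
# parallel_fetch JOBS COMMAND [ARGS...]
# Hypothetical helper: read one item (e.g. a URL) per line from stdin and run
# COMMAND once per item, keeping up to JOBS invocations running at a time.
parallel_fetch() {
  local jobs="$1"
  shift
  # -n 1 passes one input item per invocation; -P sets the process limit.
  xargs -P "$jobs" -n 1 "$@"
}

# Example call (commented out; urls.txt is a hypothetical list of CSV URLs):
# parallel_fetch 4 curl --fail --silent --show-error --location --remote-name < urls.txt
```

`xargs` exits non-zero if any invocation fails, so the caller can still detect a partial download. Whether this beats a single pre-generated tarball would need to be measured.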
Future tasks (that have yet to be turned into GitHub issues):
- `/usr/bin/time -v` for profiling
- `docker-compose logs -f -t` provides log with timestamps
- `-a` or `--echo-all` optional unless in DEBUG mode, for a more concise log.
Maybe in Round 2 of refactoring? Or this round? Need to discuss with Drew first:
Random ideas, questions, etc.
- `-append`, `-update`, or `-overwrite`
- `fsync=off`, `synchronous_commit=off` and `full_page_writes=off` instead, see #77
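Returning to the profiling and logging notes under future tasks, the two ideas (per-step timing, and `--echo-all` only in DEBUG mode) could be sketched roughly as below. Both helper names, the `DEBUG` environment variable, and the SQL file name are assumptions made for illustration; the sketch also assumes the `-a`/`--echo-all` flag in question is the one passed to `psql`.

```shell
# psql_flags: print --echo-all only when DEBUG is set to a non-empty value.
# Hypothetical helper; DEBUG is an assumed environment variable name.
psql_flags() {
  if [ -n "${DEBUG:-}" ]; then
    echo "--echo-all"
  fi
}

# timed LABEL COMMAND [ARGS...]
# Hypothetical helper: run COMMAND, then report its wall-clock duration in
# whole seconds and its exit status on stderr.
timed() {
  local label="$1" start end status=0
  shift
  start=$(date +%s)
  "$@" || status=$?
  end=$(date +%s)
  echo "[timing] $label: $((end - start))s (exit $status)" >&2
  return "$status"
}

# Example calls (commented out; the SQL file name is hypothetical):
# timed "load boundaries" psql $(psql_flags) -f example.sql
# /usr/bin/time -v psql $(psql_flags) -f example.sql   # GNU time: also reports peak memory
```

The `timed` wrapper only gives coarse per-step durations; `/usr/bin/time -v` remains the better fit when memory and I/O figures are needed.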