Derpseh / Spyglass

Nationstates Sheet-generator for easily finding approximate update times
GNU General Public License v3.0
11 stars 10 forks source link

Replace openpyxl with XlsxWriter #31

Open esfalsa opened 1 year ago

esfalsa commented 1 year ago

This PR replaces openpyxl with XlsxWriter.

Performance

Most benchmarks I can find suggest XlsxWriter should have better performance than openpyxl, at least for writing data. (The benchmarks on openpyxl's own documentation would suggest this, too.)

Here are some quick benchmarks from my local machine (2020 MacBook Air with a M1 processor). The difference for me is just a few seconds, so it's not quite to the level some benchmarks on the internet would suggest, but it includes parsing the daily dump as well (although excludes downloading it).

openpyxl

$ for i in {1..5}; do time python spyglass.py -n esfalsa --minor 3550 --major 5350 --dump; done      
Saving spreadsheet...
python spyglass.py -n esfalsa --minor 3550 --major 5350 --dump  18.86s user 0.63s system 96% cpu 20.231 total
Saving spreadsheet...
python spyglass.py -n esfalsa --minor 3550 --major 5350 --dump  19.29s user 0.43s system 98% cpu 20.087 total
Saving spreadsheet...
python spyglass.py -n esfalsa --minor 3550 --major 5350 --dump  18.99s user 0.96s system 98% cpu 20.201 total
Saving spreadsheet...
python spyglass.py -n esfalsa --minor 3550 --major 5350 --dump  19.02s user 0.60s system 99% cpu 19.678 total
Saving spreadsheet...
python spyglass.py -n esfalsa --minor 3550 --major 5350 --dump  18.89s user 0.76s system 98% cpu 20.018 total

XlsxWriter

$ for i in {1..5}; do time python spyglass.py -n esfalsa --minor 3550 --major 5350 --dump; done
Saving spreadsheet...
python spyglass.py -n esfalsa --minor 3550 --major 5350 --dump  14.06s user 0.56s system 83% cpu 17.505 total
Saving spreadsheet...
python spyglass.py -n esfalsa --minor 3550 --major 5350 --dump  14.18s user 0.53s system 86% cpu 16.935 total
Saving spreadsheet...
python spyglass.py -n esfalsa --minor 3550 --major 5350 --dump  14.10s user 0.55s system 85% cpu 17.221 total
Saving spreadsheet...
python spyglass.py -n esfalsa --minor 3550 --major 5350 --dump  13.95s user 0.43s system 86% cpu 16.557 total
Saving spreadsheet...
python spyglass.py -n esfalsa --minor 3550 --major 5350 --dump  14.03s user 0.48s system 87% cpu 16.656 total

XlsxWriter apparently uses less CPU as well, but I didn't really find that advertised as a benefit much from my internet searches so it might have just been a fluke with whatever else was running on my computer at the time.

File Size

Incidentally, XlsxWriter seems to produce a smaller output file as well, at least for today's daily dump.

openpyxl

$ stat -f '%z' spyglass2023-4-1.xlsx                            
7520100

XlsxWriter

$ stat -f '%z' spyglass2023-4-1.xlsx
6482421
AavHRF commented 1 year ago

My test runs: System: M1 Macbook Pro (2020, 8 core CPU/8 core GPU/16 core Neural Engine) / 8GB RAM / 256GB SSD, MacOS Ventura 13.1

xlsxwriter ```zsh $ for i in {1..5}; do time python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump; done Saving spreadsheet... python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump 7.42s user 0.20s system 87% cpu 8.699 total Saving spreadsheet... python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump 7.41s user 0.15s system 90% cpu 8.365 total Saving spreadsheet... python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump 7.43s user 0.15s system 90% cpu 8.367 total Saving spreadsheet... python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump 7.36s user 0.17s system 89% cpu 8.404 total Saving spreadsheet... python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump 7.46s user 0.15s system 88% cpu 8.571 total ```
openpyxl ```zsh $ for i in {1..5}; do time python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump; done Saving spreadsheet... python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump 8.96s user 0.23s system 91% cpu 10.046 total Saving spreadsheet... python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump 8.96s user 0.22s system 92% cpu 9.949 total Saving spreadsheet... python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump 8.97s user 0.22s system 92% cpu 9.973 total Saving spreadsheet... python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump 8.89s user 0.24s system 91% cpu 10.037 total Saving spreadsheet... python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump 8.97s user 0.19s system 89% cpu 10.211 total ```

System: 4 AMD EPYC 7551 cores / 24GB RAM / 40GB drive Ubuntu 20.04.6 LTS aarch64

xlsxwriter ```bash $ for i in {1..5}; do time python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump; done Saving spreadsheet... real 0m14.965s user 0m13.271s sys 0m0.300s Saving spreadsheet... real 0m14.765s user 0m13.321s sys 0m0.313s Saving spreadsheet... real 0m14.978s user 0m13.322s sys 0m0.305s Saving spreadsheet... real 0m14.824s user 0m13.335s sys 0m0.301s Saving spreadsheet... real 0m14.992s user 0m13.340s sys 0m0.277s ```
openpyxl ```bash $ for i in {1..5}; do time python spyglass.py -n united_calanworie --minor 3550 --major 5350 --dump; done Saving spreadsheet... real 0m18.845s user 0m16.993s sys 0m0.412s Saving spreadsheet... real 0m18.778s user 0m17.050s sys 0m0.344s Saving spreadsheet... real 0m18.720s user 0m16.860s sys 0m0.432s Saving spreadsheet... real 0m18.861s user 0m17.115s sys 0m0.393s Saving spreadsheet... real 0m18.759s user 0m16.834s sys 0m0.421s ```

I'd drag out my Windows laptop to run these tests on as well but I feel like that might be a little redundant. On higher-powered systems there's a 1.5-1.9 second speedup, which while not amazing is certainly better than no speedup at all. Where this PR shines though is lower end systems where CPU is going to be a bottleneck -- shaving up to four seconds off on the run with 4 CPU cores on a VM. It definitely makes the process more tolerable.

This is worth code review.

AavHRF commented 1 year ago

Per the notice given in v.3.0.1, this feature will only be available as a release candidate once merged, until the release of F/S.