We're basically replacing the generic (and badly scaling) cell lookups with much faster "engines", one per lookup type (so a Directly engine, a Closest engine, a Constant engine), where each engine is a class optimised for performance (and scaling) within its given niche.
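To give a feel for the "one engine per lookup type" idea, here's a minimal sketch. It is not the code in this PR: `Cell`, the class bodies and the simplified lookup rules (one header per column for Directly, nearest header at or above the row for Closest) are all assumptions made for illustration. The point is just that each engine builds its index once, so a single lookup no longer has to scan every header cell.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple


class Cell(NamedTuple):
    """Hypothetical stand-in for a spreadsheet cell (not databaker's own cell type)."""
    x: int       # column index
    y: int       # row index
    value: str


class ConstantEngine:
    """Every observation gets the same value, so there is nothing to search."""

    def __init__(self, value: str) -> None:
        self.value = value

    def lookup(self, ob: Cell) -> str:
        return self.value


class DirectlyEngine:
    """Header in the same column as the observation (assumes one header per column)."""

    def __init__(self, headers: List[Cell]) -> None:
        # Build the column -> header index once, so each lookup is a dict hit.
        self._by_column: Dict[int, Cell] = {h.x: h for h in headers}

    def lookup(self, ob: Cell) -> str:
        try:
            return self._by_column[ob.x].value
        except KeyError:
            raise LookupError(f"no header in column {ob.x} for cell {ob}") from None


class ClosestEngine:
    """Nearest header at or above the observation's row (simplified rule)."""

    def __init__(self, headers: List[Cell]) -> None:
        # Sort once so each lookup is a binary search rather than a full scan.
        self._headers = sorted(headers, key=lambda h: h.y)
        self._rows = [h.y for h in self._headers]

    def lookup(self, ob: Cell) -> str:
        i = bisect_right(self._rows, ob.y)
        if i == 0:
            raise LookupError(f"no header at or above row {ob.y} for cell {ob}")
        return self._headers[i - 1].value


# Usage: one engine per dimension, each queried with the same observation cell.
years = DirectlyEngine([Cell(1, 0, "2019"), Cell(2, 0, "2020")])
regions = ClosestEngine([Cell(0, 1, "North"), Cell(0, 4, "South")])
unit = ConstantEngine("GBP")

ob = Cell(2, 5, "42")
print(years.lookup(ob), regions.lookup(ob), unit.lookup(ob))
```

With this shape, the per-lookup cost is a dict access or a binary search instead of a pass over all header cells, which is roughly where gains of this kind would come from.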
Rough performance comparisons follow (I say rough because real numbers will vary with the structure of the sheet, but the comparative gains should be along these lines).
| rows in sheet | our current branch | refactor (this pr) | loading the "tabs" into databaker |
|---|---|---|---|
| 60,000 | 00:02:02 | 00:00:26 | 00:00:30 |
| 125,000 | 00:06:43 | 00:00:51 | 00:01:05 |
| 250,000 | 00:26:27 | 00:01:48 | 00:02:07 |
| 500,000 | 02:17:41 | 00:03:54 | 00:04:20 |
| 1,000,000 | 19:00:00 (gave up at) | 00:08:18 | 00:09:40 |
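For anyone who wants the ratios rather than raw times, here's a small sketch that turns the figures above into speedup factors. The helper name `to_seconds` is just for illustration, and the 1,000,000 row case is left out because the current branch never finished, so there's no real number to divide by.

```python
def to_seconds(hhmmss: str) -> int:
    """Convert an hh:mm:ss string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (rows in sheet, our current branch, refactor) copied from the comparison above.
timings = [
    (60_000, "00:02:02", "00:00:26"),
    (125_000, "00:06:43", "00:00:51"),
    (250_000, "00:26:27", "00:01:48"),
    (500_000, "02:17:41", "00:03:54"),
]

# Speedup = time on the current branch divided by time on the refactor.
speedups = {
    rows: to_seconds(current) / to_seconds(refactor)
    for rows, current, refactor in timings
}

for rows, factor in speedups.items():
    print(f"{rows:>9,} rows: ~{factor:.1f}x faster")
```

The factor keeps growing with the row count: the refactor's time roughly doubles each time the rows double, while the current branch's time goes up by more than that at every step.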
Also added a bunch of tests, took out some cruft and added a few friendlier exceptions.
Rewrites the databaker lookup functionality.