GSS-Cogs / databaker

Command line tool to convert spreadsheets to databases, made for the UK's Office for National Statistics.
Other
1 stars 0 forks source link

offset range lookup ("WITHIN"), get cells that are both LEFT and RIGHT #18

Closed mikeAdamss closed 3 years ago

mikeAdamss commented 3 years ago

a common issue we have with databaker is that the api only supports absolute directions, so it can be DIRECTLY or CLOSEST in a given direction, but we have huge issues (and write a lot of nasty workarounds) for when the cells we want are in the middle (eg both LEFT and RIGHT, relative to different observations cells).

The solution would be a new keyword to search with an offset range.

Example:

HDim(cells, "Dimension1", WITHIN(-1, 1), ABOVE)

The -1 and 1 params above represent the offset to search, i.e one to the left through to one to the right, working upwards.

This pattern is illustrated, below, with the numbers showing the order which the lookup would search for a corresponding dimension value with these params.

A B C D E F G
7 8 9
4 5 6
1 2 3
observation

this would allow us to cleanly extract data from a table presenting the data with the following (and painfully common) pattern.

A B C D E F G
2014 2015
obs obs obs obs obs obs
mikeAdamss commented 3 years ago

pattern should probably be WITHIN(LEFT=1, RIGHT=1), much more readable

mikeAdamss commented 3 years ago

have got this largely written, with mainly tests, sanity checking and maybe a few days on tweaking performance required. Will create a pr to kick this task off.

mikeAdamss commented 3 years ago

done