Be-Prepared / Be-Prepared.github.io

Handy PWA filled with useful tools.

Store cities in text, use less space #35

Closed: fidian closed 3 months ago

fidian commented 3 months ago

Right now, the cities are stored as one big JSON object. A sample city looks like this (37 bytes):

"New York City":[40.71427,-74.00597],

If we switched to a text format with one city per line, using a space or tab as the field delimiter, a newline as the record delimiter, and a Plus Code (Open Location Code) for the coordinates, it could look like this (29 bytes with newline):

87G7PX7V+PJ2QX New York City

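Eliminating the field separator and the plus would reduce this to 27 bytes with the newline (26 without it). The code has a fixed width of 13 characters, so the name can still be split off without a separator.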
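The byte counts above can be checked with a few lines of JavaScript. This is only a sketch: the variable names are made up for illustration, and it assumes the file is stored as UTF-8.

```javascript
// Byte length of a string when stored as UTF-8.
const byteLength = (text) => new TextEncoder().encode(text).length;

// The three candidate encodings of the same city.
const jsonEntry = '"New York City":[40.71427,-74.00597],';
const plusCodeLine = '87G7PX7V+PJ2QX New York City\n';
// Same line with the "+" and the field separator removed.
const compactLine = '87G7PX7VPJ2QXNew York City\n';

console.log(byteLength(jsonEntry));    // 37
console.log(byteLength(plusCodeLine)); // 29
console.log(byteLength(compactLine));  // 27 with the newline, 26 without
```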

A custom coding scheme that stores the coordinates in 64 printable characters, followed directly by the city name, could reach the same resolution like this:

  1. 0123456789abc...xyzABC...XYZ-_ = values from 0 to 63
  2. Force longitude and latitude to both be positive numbers
  3. Convert the value into the range [0,1) by dividing the longitude or latitude by 360° (using the same divisor for both keeps the formula simpler)
  4. Multiply by 64. Take Math.floor(result) to get the index into the string, then subtract that index from the result so only the fractional part remains.
  5. Repeat the last step once per output character until the accuracy is high enough.
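The steps above can be sketched in JavaScript as follows. The function names are hypothetical, and the way coordinates are forced positive is an assumption: the issue does not say how, but adding 90° to the latitude and wrapping the longitude into [0°, 360°) reproduces the sample line for New York City.

```javascript
// Step 1: 64 printable characters; a character's position is its value (0 to 63).
const ALPHABET =
  '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_';

// Encode a non-negative angle (degrees, below 360) as `digits` characters.
function encodeAngle(degrees, digits) {
  let fraction = degrees / 360; // step 3: scale into [0, 1)
  let encoded = '';
  for (let i = 0; i < digits; i += 1) { // step 5: one pass per character
    fraction *= 64; // step 4: shift the next base-64 digit above the point
    const index = Math.floor(fraction);
    encoded += ALPHABET[index];
    fraction -= index; // keep only the fractional part
  }
  return encoded;
}

// Step 2 (assumed offsets, not stated in the issue):
// latitude [-90, 90] -> [0, 180], longitude [-180, 180) -> [0, 360).
function encodeCoordinates(latitude, longitude, digits = 4) {
  return (
    encodeAngle(latitude + 90, digits) +
    encodeAngle((longitude + 360) % 360, digits)
  );
}

console.log(encodeCoordinates(40.71427, -74.00597) + 'New York City');
// nffeOR-vNew York City
```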

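In the city database, coordinates are given to 0.00001°. Preserving that fully takes 5 digits per coordinate (with plenty of accuracy left over, since 360°/64^5 is about 0.0000003°). Using 4 digits instead gives a step of 360°/64^4, about 0.0000214°, which is a loss of roughly 8 feet in the worst case. With 4 digits per coordinate, the lines would look like this (22 bytes with newline):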

nffeOR-vNew York City
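The accuracy figures can be reproduced with a short calculation. This is a rough sketch: `stepDegrees` is a made-up helper, and the feet-per-degree constant is an approximation (about 69 miles per degree), so the distances are ballpark values only.

```javascript
// Approximate length of one degree of latitude in feet (assumed constant).
const FEET_PER_DEGREE = 364000;

// Smallest step that `digits` characters of the given base can
// distinguish when the encoded range is 360 degrees wide.
function stepDegrees(digits, base = 64) {
  return 360 / base ** digits;
}

for (const digits of [4, 5]) {
  const step = stepDegrees(digits);
  const feet = step * FEET_PER_DEGREE;
  console.log(`${digits} digits: ${step.toFixed(8)} degrees, about ${feet.toFixed(1)} feet`);
}
```

With 4 digits the step comes out a little over twice the database's 0.00001° precision, which is where the worst-case loss of about 8 feet comes from.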

With this arrangement, the math is simple and the encoding is fast. Using base96 would provide accuracy slightly beyond 0.00001° with the same number of digits.

The code would have to fetch the text file and parse the result instead of receiving it as JSON, but that is fairly easy to overcome. With this in place, the file should be reduced to 178,711 bytes, which is only 56% of the original size and saves 139,401 bytes.

Additional bytes could be saved by eliminating the duplicate entries for a single city when the city name has accented or special characters, such as "Landstraße" and "Landstrasse", by combining them into "abcd1234Landstraße|Landstrasse" (the abcd1234 are fictitious placeholder values for the coordinates).
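Parsing such a file could look roughly like the sketch below. It is not the code from the closing commit: the function names are hypothetical, it assumes 4 digits per coordinate, and it assumes latitude was shifted by 90° and longitude wrapped into [0°, 360°) before encoding, which matches the New York City sample. It also handles the proposed "|" separator for alternate spellings.

```javascript
const ALPHABET =
  '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_';

// Turn encoded characters back into degrees in [0, 360).
// The result is the lower edge of the encoded cell.
function decodeAngle(encoded) {
  let value = 0;
  for (const character of encoded) {
    value = value * 64 + ALPHABET.indexOf(character);
  }
  return (value / 64 ** encoded.length) * 360;
}

// Parse lines such as "nffeOR-vName|Alias" into { name: [latitude, longitude] }.
function parseCities(text, digits = 4) {
  const cities = {};
  for (const line of text.split('\n')) {
    if (line.length <= 2 * digits) continue; // skip blank or truncated lines
    const latitude = decodeAngle(line.slice(0, digits)) - 90;
    let longitude = decodeAngle(line.slice(digits, 2 * digits));
    if (longitude >= 180) longitude -= 360; // undo the wrap into [0, 360)
    // Every spelling on the line shares the same coordinates.
    for (const name of line.slice(2 * digits).split('|')) {
      cities[name] = [latitude, longitude];
    }
  }
  return cities;
}

// In the app, the text would first be fetched, for example:
//   const text = await (await fetch('cities.txt')).text(); // hypothetical file name
const cities = parseCities('nffeOR-vNew York City\n');
console.log(cities['New York City']); // close to [40.71427, -74.00597]
```

The decoded values land within one 4-digit step (about 0.0000215°) below the original coordinates, because encoding rounds down.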

fidian commented 3 months ago

Closed with a6f234aab3c02c92e4b0f13c770b7dfc27be8c1a