begriffs / pg_rational

Precise fractional arithmetic for PostgreSQL
MIT License

Pathological case #5

Closed by gabegorelick 5 years ago

gabegorelick commented 6 years ago

The Hacker News discussion on your blog post has some interesting points, particularly this comment which identifies a pathological case when taking mediants:

So the mediant is better [than averages] because the precision requirement is limited by the number of times you can add instead of multiply. This might seem to grow a lot slower at first. And the author even showed that if you're repeatedly taking the mediant of 0/1 and the new score, you get the pattern 1/2, 1/3, 1/4, 1/5, 1/6, .... This means you can handle up to 2^precision inserts, which is much better than before.

But this only applies to inserting at the front (taking the mediant with 0/1) or at the back (with 1/1), because each of those steps only adds a 0 or a 1.

If you instead go down a zigzag path through the middle of the tree, the two numbers being added can be much closer in magnitude, and the terms explode. For example: 1/1, 1/2, 2/3, 3/5, 5/8, 8/13, 13/21, 21/34, ... (it's Fibonacci!).

So essentially you get the same thing as in the averaging case: the required precision grows by a factor of φ ≈ 1.6180339887 at each step, and it will overflow quickly.

This isn't a purely theoretical edge case, either. The example above can be realized by repeatedly inserting between the last two items inserted, which arises naturally whenever you keep adding to the middle of a list.
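The blow-up is easy to reproduce. Here is a minimal sketch with Python's `fractions` module, assuming a 32-bit limit on each component (consistent with the 64-bit total representation mentioned later in this thread; the exact limit is an assumption):

```python
from fractions import Fraction

def mediant(x, y):
    # The mediant of a/b and c/d is (a+c)/(b+d); for neighboring
    # Stern-Brocot fractions the result is already in lowest terms.
    return Fraction(x.numerator + y.numerator,
                    x.denominator + y.denominator)

LIMIT = 2**31 - 1   # assumed 32-bit component limit

# Zigzag: each new sort key is the mediant of the last two inserted.
a, b = Fraction(1, 1), Fraction(1, 2)
steps = 0
while b.denominator <= LIMIT:
    a, b = b, mediant(a, b)
    steps += 1

print(steps)  # 44 -- the denominators are Fibonacci numbers,
              # so a few dozen midpoint inserts exhaust 32 bits
```

Compare that with one-sided insertion, where the same 32 bits support on the order of 2^31 inserts.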

begriffs commented 6 years ago

That's a really good observation. I'll update the article to include that, and credit the person who brought it up.

I don't know how to get around the pathological case. I guess 64 bits can only record so much, and the method of selecting mediants as midpoints prioritizes the potential lengths of some paths over others. Hopefully the case of continued insertion on just one side is more common than going LRLRLR down the tree.
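For readers following along, the mediant-as-midpoint idea can be sketched as a descent of the Stern-Brocot tree. This is not pg_rational's actual C implementation of `rational_intermediate`, just an illustrative sketch; Python's unbounded integers hide the overflow that fixed-width components hit:

```python
def intermediate(lo, hi):
    """Find a fraction strictly between lo and hi (given as (num, den)
    tuples with 0/1 <= lo < hi) by walking the Stern-Brocot tree,
    taking the mediant of the current fences at each step."""
    ln, ld = 0, 1      # left fence, 0/1
    rn, rd = 1, 0      # right fence, 1/0 (infinity)
    while True:
        mn, md = ln + rn, ld + rd          # mediant of the fences
        if mn * lo[1] <= lo[0] * md:       # mediant <= lo: go right
            ln, ld = mn, md
        elif mn * hi[1] >= hi[0] * md:     # mediant >= hi: go left
            rn, rd = mn, md
        else:
            return (mn, md)                # strictly between lo and hi
```

Repeatedly inserting between the last two keys drives this search down an LRLR... path, so the components it returns grow like Fibonacci numbers, which is exactly the pathological case above.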

gabegorelick commented 6 years ago

Would renormalization help?

begriffs commented 6 years ago

The fraction arithmetic functions do detect overflow and raise ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE. The code calling rational_intermediate could catch that error, renormalize, and try again. The renormalization is interesting because I don't know if it's one-size-fits-all. Maybe a table is sorted, but the order applies per category of thing, so the renormalization would need to know to apply to just the rows in that category? Or maybe I'm overthinking it.
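A hedged sketch of that catch-and-renormalize flow, in Python rather than SQL: the per-component limit and the evenly-spaced renormalization are assumptions for illustration, and the per-category scoping question would live in whatever predicate selects which keys get respaced.

```python
from fractions import Fraction

LIMIT = 2**31 - 1   # hypothetical per-component limit

def mediant(x, y):
    # Mediant of a/b and c/d is (a+c)/(b+d); fail like the
    # extension does when a component leaves the allowed range.
    n = x.numerator + y.numerator
    d = x.denominator + y.denominator
    if n > LIMIT or d > LIMIT:
        raise OverflowError("rational component out of range")
    return Fraction(n, d)

def insert_between(keys, i):
    """Insert a new sort key between keys[i] and keys[i+1].
    If the mediant would overflow, renormalize every key to an
    evenly spaced fraction (preserving order) and retry."""
    try:
        k = mediant(keys[i], keys[i + 1])
    except OverflowError:
        keys[:] = [Fraction(j + 1, len(keys) + 1)
                   for j in range(len(keys))]
        k = mediant(keys[i], keys[i + 1])
    keys.insert(i + 1, k)
    return k
```

In SQL terms the except-branch would be an UPDATE over the affected rows; scoping it per category would just mean restricting that UPDATE's WHERE clause.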