begriffs / pg_rational

Precise fractional arithmetic for PostgreSQL
MIT License
233 stars 14 forks source link

Add missing overflow checks to rational_in_float() #12

Closed adegert closed 4 years ago

adegert commented 4 years ago

Overflow leads to wrong results and can create non-terminating loops in the postgresql backend.

Changes:

begriffs commented 4 years ago

Thanks! I verified that your new tests do cause make installcheck to hang unless I also apply your change to the code.

adegert commented 4 years ago

Am Sa., 16. Nov. 2019 um 16:51 Uhr schrieb Joe Nelson < notifications@github.com>:

@begriffs commented on this pull request.

In pg_rational.c https://github.com/begriffs/pg_rational/pull/12#discussion_r347096574:

result->denom = 1;

do { z = 1.0 / (z - floor(z));

  • temp = result->denom;

Now that we no longer use the temp variable, its declaration can be removed.

yes, with the last round of changes the variable was gone but I forgot to remove it...

adegert commented 4 years ago

Am Sa., 16. Nov. 2019 um 17:21 Uhr schrieb Joe Nelson < notifications@github.com>:

@begriffs commented on this pull request.

In pg_rational.c https://github.com/begriffs/pg_rational/pull/12#discussion_r347097631:

{
  • result->numer = floor(target);
  • result->numer = (int32)target;

Wondering if the explicit cast is required. Since numer was declared as int32 will the assignment automatically do the conversion anyway?

It's not required, taking the floor and casting to int32 both happens implicitly with the assignment. But the cast is the problematic one because one has to watch for overflow. So I thought it's a good idea to make all casts explicit, and obviously the cast implies a floor operation.

adegert commented 4 years ago

Am Sa., 16. Nov. 2019 um 17:30 Uhr schrieb Joe Nelson < notifications@github.com>:

@begriffs commented on this pull request.

In pg_rational.c https://github.com/begriffs/pg_rational/pull/12#discussion_r347097970:

result->denom = 1;

do { z = 1.0 / (z - floor(z));

  • temp = result->denom;
  • result->denom = result->denom * floor(z) + prev_denom;
  • prev_denom = temp;
  • result->numer = round(target * result->denom);
  • fdenom = result->denom * floor(z) + prev_denom;
  • if (fdenom > INT32_MAX) {

What if you did the assignments together, and checked the ranges all at once, like

fdenom = result->denom floor(z) + prev_denom; fnumer = round(target fdenom);if (fnumer > INT32_MAX || fdenom > INT32_MAX) break;

Yes, its a matter of taste, I checked for overflow directly after the operation, but this version is more compact any maybe more readable. If you like I change it in the pull request.

Also when we break out of the loop because one of those are too big, should that be considered an error? Looks like it currently just proceeds to return the result.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/begriffs/pg_rational/pull/12?email_source=notifications&email_token=ABBZTMLW4WHKZYRTZKEZT3TQUANYZA5CNFSM4JOEKXXKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCLZ7LTI#pullrequestreview-317978061, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBZTMKDHUMAXG26D57KK2TQUANYZANCNFSM4JOEKXXA .

adegert commented 4 years ago

Am Sa., 16. Nov. 2019 um 17:30 Uhr schrieb Joe Nelson < notifications@github.com>:

@begriffs commented on this pull request.

In pg_rational.c https://github.com/begriffs/pg_rational/pull/12#discussion_r347097970:

result->denom = 1;

do { z = 1.0 / (z - floor(z));

  • temp = result->denom;
  • result->denom = result->denom * floor(z) + prev_denom;
  • prev_denom = temp;
  • result->numer = round(target * result->denom);
  • fdenom = result->denom * floor(z) + prev_denom;
  • if (fdenom > INT32_MAX) {

What if you did the assignments together, and checked the ranges all at once, like

fdenom = result->denom floor(z) + prev_denom; fnumer = round(target fdenom);if (fnumer > INT32_MAX || fdenom > INT32_MAX) break;

Also when we break out of the loop because one of those are too big, should that be considered an error? Looks like it currently just proceeds to return the result.

Sorry, forgot to comment this in the last email. No, it's not an error, because each loop just refines the result. So, if we break out of the loop, we got (approximately) the best result we can do with 32 bit ints. That's the reason I added an initialization for the numerator, because the loop might be skipped altogether. This happens in one of the test cases (select 1.0000000001::float::rational), where the result is 1/1. Or for all cases where the distance of the float to the closest int is so small that the next best approximation overflows int32... then the result is the closes int as x/1 rational.

Btw. I think the algorithm used gives a good result, but not necessarily the best one. For the best approximation one would have to calculate the result below the target and the one above, and then take the closest one, or the one with the smaller denominator if both distances are equal (instead of just using round()). But this would make some numerical differences to the current version, so its the question if its worth it (and it would take slightly more CPU cycles..). Might just be a point If you plan to submit the extension for inclusion into postgres :-)

Thanks a lot for writing this extension.

begriffs commented 4 years ago

Also when we break out of the loop because one of those are too big, should that be considered an error?

Sorry, forgot to comment this in the last email. No, it's not an error, because each loop just refines the result.

Makes sense. How big can the error hypothetically get? Could it ever be so bad that we should consider even our best possible result as not good enough?

Btw. I think the algorithm used gives a good result, but not necessarily the best one. For the best approximation one would have to calculate the result below the target and the one above, and then take the closest one, or the one with the smaller denominator if both distances are equal (instead of just using round()).

Thanks for the suggestion.

Might just be a point If you plan to submit the extension for inclusion into postgres :-)

It's encouraging that you say this, I've actually been preparing a patch for postgres to add a rational type. I don't know whether people will like it or whether they'll think it is too much extra complexity. We'll find out. :)

Thanks a lot for writing this extension.

Thanks for fixing this rather serious error that can lock up the backend.

adegert commented 4 years ago

Am So., 17. Nov. 2019 um 23:18 Uhr schrieb Joe Nelson < notifications@github.com>:

Also when we break out of the loop because one of those are too big, should that be considered an error?

Sorry, forgot to comment this in the last email. No, it's not an error, because each loop just refines the result.

Makes sense. How big can the error hypothetically get? Could it ever be so bad that we should consider even our best possible result as not good enough?

It depends a bit on how you define the error term. If you take the value of abs(target - p/q), for small values an upper limit would be 1/(2*INT32_MAX), because that's half of the step size you get with that denominator (below I use p for numerator and q for denominator). That is a small absolute error, but might of cause be a large relative error when target is of the same magnitude (but we can't represent arbitrary small number with this kind of fraction). For large target values (near INT32_MAX), the upper limit is 1/2, but then this is a small relative error. But I think this is fine because we give the best approximation that is possible in the representation (fraction with 2 32bit ints). Btw. you could define the denominator as unsigned then you have one bit more but that might produce more little problems in the implementation that it's worth.

The theory of continued fractions states that for an approximation p/q you generally get an error somewhere below 1/(q*q), which will be a much lower error limit in most cases, but we don't have a fixed q and only know it can't grow larger than INT32_MAX. (You can find more about continued fractions in Wikipedia, but the German page seems to be a lot better than the English one, so it might be better to look for some other paper or book if you are interested in that.)

Anyway I have checked some other algorithms (or implementations) and I think the one which works best (from those I looked at) is one that is used in the Python fractions module. It gives optimal results by using the continued fraction or a secondary convergent; the difference is, the continued fraction gives you the best result with that value of q, and the next q value in the sequence being > INT32_MAX, but you can construct a fraction in between with a q closer to INT32_MAX that might give a better approximation (even this it's not as optimal in the sense of especially low error for that value of q).

I have checked the algorithms in Python and now I'm transcribing it into C, might take another day.

Btw. I think the algorithm used gives a good result, but not necessarily the best one. For the best approximation one would have to calculate the result below the target and the one above, and then take the closest one, or the one with the smaller denominator if both distances are equal (instead of just using round()).

Thanks for the suggestion.

Might just be a point If you plan to submit the extension for inclusion into postgres :-)

It's encouraging that you say this, I've actually been preparing a patch for postgres to add a rational type. I don't know whether people will like it or whether they'll think it is too much extra complexity. We'll find out. :)

Thanks a lot for writing this extension.

Thanks for fixing this rather serious error that can lock up the backend.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/begriffs/pg_rational/pull/12?email_source=notifications&email_token=ABBZTMNJ26VYZUIKO3JCVODQUG7KFA5CNFSM4JOEKXXKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEIXWHY#issuecomment-554793759, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBZTMJPF5JKOSFHIWPIUNLQUG7KFANCNFSM4JOEKXXA .