BrianGladman / mpir

Multiple Precision Integers and Rationals
GNU General Public License v3.0
75 stars 36 forks source link

Fail silently on Windows after invalid operations #30

Open 5E-324 opened 1 year ago

5E-324 commented 1 year ago

MPIR handles invalid operations by raising a SIGFPE signal. On windows, this cause the program to exit silently. Visual studio will not break on such signal, even if everything in the exception settings are checked. However it will break when abort() is called, which appears right after raise (SIGFPE). Since the signal already terminates the program, the abort has no chance execute, so there will be no sign of what happened. A possible solution/workaround is to add a __debugbreak() (int 3 instruction) before raise when it's available.

BrianGladman commented 1 year ago

Hi, This long standing behaviour has been inherited from GMP so it is not specific to MPIR. if running under the Visual Studio debugger, what advantage would adding a coded break have over setting a breakpoint on line 72 in invalid.c?

5E-324 commented 1 year ago

setting a breakpoint

This is what I did after I find out what happened. The problem should be obvious, how can a user know what happened... without knowing what happened? Even if this can't be changed, it should at least be documented. And such error may not always be considered fatal, it shouldn't always cause the program to quit. Sometimes, having some invalid result is acceptable, or at least not as unacceptable as crashing. But it doesn't seem to be possible to change this behavior. When the dll version is used, the signal can't be handled by the program, and even if it can be handled, the abort will still terminate the program. Maybe it's better to do what said in the comments:

It might be possible to check whether a hardware "invalid operation" trap is enabled or not before raising a signal. This would require all callers to be prepared to continue with some bogus result. Bogus returns are bad, but presumably an application disabling the trap is prepared for that.

And surely, if arithmetic error are unacceptable, there shouldn't be NaNs to trigger this in the first place, because the operations that produces them should have already generated an exception or signal. For the C++ mp*_class, a better thing to do would be to throw an exception, just like the operations involving strings do. But that isn't the case. In any case, forcing the program to quit by calling abort even after the signal is handled and the program decides to do nothing doesn't seem like a good idea.

BrianGladman commented 1 year ago

setting a breakpoint

This is what I did after I find out what happened. The problem should be obvious, how can a user know what happened... without knowing what happened?

MPIR (and GMP) raise the SIGFPE signal so that the code that uses the library can write a signal handler for this exception. This handler has access to OS specific data structures that will include more details of the exception. SIGFPE is well documented but depends on the OS architecture on which the library is running.

Even if this can't be changed, it should at least be documented. And such error may not always be considered fatal, it shouldn't always cause the program to quit. Sometimes, having some invalid result is acceptable, or at least not as unacceptable as crashing. But it doesn't seem to be possible to change this behavior. When the dll version is used, the signal can't be handled by the program, and even if it can be handled, the abort will still terminate the program. Maybe it's better to do what said in the comments:

It might be possible to check whether a hardware "invalid operation" trap is enabled or not before raising a signal. This would require all callers to be prepared to continue with some bogus result. Bogus returns are bad, but presumably an application disabling the trap is prepared for that.

And surely, if arithmetic error are unacceptable, there shouldn't be NaNs to trigger this in the first place, because the operations that produces them should have already generated an exception or signal. For the C++ mp*_class, a better thing to do would be to throw an exception, just like the operations involving strings do. But that isn't the case. In any case, forcing the program to quit by calling abort even after the signal is handled and the program decides to do nothing doesn't seem like a good idea.

If the calling program does not handle the exception, there is nothing else that the library can sensibly do. Remember that MPIR and GMP are based on a design that is now two decades old. Much has happened since then and there is little doubt that an equivalent library designed today would be different.

You could ask the GMP project (from which MPIR is a fork) to consider the adoption of a better approach to exception handling. But I don't see much evidence to suggest that the current library approach to handling SIGFPE is so bad that such a proposal would receive significant support.

My advice would hence be that you should add a SIGFPE signal handler to your code that uses MPIR.

5E-324 commented 1 year ago

MPIR (and GMP) raise the SIGFPE signal so that the code that uses the library can write a signal handler for this exception. This handler has access to OS specific data structures that will include more details of the exception. SIGFPE is well documented but depends on the OS architecture on which the library is running.

I know that, but only after debugging in the dark. That's why I say it should be documented. I know that SIGFPE is well documented, but there's nowhere in the MPIR documentation that mentions SIGFPE, invalid.c, __gmp_invalid_operation or the way those related functions handle Inf or NaN.

I agree that it would be good to add some documentation for this. Sadly, however, MPIR documentation was produced on the Linux/GCC version of MPIR which is no longer being maintained so I now have no way of doing this on Windows.

If the calling program does not handle the exception, there is nothing else that the library can sensibly do. Remember that MPIR and GMP are based on a design that is now two decades old. Much has happened since then and there is little doubt that an equivalent library designed today would be different.

You could ask the GMP project (from which MPIR is a fork) to consider the adoption of a better approach to exception handling. But I don't see much evidence to suggest that the current library approach to handling SIGFPE is so bad that such a proposal would receive significant support.

My advice would hence be that you should add a SIGFPE signal handler to your code that uses MPIR.

As I said before, signal handlers set in the exe somehow cannot handle signals raised in mpir.dll, they can only handle signals raised by the exe itself. Since this is likely a Windows specific problem, it might not be very sensible to report it to the GMP project, which doesn't support Windows.

There does not appear to be much Windows information around on how to propagate signals raised in Dll's up to user code. It seems unlikely that this cannot be done but I certainly have no experience in doing this and cannot hence offer you any help here.

And even if the signals can be handled correctly, adding a signal handler still won't work, because as I said before, IT CALLS ABORT AFTER raise (SIGFPE) and there's no way to ignore that! I don't think calling abort() is a sensible thing to do if the calling program does handle the exception.

As I understand the principle, if the exception handler deals with the raised signal correctly without re-invoking the same exception again, the abort() call will never be executed.

In summary, it would certainly be nice to know how to handle DLL exceptions at user level but I don't have any experience of doing this and cannot hence offer any help. And, for reasons given above, I cannot now modify the existing MPIR documentation.

5E-324 commented 1 year ago

Why did you reply by editing? Did you mean quote?

Yes, if the signal is handled correctly then it won't be re-invoked. But the problem is, __gmp_invalid_operation is defined like this:

void
__gmp_invalid_operation (void)
{
  raise (SIGFPE);
  abort (); // <-- There's an abort here!
}

After the signal is handled, the raise function will return, and the execution resumes, then the abort() is executed. This is what doesn't make sense to me: why should it call abort even if the program choose to ignore the signal?

And after some testing, I found that the signal handler will take effect in another dll but not in mpir.dll. Not sure if this is because they are compiled using different compilers. (mpir.dll is compiled by msvc, while exe and the other dll are compiled by the clang provided by visual studio, which should be compatible to msvc.) But they use the same standard library, and everything else are fine. Anyway, I will try static linking later.