bashtage / randomgen

Numpy-compatible bit generators and add some random variate distributions missing from NumPy.

Non-random data is returned if any exceptions are raised while generating random bits #389

Closed cbryant203 closed 1 month ago

cbryant203 commented 1 month ago

If the procedure that generates random data raises an exception, an error message is printed, but the exception does not propagate, so the calling function returns non-random data. Note that the exception does trigger this message:

Exception ignored on calling ctypes callback function: <cyfunction raw_64_to_double.<locals>.f at 0x7f4f426e1d80>

which pretty much explains why this happens. Attached is a trivial randomness generator which always raises an exception, and a tester which invokes it asking for 10 random values. It prints these values:

[6.89921293e-310 6.89921293e-310 6.89921293e-310 6.89921293e-310
 6.89921293e-310 6.89921293e-310 6.89921293e-310 6.89921293e-310
 6.89921293e-310 6.89921293e-310]

which are obviously not random!

dummydemo.txt dummyrand.txt

I ran this with numpy==2.1.1 and randomgen==2.0.1, but it's not version-specific.

bashtage commented 1 month ago

Once the PRNG is running there is no exception handling. This is done for performance, since within the core PRNGs one doesn't expect to encounter actual exceptions (there is plenty of exception handling around them, e.g., when constructing arrays to hold random values).

I would say this is by design. If for some reason you need to check for exceptions in the low-level interface, then you will need to call PyErr_Occurred and handle the exception yourself. There is a tiny bit of this in RDRAND, which can raise errors if the bit pool available to the instruction is empty. Some explanation is here: https://bashtage.github.io/randomgen/bit_generators/rdrand.html
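
For user-supplied Python callbacks, one way to "handle the exception yourself" is to catch it inside the callback, remember it, and re-raise it once generation has finished. The following is only a minimal sketch using plain ctypes: the `GuardedSource` class, its `check` method, and the `uint64_t next_raw(void *state)` callback signature are assumptions made for illustration and are not part of randomgen's API.

```python
import ctypes

# Assumed C signature for illustration: uint64_t next_raw(void *state)
RAW_CALLBACK = ctypes.CFUNCTYPE(ctypes.c_uint64, ctypes.c_void_p)


class GuardedSource:
    """Hypothetical wrapper that records callback failures instead of losing them."""

    def __init__(self, source):
        self._source = source
        self.error = None
        # Keep a reference to the ctypes callback so it is not garbage collected.
        self.callback = RAW_CALLBACK(self._next_raw)

    def _next_raw(self, state):
        try:
            # Mask to 64 bits so the value always fits the declared return type.
            return self._source() & 0xFFFFFFFFFFFFFFFF
        except Exception as exc:
            # Nothing can propagate through the C boundary, so store the first
            # failure and hand back a placeholder value.
            if self.error is None:
                self.error = exc
            return 0

    def check(self):
        """Raise if any callback invocation failed since the last check."""
        if self.error is not None:
            error, self.error = self.error, None
            raise RuntimeError("bit source failed during generation") from error


def failing_source():
    raise OSError("entropy source unavailable")


guarded = GuardedSource(failing_source)
# Calling through the ctypes object mimics a call made from C code.
values = [guarded.callback(None) for _ in range(3)]
```

After any bulk draw, calling `guarded.check()` turns the recorded failure into a normal Python exception, so the placeholder values are never mistaken for random data.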

bashtage commented 1 month ago

I've thought about this, and I think the best I can do is to make it explicit that user code must not raise exceptions. This happens because these routines are all executed without the GIL internally, which is why generation just runs non-stop without checking for exceptions. This is a requirement of NumPy's Generator, and is practically required for performance reasons.

bashtage commented 1 month ago

One last bit of explanation.

  1. The array to store the data is created but not yet filled.
  2. The generator calls the user function, which raises an exception, but the call still returns a value of the expected type.
  3. The generator takes this int64 and transforms it. The transformation can fail when an exception has occurred, and in some cases this produces an infinite loop.
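
The three steps can be imitated in pure Python to show both symptoms. This is a rough simulation, not randomgen's code: `stuck_next_raw` stands in for a callback whose exception was swallowed, the 53-bit conversion is a common way to map 64-bit integers to doubles and is assumed here, and `bounded_by_rejection` with its `max_draws` cap is a hypothetical helper added so the example terminates.

```python
def stuck_next_raw():
    # Stand-in for a failed callback: the same placeholder value every time.
    return (1 << 64) - 1


def raw_to_double(raw):
    # Common 53-bit conversion of a 64-bit integer to a float in [0, 1).
    return (raw >> 11) / float(1 << 53)


def fill_doubles(next_raw, n):
    out = [None] * n                 # step 1: storage exists before any draw
    for i in range(n):
        raw = next_raw()             # step 2: a value of the right type comes back
        out[i] = raw_to_double(raw)  # step 3: it is transformed unconditionally
    return out


def bounded_by_rejection(next_raw, upper, max_draws=1000):
    """Draw from [0, upper) by rejecting raw values above an unbiased limit."""
    limit = (1 << 64) - ((1 << 64) % upper)
    for _ in range(max_draws):
        raw = next_raw()
        if raw < limit:
            return raw % upper
    # Cap reached: without max_draws this loop would never exit.
    return None


doubles = fill_doubles(stuck_next_raw, 4)
bounded = bounded_by_rejection(stuck_next_raw, 3)
```

Here every entry of `doubles` is the same number, and `bounded` is `None` because the constant raw value is rejected on every attempt, which is the infinite-loop case when no cap is present.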