bamler-lab / constriction

Entropy coders for research and production in Python and Rust.
https://bamler-lab.github.io/constriction/
Apache License 2.0
77 stars 5 forks source link

assertions not supported for parameter cdf in constriction.stream.model.CustomModel #27

Closed wildug closed 11 months ago

wildug commented 1 year ago

I tried to implement a custom entropy model that is only defined on positive numbers. To make sure only positive numbers are encoded I used an assert statement to check for non-negativity. To make my point I use a lognormal distribution in the following snippet.

def cdf(x, mu, sigma):
     assert x >= 0
     return stats.lognorm.cdf(x, mu, sigma)

def inverse_cdf(q, mu, sigma):
     return stats.lognorm.ppf(q, mu, sigma)

mus = np.random.randn(100)
sigmas = np.random.randn(100)**2 +1
dummy_entropy_model = constriction.stream.model.CustomModel(cdf ,inverse_cdf, -10, 10)

message = (np.random.randn(100)**2).round().astype(np.int32)

coder = constriction.stream.stack.AnsCoder()
coder.encode_reverse(message, dummy_entropy_model, mus, sigmas)
decode = coder.decode(dummy_entropy_model, mus, sigmas)

thread '<unnamed>' panicked at 'TODO: PyErr { type: <class 'AssertionError'>, value: AssertionError(), traceback: Some(<traceback object at 0x7f22f0efac80>) }', src/pybindings/stream/model/internals.rs:380:14

PanicException: TODO: PyErr { type: , value: AssertionError(), traceback: Some() }

This problem is of course solved by deleting assertions from cdf.

A more permanent solution might be a change to the documentation or the error message.

robamler commented 11 months ago

Thanks for reporting! Please note that there's a bug in the following line:

dummy_entropy_model = constriction.stream.model.CustomModel(cdf ,inverse_cdf, -10, 10)

The argument -10 states that the CustomModel should support symbols from the integer range {-10, -9, ..., 10}. Therefore, constriction correctly assumes that it should be allowed to evaluate the cdf at negative values.

Changing the argument from -10 to 0 fixes the encoding part, but now decoding fails. This is still technically correct behavior as CustomModel does not guarantee to only probe the cdf within the range of supported symbols (being allowed to probe outside the range of supported symbols slightly simplifies the code that calculates the exact inverse cdf). But looking at this example, I now agree that this can cause confusion, and I'll look into fixing it. Also, I agree that the error messages should be better.

Side Node

Assertions generally can't be used to tell an interpreter or compiler not to do something. They are a tool for (i) documenting invariants for human readers of the source code and for (ii) bailing in case a program or library contains a logic error or is used incorrectly.

This is nothing specific about constriction (or even about python). For example, the code below throws an AssertionError because it calls divide with argument b=0 at some point even though the assert statement inside divide states otherwise. This is because an assert statement never enforces the asserted condition. It just checks if the condition is met, and interrupts control flow if it isn't.

def divide(a, b):
    assert b != 0
    return a / b

print([divide(1, x) for x in range(-10, 11)])

Preventing callers from calling a function unless certain conditions are met cannot be achieved with assertions inside the function body. It would instead require either a more expressive type system that makes invalid function arguments inexpressible, or a more elaborate mechanism such as design by contract.