bamler-lab / constriction

Entropy coders for research and production in Python and Rust.
https://bamler-lab.github.io/constriction/
Apache License 2.0

Vectorize cdf query of custom model #31

Open wildug opened 1 year ago

wildug commented 1 year ago

In my use case, I want to compress a large amount of data with a custom entropy model. Unfortunately, this takes quite some time because the CDF callback is invoked separately for each compressed symbol. I can't simply use the SciPy model adapter because I'm using a mixture distribution, which is not implemented in SciPy.
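Since SciPy has no ready-made mixture distribution, a batched mixture CDF has to be written by hand. The sketch below is a minimal illustration, assuming a Gaussian mixture whose per-symbol parameters are stored as `(n, k)` arrays; the function name `mixture_cdf` and this parameter layout are hypothetical, not part of constriction. It evaluates the CDF for all `n` points in one call via NumPy broadcasting, which is the kind of vectorized callback this request is about.

```python
import numpy as np
from scipy import stats


def mixture_cdf(x, weights, means, scales):
    """CDF of a Gaussian mixture, evaluated for a whole batch in one call.

    x:                      shape (n,), one evaluation point per symbol
    weights, means, scales: shape (n, k), per-symbol mixture parameters
                            (each row of `weights` sums to 1)
    Returns an array of shape (n,).
    """
    x = np.asarray(x, dtype=np.float64)[:, None]  # (n, 1) broadcasts against (n, k)
    component_cdfs = stats.norm.cdf(x, loc=means, scale=scales)  # shape (n, k)
    return np.sum(weights * component_cdfs, axis=1)
```

A mixture has no closed-form inverse CDF, so the corresponding ppf callback would still need a numerical method such as bisection.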

Here's my dummy code:

from scipy import stats
import constriction
import numpy as np

# Counts how often constriction invokes the Python CDF callback.
c = 0
def cdf_likelihood_normal(x, mu, sigma):
    global c
    c += 1
    print(c, end="\r")
    p = stats.norm.cdf(x, loc=mu, scale=sigma)
    return p

def inverse_cdf_likelihood_normal(q, mu, sigma):
    x = stats.norm.ppf(q, loc=mu, scale=sigma)
    return x

coder = constriction.stream.stack.AnsCoder()
# Symbols lie in the integer range [-10, 10]; mu and sigma are supplied per symbol when encoding and decoding.
entropy_model = constriction.stream.model.CustomModel(cdf_likelihood_normal, inverse_cdf_likelihood_normal, -10, 10)

sigma = np.ones(int(1e4))
mu = np.zeros(int(1e4))
message = np.random.randint(-1, 1, int(1e4), dtype=np.int32)  # symbols in {-1, 0}

p = stats.norm.cdf(message, loc=mu, scale=sigma)  # very fast: one vectorized call for all symbols

coder.encode_reverse(message, entropy_model, mu, sigma)  # very slow: one Python callback per CDF query
print(coder.num_bits())

reconstruction = coder.decode(entropy_model, mu, sigma)

assert (message == reconstruction).all()

Would it be possible for the custom model adapter to support vectorized CDFs, so that the CDF is evaluated for all symbols in a single call and the coder runs faster?

robamler commented 1 year ago

I can see how vectorizing would reduce the overhead from Python callbacks. Unfortunately, vectorizing is only possible for encoding: when decoding a symbol, the decoder cannot know where to evaluate the ppf until it has decoded all preceding symbols (except in the case of the ChainCoder). I'll have to think a bit about what the best API would be to reflect this asymmetry (and, ideally, to still support vectorized decoding with a ChainCoder).
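To make the encode/decode asymmetry concrete, here is a minimal toy sketch. It is not constriction's implementation: it assumes a stack-based rANS coder whose state is an unbounded Python integer (no renormalization), and a quantization scheme chosen for illustration (the CDF scaled to `M - K` and floored, plus one count per symbol so that no symbol gets zero probability). The names `quantized_left_cdf`, `encode`, `decode`, and `PRECISION` are hypothetical. The encoder obtains both bin edges of every symbol from two batched CDF calls, while the decoder must derive its quantile (`state % M`) from the state left behind by the previous symbol, so it queries the CDF one position at a time; bisection stands in for the inverse CDF.

```python
import numpy as np
from scipy import stats

# Hypothetical toy parameters (not constriction's internals).
PRECISION = 24
M = 1 << PRECISION          # total frequency of the quantized distribution
LO, HI = -10, 10            # symbol range, as in CustomModel(..., -10, 10)
K = HI - LO + 1             # alphabet size


def quantized_left_cdf(x, mu, sigma):
    """Integer left-cumulative Q(x) of a normal distribution, vectorized.

    Q(LO) == 0, Q(HI + 1) == M, and Q(x + 1) - Q(x) >= 1, so every symbol
    in [LO, HI] gets a nonzero frequency. Works on scalars and arrays.
    """
    x = np.asarray(x)
    f = stats.norm.cdf(x - 0.5, loc=mu, scale=sigma)
    # Clamp the outer edges so that the quantized mass sums exactly to M.
    f = np.where(x <= LO, 0.0, np.where(x > HI, 1.0, f))
    return np.floor(f * (M - K)).astype(np.int64) + (x - LO)


def encode(message, mu, sigma):
    # The encoder knows every symbol up front, so the bin edges of all
    # symbols come from two batched CDF calls.
    starts = quantized_left_cdf(message, mu, sigma)
    ends = quantized_left_cdf(message + 1, mu, sigma)
    state = 1  # unbounded Python int; no renormalization in this toy
    # Stack semantics (like AnsCoder.encode_reverse): encode back to front.
    for start, end in zip(starts[::-1], ends[::-1]):
        start, freq = int(start), int(end - start)
        state = (state // freq) * M + start + state % freq
    return state


def decode(state, mu, sigma):
    decoded = []
    for mu_i, sigma_i in zip(mu, sigma):
        # The quantile to invert depends on the state left behind by the
        # previous symbol, so these CDF queries cannot be batched.
        slot = state % M
        lo, hi = LO, HI + 1  # invariant: Q(lo) <= slot < Q(hi)
        while hi - lo > 1:   # bisection stands in for an inverse CDF
            mid = (lo + hi) // 2
            if int(quantized_left_cdf(mid, mu_i, sigma_i)) <= slot:
                lo = mid
            else:
                hi = mid
        start = int(quantized_left_cdf(lo, mu_i, sigma_i))
        freq = int(quantized_left_cdf(lo + 1, mu_i, sigma_i)) - start
        state = freq * (state // M) + slot - start
        decoded.append(lo)
    return np.array(decoded, dtype=np.int32), state
```

Because encoder and decoder share the same `quantized_left_cdf`, a round trip reproduces the message and returns the state to its initial value of 1. The decoder's loop body is where a batched callback cannot help: each iteration needs `state`, which only the previous iteration produces.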