Justin-Tan / high-fidelity-generative-compression

PyTorch implementation of High-Fidelity Generative Image Compression + routines for neural image compression
Apache License 2.0

Any plan to optimize the decoding time? #9

Closed: heurainbow closed this issue 3 years ago

heurainbow commented 3 years ago

Great work! However, the decoding time is about 12s for a 1080p image. The bottleneck lies in calling Hyperprior.decompress_forward, especially in prior_entropy_model.decompress (~10s).

Is there any plan to optimize the decoding time?

Justin-Tan commented 3 years ago

Yes, that definitely needs to be optimized for practical purposes. Do you have a detailed profile of the execution times for the decoding process?
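A profile like the one asked for can be collected with the standard library's `cProfile`. This is a hedged sketch: `decompress_forward` below is a placeholder stand-in, not the repo's actual `Hyperprior.decompress_forward`; profile the real call the same way.

```python
import cProfile
import io
import pstats
import time

def decompress_forward():
    # Placeholder for the real Hyperprior.decompress_forward call;
    # sleep stands in for the entropy-decoding work being measured.
    time.sleep(0.01)

profiler = cProfile.Profile()
profiler.enable()
decompress_forward()
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(10)  # top 10 functions by cumulative time
report = stream.getvalue()
print(report)
```

Sorting by cumulative time makes it easy to see whether the time is dominated by the entropy decoder or by the network's forward pass.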

The model architecture itself should already be reasonably well optimized, assuming you are running on a GPU. Using TorchScript to JIT-compile functions in the forward pass should yield a small improvement.
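As a minimal sketch of the TorchScript suggestion, a module can be compiled with `torch.jit.script` and then called like the original. The `SmallBlock` module here is a hypothetical stand-in, not the repo's architecture:

```python
import torch

class SmallBlock(torch.nn.Module):
    """Toy stand-in module to illustrate scripting; not the repo's model."""
    def __init__(self) -> None:
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.conv(x))

# Compile the module to TorchScript, removing Python interpreter overhead
# from the forward pass.
block = torch.jit.script(SmallBlock())
out = block(torch.randn(1, 3, 32, 32))
print(out.shape)
```

The gain is usually modest for convolution-heavy models (the kernels already run in C++/CUDA), which is why the entropy coder, not the network, is the likelier bottleneck.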

The bottleneck probably lies in the actual entropy encoding/decoding process. The current implementation is a vectorized rANS coder written in NumPy, which is relatively slow and also carries a small bit overhead: the vectorized 'heads' must be initialized to some default value, which takes extra bits to store. Rewriting this in a lower-level language (as TF Compression and Fabian Mentzer's torchac do) would significantly improve encoding/decoding times. This is something I'd like to get working eventually if I can find the time.