bamler-lab / constriction

Entropy coders for research and production in Python and Rust.
https://bamler-lab.github.io/constriction/
Apache License 2.0
77 stars 5 forks source link

Different behavior on amd64 and arm #38

Closed actcwlf closed 7 months ago

actcwlf commented 7 months ago

Hi, I'm trying to use this package on arm devices, but I found inconsistent behavior on arm and amd64 platforms, what's the reason for this?

This is the test code

import constriction
import numpy as np

message = np.array([10, 10], dtype=np.int32)

entropy_model = constriction.stream.model.QuantizedLaplace(
            -20, 20 + 1
        )

encoder = constriction.stream.queue.RangeEncoder()
encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0.]))

compressed = encoder.get_compressed()
print(f"compressed representation: {compressed}")
print(f"(in binary: {[bin(word) for word in compressed]})")

decoder = constriction.stream.queue.RangeDecoder(compressed)
decoded = decoder.decode(entropy_model, np.array([10., 10.]), np.array([10.,  0.])) 
assert np.all(decoded == message) # (verifies correctness)

On amd64

compressed representation: [2042752375]
(in binary: ['0b1111001110000011110110101110111'])

On arm

thread '<unnamed>' panicked at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/probability-0.20.3/src/distribution/laplace.rs:22:9:
assertion failed: b > 0.0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "test_script.py", line 13, in <module>
    encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0]))
pyo3_runtime.PanicException: assertion failed: b > 0.0
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "test_script.py", line 13, in <module>
    encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0]))
pyo3_runtime.PanicException: assertion failed: b > 0.0

The full traceback is

thread '<unnamed>' panicked at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/probability-0.20.3/src/distribution/laplace.rs:22:9:
assertion failed: b > 0.0
stack backtrace:
   0:       0x7f9d20a7ac - std::backtrace_rs::backtrace::libunwind::trace::heaab0e590535aeb3
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:       0x7f9d20a7ac - std::backtrace_rs::backtrace::trace_unsynchronized::h89cc7ae9ebb707d7
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:       0x7f9d20a7ac - std::sys_common::backtrace::_print_fmt::h08c31be18fedf422
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:67:5
   3:       0x7f9d20a7ac - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hc38bcf44d9e857e3
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:44:22
   4:       0x7f9d226b04 - core::fmt::rt::Argument::fmt::ha5b752f9cd7ef4a3
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/fmt/rt.rs:138:9
   5:       0x7f9d226b04 - core::fmt::write::h9fac187ae7486f3c
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/fmt/mod.rs:1094:21
   6:       0x7f9d208358 - std::io::Write::write_fmt::h239e9fb6296b3a7f
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/io/mod.rs:1714:15
   7:       0x7f9d20a5e0 - std::sys_common::backtrace::_print::h52f67cfa8753b0ab
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:47:5
   8:       0x7f9d20a5e0 - std::sys_common::backtrace::print::hdea7481e2c957a93
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:34:9
   9:       0x7f9d20b958 - std::panicking::default_hook::{{closure}}::h7c36fa733369c49e
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:270:22
  10:       0x7f9d20b680 - std::panicking::default_hook::h303eee75f9a8f6a8
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:290:9
  11:       0x7f9d20bf1c - std::panicking::rust_panic_with_hook::h270c94381ec34744
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:707:13
  12:       0x7f9d20bda4 - std::panicking::begin_panic_handler::{{closure}}::h3653e3502bcc1625
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:597:13
  13:       0x7f9d20ac90 - std::sys_common::backtrace::__rust_end_short_backtrace::h6b8510f2f024eeeb
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:170:18
  14:       0x7f9d20bb34 - rust_begin_unwind
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:595:5
  15:       0x7f9d0fe4a4 - core::panicking::panic_fmt::ha96945d7a1b20293
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:67:14
  16:       0x7f9d0fe514 - core::panicking::panic::h8f06a2df29fa4962
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:117:5
  17:       0x7f9d11f308 - probability::distribution::laplace::Laplace::new::haefeb509845cc871
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/probability-0.20.3/src/distribution/laplace.rs:22:9
  18:       0x7f9d119a44 - constriction::pybindings::stream::model::QuantizedLaplace::new::{{closure}}::h5e7236efc8092f4a
                               at /home/user/repos/constriction/src/pybindings/stream/model.rs:544:44
  19:       0x7f9d161b7c - <constriction::pybindings::stream::model::internals::ParameterizableModel<(P0,P1),M,F> as constriction::pybindings::stream::model::internals::Model>::parameterize::h135fd1ad8427458f
                               at /home/user/repos/constriction/src/pybindings/stream/model/internals.rs:249:35
  20:       0x7f9d17476c - constriction::pybindings::stream::queue::RangeEncoder::encode::h7a9429b10f1006e7
                               at /home/user/repos/constriction/src/pybindings/stream/queue.rs:314:13
  21:       0x7f9d175934 - constriction::pybindings::stream::queue::_::<impl constriction::pybindings::stream::queue::RangeEncoder>::__pymethod_encode__::hb8e05b89a123d73c
                               at /home/user/repos/constriction/src/pybindings/stream/queue.rs:43:1
  22:       0x7f9d0ff204 - pyo3::impl_::trampoline::fastcall_with_keywords::{{closure}}::he0112aa3b1e49083
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:41:29
  23:       0x7f9d0ff10c - pyo3::impl_::trampoline::trampoline::{{closure}}::h9f46c71dbb4a44d4
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:181:54
  24:       0x7f9d15b494 - std::panicking::try::do_call::hd5234df960a8f95d
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:502:40
  25:       0x7f9d15b720 - __rust_try
  26:       0x7f9d15b268 - std::panicking::try::haeac5d692bf46066
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:466:19
  27:       0x7f9d115570 - std::panic::catch_unwind::h2b95655acc141454
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panic.rs:142:14
  28:       0x7f9d0fed68 - pyo3::impl_::trampoline::trampoline::h82d7e8a0b17fbbdf
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:181:9
  29:       0x7f9d140a94 - pyo3::impl_::trampoline::fastcall_with_keywords::h45c8d979f3ca1f26
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:41:13
  30:       0x7f9d174a48 - constriction::pybindings::stream::queue::_::<impl pyo3::impl_::pyclass::PyMethods<constriction::pybindings::stream::queue::RangeEncoder> for pyo3::impl_::pyclass::PyClassImplCollector<constriction::pybindings::stream::queue::RangeEncoder>>::py_methods::ITEMS::trampoline::h905700428ec8b158
                               at /home/user/repos/constriction/src/pybindings/stream/queue.rs:43:1
  31:           0x488db8 - _PyMethodDescr_FastCallKeywords
  32:           0x512714 - _PyEval_EvalFrameDefault
  33:           0x50b47c - _PyEval_EvalCodeWithName
  34:           0x50dd70 - PyEval_EvalCode
  35:           0x616f84 - <unknown>
  36:           0x617070 - PyRun_FileExFlags
  37:           0x61844c - PyRun_SimpleFileExFlags
  38:           0x6494c8 - <unknown>
  39:           0x649b00 - _Py_UnixMain
  40:       0x7f9db946e0 - __libc_start_main
                               at /build/glibc-4fr630/glibc-2.27/csu/../csu/libc-start.c:310
  41:           0x5aef1c - <unknown>
Traceback (most recent call last):
  File "test_script.py", line 13, in <module>
    encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0]))
pyo3_runtime.PanicException: assertion failed: b > 0.0
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "test_script.py", line 13, in <module>
    encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0]))
pyo3_runtime.PanicException: assertion failed: b > 0.0

Thanks!

robamler commented 7 months ago

Thank you for reporting! On arm, are you using the precompiled constriction package from pypi (for new Macs with arm64 chips) or did you compile constriction from source? If you compiled from source, did you compile in --release mode? The error message that you see should appear in a debug build but it should not appear in a release build (and the official packages on pypi should all be release builds unless I messed something up in the CI).

In case you indeed compiled from source: I'd be happy to provide precompiled python packages for your platform going forward, but I'd need someone to test them, at least initially. Would you mind telling me what platform you're developing for and would you consider testing a precompiled python package on it?

Background

The second symbol in your example is encoded with a Laplace distribution with scale parameter b = 0.0, which is not a well-defined distribution (the pdf of a Laplace distribution is $p(x\mid\mu,b)=\frac {1}{2b}e^{-|x-\mu |/b}$, so $b=0$ would lead to a division by zero). The Laplace distribution is implemented in the probability crate, which checks for b > 0.0 using a should! macro, which is defined to be equivalent to debug_assert!. Unless special compiler flags are used, debug_assert! only checks in debug builds and is ignored in release builds.

Notes to Self

actcwlf commented 7 months ago

Thanks for your reply!

I compiled from source following

compile in release mode (i.e., with optimizations) and run the benchmarks: cargo bench

and

build the Python module: poetry run maturin develop --features pybindings

Because I am pretty new to rust, I am not very sure whether I compiled the package in release mode at that moment.

Unfortunately I dont have the original device to test the solution now. If the behavior is reproduciable on other platform in debug mode, I think this may be the main reason.

By the way, when $b=0$, can I suppose the decode result is $\mu$ ? Is this a correct workaround?

actcwlf commented 7 months ago

I make another test on amd64

import constriction
import numpy as np

message = np.array([10, 5, 6], dtype=np.int32)

entropy_model = constriction.stream.model.QuantizedLaplace(
            -20, 20 + 1
        )

encoder = constriction.stream.queue.RangeEncoder()
encoder.encode(message, entropy_model, np.array([10., 10., 10.]), np.array([10., 0., 0.]))

compressed = encoder.get_compressed()
print(f"compressed representation: {compressed}")
print(f"(in binary: {[bin(word) for word in compressed]})")

decoder = constriction.stream.queue.RangeDecoder(compressed)
decoded = decoder.decode(entropy_model, np.array([10., 10., 10.]), np.array([10.,  0., 0.]))

print(decoded)
assert np.all(decoded == message) # (verifies correctness)

the result is

compressed representation: [2042752312  556892819]
(in binary: ['0b1111001110000011110110100111000', '0b100001001100011000001010010011'])
[10  5  6]

May be there is some fallback when $b=0$?

robamler commented 7 months ago

You are right, leakily quantized Laplace distributions with $b=0$ do actually work in release builds. I can't reproduce my tests where these failed, maybe I accidentally tested in debug mode myself.

I compiled from source following [...]

build the Python module: poetry run maturin develop --features pybindings

Because I am pretty new to rust, I am not very sure whether I compiled the package in release mode at that moment.

This compiles in debug mode (i.e., it will result in a python package that contains more checks and fewer run-time optimizations). This explains the difference in behavior between your two builds.

(For completeness, if you want to compile in release mode, use: poetry run maturin develop --features pybindings --release; I updated the readme accordingly.)

I'm a bit unsure what the conclusion should be. On the one hand, one could interpret a Laplace distribution with $b=0$ as a delta distribution, and one could argue that a delta distribution should work with our leaky quantizer in most cases (see below) because the quantizer integrates the delta distribution over a finite interval, and the leakiness adds some small nonzero probability mass even if the integral does not contain the delta peak. And it indeed seems to work in experiments.

On the other hand, there are two reasons against allowing $b=0$ for Laplace distributions:

Request for feedback

To help me decide, could you give me some more background on why you tested with a Laplace distribution with $b=0$? Did this come up in a real use case or were you deliberately coming up with edge cases for stress testing? In other words, would it suffice for your use case if constriction enforced $b>0$ and if this requirement was documented in the python API reference, or would it make your life easier if $b=0$ was accepted and interpreted as a delta distribution?

actcwlf commented 7 months ago

Thanks for your explanations! I am doing some research on Cool-Chic and encountered the case $b=0$. For my use case, interpreting a Laplace distribution as a Delta distribution is useful when $b=0$. But I think these two reasons against allowing $b=0$ are reasonable. Perhaps adding some options is a good alternative? For example, add a strict=true option in constriction.stream.model.QuantizedLaplace to enforce the check that $b>0$ and strict=false for more flexible interpretation.

robamler commented 7 months ago

Thanks for the background information. I like the idea about a strict=true argument to QuantizedLaplace, but I decided against it for the following reasons (this list is not meant as criticism, it's just to document and communicate my reasoning):

Proposed Solution

The problem of $b=0$ can be easily avoided by adding some small regularization (e.g., 1e-16) to the scale parameter. I added this recommendation to the python API documentation (e.g., here), and I added unit tests to assert that quantizing continuous distributions with even much smaller scales behaves as expected without any numerical issues.

Regarding Precompiled Packages for arm

Unrelated to the specific issue reported here, please let me know if you'd be willing to test a precompiled python package for your computer architecture so I can upload official pip packages to pypi going forward. This would make things easier also for other people with your compute architecture. I'd need to know your host triple to do this (i.e., what rustc --version --verbose says after "host").

actcwlf commented 7 months ago

Thanks for your reply! Adding a small regularization is a good idea. As in previous comment, I don't have access to that device for now. If I get more detailed information, I will update the information here.

actcwlf commented 2 months ago

Hi, I have got the original device, and I would like to know if you can provide a precompiled version now. I would be happy to do some testing. The system info is

rustc 1.77.1 (7cf61ebde 2024-03-27)
binary: rustc
commit-hash: 7cf61ebde7b22796c69757901dd346d0fe70bd97
commit-date: 2024-03-27
host: aarch64-unknown-linux-gnu
release: 1.77.1
LLVM version: 17.0.6

The device is a Jetson nano and the recommanded version of python is 3.6. I found release v0.2.4 support 3.6 but failed to install it. Is it feasible to compile a newer version to the platform? Thanks in advance!