Qiskit / qiskit

Qiskit is an open-source SDK for working with quantum computers at the level of extended quantum circuits, operators, and primitives.
https://www.ibm.com/quantum/qiskit
Apache License 2.0
4.83k stars 2.29k forks source link

possible docstring bit-ordering error in `BitArray.from_bool_array()` #12677

Closed aeddins-ibm closed 3 days ago

aeddins-ibm commented 3 days ago

Environment

What is happening?

https://github.com/Qiskit/qiskit/blob/25c054251f50871ff9ad1dd5d3a7f2de2b2436fc/qiskit/primitives/containers/bit_array.py#L189-L198

Normally, I'd expect "big" would mean element 0 is the least significant bit, and element -1 (the end) is the most significant bit. So AFAICT this docstring has the two switched.

May be good to confirm that the output of from_bool_array() matches the expected endianness too.

How can we reproduce the issue?

See above

What should happen?

See above

Any suggestions?

No response

jakelishman commented 3 days ago

Strictly, this isn't "endianness", it's the bit-ordering convention (endianness is byte order). Fwiw (I'm not on the primitives team), I think that the terminology here corresponds to that of numpy.packbits.

aeddins-ibm commented 3 days ago

Thanks, I didn't know that. I had commonly heard the bit-ordering referred to as endianness (like here, here). I just edited the above to not use the word "endianness" and will try to clean up my terminology going forward.

Terminology aside, I think (?) there is still an error in the docstring with whether "big" or "small" would have the MSB at index array[..., 0].

aeddins-ibm commented 3 days ago

As Jake pointed out, the order here seems to match how numpy's packbits() orders bits. From the numpy docs:

image

If I'm reading this right, "big" here puts the MSB at position 0 (LSB at the end). So AFAICT order="big" has the opposite usage of big in "big endian". This seems like an unfortunate clash of conventions that will confuse people (clearly it confused me), but I see it's probably better to match the packbits convention here, so the docstring is then correct. Closing this.

Perhaps at some point we could update the kwarg to be less ambiguous, like MSBF = True/False for "most significant bit first".

jakelishman commented 3 days ago

Yeah, we've often (and probably still do...) use "endianness" to refer to bit-ordering conventions in Qiskit, but it's not strictly correct terminology to do so.

I believe the terminology here is still roughly consistent with the formal definition of endianness. Big endian stores the most-significant byte at the lowest memory address (offset 0), then the next most at position 1, etc. Little endian stores the least significant byte at memory offset 0 and so on.

Where this gets confusing is that within a byte, we always write bitstrings with the most-significant bit on the left (which is index 0 if we use a programming string that will print the way we read it), but our intuition from mathematical place value says that the "index" $n$ of each bit should refer to the value by $2^n$, which implies a bit-labelling with the least-significant bit labelled as 0 (so the right-most one). My rough feeling is that the difference between "how we read it" and "how we typically label things from left to right" is where people tend to mix the terms (I know I used to too).

aeddins-ibm commented 3 days ago

So practically, if bstring = '01011' is a key in a counts dictionary, then "little endian" (if we allow describing bit ordering this way) still means the little bit (LSB) is at the end of bstring (bstring[-1]).

But bstring is then understood to be reversed with respect to the order of the bits in each byte, i.e. the bit at memory address n in a byte corresponds to bstring[::-1][n] (assuming just 1 byte). And this matters for bitpacking.

Thanks for the explanations. Someday I will understand this with 0 sign errors, instead of just with an even number of sign errors...