Qiskit / qiskit

Qiskit is an open-source SDK for working with quantum computers at the level of extended quantum circuits, operators, and primitives.
https://www.ibm.com/quantum/qiskit
Apache License 2.0

`BitArray.from_samples/from_counts` fails with `num_bits=None` for str outcomes and 0 outcome. #12765

Open chriseclectic opened 3 months ago

chriseclectic commented 3 months ago

Environment

What is happening?

Calling `BitArray.from_counts` or `BitArray.from_samples` without specifying the number of bits is supposed to infer the number of bits from the input data; however, this currently only seems to work for integer keys > 0.

How can we reproduce the issue?

`BitArray.from_samples([s])` or `BitArray.from_counts({s: shots})`

errors with

File ~/mambaforge/envs/qiskit1/lib/python3.11/site-packages/qiskit/primitives/containers/bit_array.py:305, in BitArray.from_samples(samples, num_bits)
    303 data = b"".join(val.to_bytes(num_bytes, "big") for val in ints)
    304 array = np.frombuffer(data, dtype=np.uint8, count=len(data))
--> 305 return BitArray(array.reshape(-1, num_bytes), num_bits)

ValueError: cannot reshape array of size 0 into shape (0)

for `s = 0` or a string, e.g. `s = "000"`.

Calling `BitArray.from_counts` ultimately ends up calling `BitArray.from_samples` and raises the same error.
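
For reference, a minimal reproduction sketch based on the cases above (assuming Qiskit 1.x, where `BitArray` is importable from `qiskit.primitives.containers`):

```python
from qiskit.primitives.containers import BitArray

# Inference works for positive integer samples:
BitArray.from_samples([1, 2, 3])

# As reported above, each of these raises
# "ValueError: cannot reshape array of size 0 into shape (0)"
# when num_bits is left as None:
BitArray.from_samples([0])            # integer outcome 0
BitArray.from_samples(["000"])        # bitstring outcome
BitArray.from_counts({"000": 1000})   # from_counts delegates to from_samples
```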

What should happen?

The number of bits should be inferred correctly for bitstring dicts, or for the trivial case of `{0: shots}` (which should default to 1 bit).
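
For concreteness, a hedged sketch of what the report seems to expect (the return values are hypothetical, and the right inference rule for bitstrings is debated later in this thread):

```python
# Presumed expected behavior, not current behavior:
BitArray.from_counts({0: 5}).num_bits       # should default to 1 bit
BitArray.from_counts({"000": 5}).num_bits   # should be inferred from the bitstring
                                            # (3 if based on string length; see below)
```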

Any suggestions?

No response

aeddins-ibm commented 3 months ago

If I understand correctly, there are two issues:

Currently there are unit tests requiring that `from_samples` runs even when given bitstrings with different numbers of bits:

https://github.com/Qiskit/qiskit/blob/ec1f42cd785805264b86445ac92bf2a22f10bf92/test/python/primitives/containers/test_bit_array.py#L241-L247

I think this means that to infer `num_bits` from the bitstring key lengths, we either have to check each key length individually and take the maximum, or drop support for this case of inconsistent bitstring lengths (so we can just check the length of the first key).

A quick test on my laptop suggests that for 10^8 bitstrings of up to 1000 bits, `max(map(len, samples))` takes ~15-20 seconds.
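
A rough sketch of that kind of measurement (hypothetical benchmark, scaled down so it runs quickly; the ~15-20 s figure above was measured at 10^8 samples):

```python
import time

# Scan every bitstring length and take the maximum, as proposed above.
# num_samples is kept small here; scale it toward 10**8 (memory permitting)
# to approximate the timing quoted above.
num_samples = 10**5
samples = ["0" * ((i % 1000) + 1) for i in range(num_samples)]

start = time.perf_counter()
num_bits = max(map(len, samples))
print(num_bits, time.perf_counter() - start)
```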

Since checking all the key lengths seems slow, I would rather remove support for inputs containing bitstrings of inconsistent lengths. Is it safe to remove, or is there an actual use case for this / are we obligated to continue supporting it?

t-imamichi commented 3 months ago

Can I close this issue now that #12800 is merged, or is anything left to do?

aeddins-ibm commented 3 months ago

> What should happen?
>
> The number of bits should be inferred correctly for bitstring dicts

What is the "correct" number of bits to infer from bitstring inputs? Is it based on string length, or on the values converted to integers?

E.g. for {"000":2, "001":3}, should num_bits be 3 or 1?

The latter (1) is the current behavior. If that is "correct", we can close the issue. If not, we should leave it open and look for a way to infer `num_bits` from string length without too many breaking changes.
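
To make the two readings concrete (a sketch only; the actual inference code inside `BitArray.from_samples` may differ):

```python
counts = {"000": 2, "001": 3}

# Reading 1: infer num_bits from the key lengths.
max(map(len, counts))                                # -> 3

# Reading 2: infer num_bits from the keys converted to integers
# (reported above as the current behavior).
max(int(k, 2) for k in counts).bit_length() or 1     # -> 1
```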

t-imamichi commented 3 months ago

@ihincks What do you think about the behavior? Should we update it?