Closed leofang closed 1 year ago
Close #75. Close #76. Supersedes #76.
It turns out we all forgot @alonre24 has already pushed a PR (#76), so all I did was minor edits in the docstrings and sync with the latest master, with his commit preserved.
cc: @tqchen
LGTM Glad that this suggestion has finally approved :)
@tqchen shall we merge and make a new release (before #113 is merged)?
@tqchen was it correct that if we use the new Bool dtype with a size of 1-bit, than the storage would be assumed to actually be packed into a single bit? Or was the intention that bit-size is padded to full bytes?!
There is some need for passing bit-masks around in the dataframe community, so it would be good to clarify this use-case, if valid.
in this case i think it should be ideally represented as Bool(bit=1, lanes=32)
, which represent a 32bit bitmask. The total size of the datatype can still be aligned to bytes
I am not too sure if we want to specify Bool(bit=1, lanes=1)
, since that can be machine dependent. On a 8bit-byte machine I guess we might still want to pad to minimum byte. But say if a machine have bit-level addressing then it would be onebit.
@tqchen to be honest, I have always had trouble understanding how lanes are to be used. If I have say a 1111 elements in my array and bits does it work to say that size=1111
but dtype=Bool(bit=1, lanes=8)
(or something larger, since we mainly want to signal that its byte stored clearly)?
The lane represents the lanes of the unit-data type.
Say we want to store a bit mas, which is represented by int32. To store 65 bits, we will need 3 integers, in this case, it is
array(dtype=Bool(bits=1, lanes=32), size=3)
in the low level, so we have 3 * 32 bits in total.
If we store bool as a normal byte, and to store 65 bools, we need
array(dtype=Bool(bits=8, lanes=1), size=65)
@tqchen but that is a problem, because how do I pass the 65 bits information there since 3 * 32 > 65
and the 65 is vital information!?
@seberg I get what you mean. I feel that could be something being addressed by enhancing the array information to include sub-byte boundary information. I am mainly describing what is being interpreted from the spec right now in a way that is also mostly consistent with compilers like LLVM
I guess that the use-case I was asking for could use/abuse "lanes", because there is a side-channel to pass the actual shape. But it doesn't feel ideal to me, so I am wondering if we can think of a pragmatic way to make this possible. (To be honest, I have never seen "lanes" used, or dtypes relying on being padded to byte storage.)
Maybe it would make sense to either introduce a new dtype or some sort of flag for bitmasks?
Close #75. Close #76. Supersedes #76.
It turns out we all forgot @alonre24 has already pushed a PR (#76), so all I did was minor edits in the docstrings and sync with the latest master, with his commit preserved.
cc: @tqchen