Summary
Until this PR, integer input/output arguments of the top function were always typed as 64-bit integers, and casting to the declared type was done inside the function body. This was because numpy only provides 8-, 16-, 32-, and 64-bit integer types.
This PR extends hcl.Array and the LLVM runtime to support arbitrary-bitwidth input arguments from numpy arrays.
Methods
Byte-as-field numpy array
To store arbitrary-width integer data in numpy arrays, we use struct-type numpy arrays with one field per byte. Each integer scalar is therefore represented as a struct of bytes, and the bytes are contiguous in memory.
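As a rough illustration of the byte-as-field layout, here is a minimal sketch. The helper name `make_byte_struct_dtype`, the field names `f0`, `f1`, ..., and the little-endian byte order are assumptions made for this example; they are not taken from the PR's actual implementation.

```python
import numpy as np

def make_byte_struct_dtype(bitwidth):
    """Hypothetical helper: struct dtype with one unsigned-byte field per byte."""
    n_bytes = (bitwidth + 7) // 8  # round up to whole bytes
    return np.dtype([(f"f{i}", "u1") for i in range(n_bytes)])

dtype = make_byte_struct_dtype(20)    # a 20-bit integer needs 3 bytes
arr = np.zeros((2, 2), dtype=dtype)   # 2x2 tensor of 20-bit integers

# Store 0x12345 in element [0, 0], least significant byte first
# (the byte order is an assumption of this sketch).
value = 0x12345
arr[0, 0] = tuple(value.to_bytes(dtype.itemsize, "little"))

# Each element occupies exactly 3 contiguous bytes in the buffer.
print(dtype.itemsize, arr.nbytes)
```

Because the dtype is built without padding, the element size equals the number of byte fields, so the array buffer is just the scalars' bytes laid out back to back.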
Arbitrary data representation
When input data is wider than 64 bits, it cannot be represented as a numpy scalar type. Instead, we use multidimensional lists of Python integers to represent input tensors, because Python integers can have arbitrary bitwidth.
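The sketch below shows one way a nested list of Python integers could be turned into raw bytes for a wide integer type. The function names `encode_int` and `encode_tensor`, the little-endian order, and the row-major flattening are assumptions for illustration, not the PR's code.

```python
def encode_int(value, bitwidth):
    """Encode one Python int as little-endian bytes of the given bitwidth."""
    n_bytes = (bitwidth + 7) // 8
    # Masking to the bitwidth yields the two's complement pattern for negatives.
    masked = value & ((1 << bitwidth) - 1)
    return masked.to_bytes(n_bytes, "little")

def encode_tensor(data, bitwidth):
    """Recursively flatten a nested list of ints into one bytes object."""
    if isinstance(data, int):
        return encode_int(data, bitwidth)
    return b"".join(encode_tensor(item, bitwidth) for item in data)

# A 2x2 tensor holding values that do not fit in 64 bits.
tensor = [[2**100 + 1, -1], [0, 7]]
raw = encode_tensor(tensor, 128)
print(len(raw))  # 4 elements * 16 bytes each
```

Python's `int.to_bytes` handles any width, which is what makes plain lists a workable container once numpy's scalar types run out.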
MLIR arbitrary bitwidth integer alignment
When passing data from numpy to MLIR's ExecutionEngine as input arguments, we create C structs from numpy ndarrays with Python's ctypes module. Through a series of experiments, I found that the required alignment of such a C struct is not byte-level; instead, it depends on the integer bitwidth:
| Integer type bitwidth (bits) | Alignment (bits) |
| --- | --- |
| (0, 8] | 8 |
| (8, 16] | 16 |
| (16, 32] | 32 |
| (32, 64] | 64 |
| (64, 128] | 128 |
| (128, 256] | 256 |
| (256, 512] | 512 |
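The observed alignments are consistent with rounding the bitwidth up to the next power of two, with a minimum of 8 bits. The sketch below encodes that reading; `alignment_bits` is a hypothetical helper name, and the measurements above only cover bitwidths up to 512, so anything beyond that is an extrapolation.

```python
def alignment_bits(bitwidth):
    """Round a bitwidth up to the next power of two, with a minimum of 8.

    Reproduces the measured alignments for bitwidths in (0, 512];
    larger widths were not measured, so results there are a guess.
    """
    if bitwidth <= 0:
        raise ValueError("bitwidth must be positive")
    align = 8
    while align < bitwidth:
        align *= 2
    return align

for bw in (1, 8, 9, 20, 64, 65, 512):
    print(bw, alignment_bits(bw))
```

For example, a 20-bit integer needs only 3 bytes of payload but falls in the (16, 32] row, so it would be aligned to 32 bits (4 bytes).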
Changes
make_anybitwidth_numpy_array is moved from ir_builder.py to utils.py
All field formats in the struct numpy array are set to unsigned. This makes sign extension in the runtime easier to implement and does not affect the creation of DenseAttr in the constant tensor op's IRBuilder function.
hcl.Array.np_array is refactored and extended to support any bitwidth data
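To illustrate why unsigned byte fields keep sign extension simple, here is a small sketch in Python. It is not the runtime's implementation: `sign_extend` is a hypothetical name, and little-endian byte order is assumed. The bytes are read as a plain unsigned value, and the sign is applied once at the end based on the top bit of the declared bitwidth.

```python
def sign_extend(raw, bitwidth):
    """Interpret unsigned little-endian bytes as a signed bitwidth-bit integer."""
    value = int.from_bytes(raw, "little", signed=False)
    value &= (1 << bitwidth) - 1      # drop any padding bits above the bitwidth
    sign_bit = 1 << (bitwidth - 1)
    # Flip the sign bit, then subtract it: maps the upper half to negatives.
    return (value ^ sign_bit) - sign_bit

print(sign_extend(bytes([0xFF, 0xFF, 0x0F]), 20))  # all 20 bits set
print(sign_extend(bytes([0x45, 0x23, 0x01]), 20))  # positive value 0x12345
```

If the fields were signed, each byte would carry its own sign and would have to be corrected before the bytes could be combined.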
Limitations
This PR only upgrades the Int and UInt types. Fixed/UFixed types are not covered, because the fixed-to-integer pass needs to be updated in the IR first. Support for fixed-point types will be added in a separate PR.