[Proposal] Introduce multi-dimensional load/store to HeteroCL IR

The existing HeteroCL IR only supports flattened tensor expressions, which can cause problems during code generation and makes some analyses difficult. To solve this, we propose introducing two new IR nodes for multi-dimensional load/store. The original Halide IR uses Provide and Call for multi-dimensional tensor accesses, while TVM IR removes them. However, those names are not straightforward. We therefore propose the names NDLoad and NDStore, following the naming convention in NumPy, where ND stands for N-dimensional.
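To make the proposal concrete, here is a minimal Python sketch of what the two nodes might carry. It is only an illustration: the field names (`buffer`, `indices`, `value`) and the use of plain strings for index expressions are assumptions, not the actual HeteroCL IR definitions.

```python
from dataclasses import dataclass
from typing import Any, Tuple

# Stand-in for an IR expression type; the real IR would use its own Expr class.
Expr = Any


@dataclass(frozen=True)
class NDLoad:
    """Hypothetical node: read buffer[indices[0]][indices[1]]..."""
    buffer: str
    indices: Tuple[Expr, ...]  # one index expression per dimension


@dataclass(frozen=True)
class NDStore:
    """Hypothetical node: write value to buffer[indices[0]][indices[1]]..."""
    buffer: str
    value: Expr
    indices: Tuple[Expr, ...]  # one index expression per dimension


# Represents B[i][j] = A[i][j + 1] without collapsing the two indices.
stmt = NDStore("B", NDLoad("A", ("i", "j + 1")), ("i", "j"))
```

The point of the sketch is that each dimension keeps its own index expression, so a backend can emit `A[i][j + 1]` directly instead of reconstructing it from a flattened offset.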

With this change, we also need to modify some of our passes. The required changes are as follows.

The current HeteroCL Python front end automatically flattens tensor accesses. We need to preserve the ND information instead.
The current StorageFlattening pass only performs buffer binding. We need to implement the actual flattening.
For the order of the passes, everything before StorageFlattening should stay in ND form. We should also add a configuration option for skipping this pass. In general, StorageFlattening should only be used for CPU execution.
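The flattening step and the skip option described above could look roughly like the following sketch. It assumes a row-major layout and constant integer indices; `flatten_index`, `lower_access`, and the `skip_flattening` flag are hypothetical names for illustration, not existing HeteroCL functions or options.

```python
from typing import Sequence, Tuple


def flatten_index(indices: Sequence[int], shape: Sequence[int]) -> int:
    """Collapse an N-D index into a 1-D offset, assuming row-major layout."""
    if len(indices) != len(shape):
        raise ValueError("indices and shape must have the same rank")
    flat = 0
    for idx, extent in zip(indices, shape):
        # Horner-style accumulation: ((i0 * s1 + i1) * s2 + i2) ...
        flat = flat * extent + idx
    return flat


def lower_access(indices: Sequence[int], shape: Sequence[int],
                 skip_flattening: bool = False) -> Tuple[int, ...]:
    """Keep the N-D indices (e.g. for HLS backends) or flatten them (e.g. for CPU)."""
    if skip_flattening:
        return tuple(indices)
    return (flatten_index(indices, shape),)


# Element [1][2] of a 3x4 tensor: kept as-is, or flattened to 1 * 4 + 2.
nd_form = lower_access((1, 2), (3, 4), skip_flattening=True)
flat_form = lower_access((1, 2), (3, 4))
```

In a real pass the indices would be symbolic expressions rather than integers, but the same accumulation applies term by term.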

This change should fix the following problem(s):

Incorrect/Weird index generation in HLS code (#276).

cornell-zhang/heterocl, issue #327