[x] Fix several bugs in the OpenCL host code generator (e.g., assign a different AXI bundle to each memory port, use the correct OpenCL runtime API flags, remove dead code, and allocate large buffers on the heap instead of the stack in host code)
[x] Add more test cases for HCL-AutoSA (large GEMM, CONV, and LU)
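
For context on the heap-allocation fix in the first item, here is a minimal sketch of the general pattern. It is not the generator's actual output: the helper name `make_identity` and the matrix size are hypothetical. A large local array such as `float a[2048 * 2048]` needs 16 MiB of stack, which commonly exceeds the default stack limit, whereas a `std::vector` keeps its elements on the heap.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, for illustration only (not taken from the generator).
// Builds a dim x dim identity matrix in row-major order. The elements live
// on the heap because std::vector allocates its storage dynamically, so the
// size of the matrix does not affect stack usage.
std::vector<float> make_identity(std::size_t dim) {
    std::vector<float> m(dim * dim, 0.0f);  // zero-initialized heap storage
    for (std::size_t i = 0; i < dim; ++i) {
        m[i * dim + i] = 1.0f;              // set the diagonal
    }
    return m;                               // moved out, no element copy
}
```

The contiguous storage is reachable through `m.data()`, so a buffer built this way can be handed to any API that expects a raw host pointer.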