Enabled -O3 and other optimizations in the benchmark.
The kernel benchmark intentionally includes the creation of an ndarray as a minor overhead, so that its timed code matches the code used for the NumPy benchmark.
Removed the unit test framework from the benchmark code.
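
As a rough illustration of the ndarray overhead mentioned above, the sketch below allocates the output array inside the timed call for the kernel path, so both paths pay the same allocation cost. The names `kernel`, `run_kernel`, `run_numpy`, `bench`, and the size `N` are hypothetical, the kernel is a stand-in that simply calls `np.add`, and timing uses the standard-library `timeit` module in place of a test framework. None of this is taken from the actual benchmark code.

```python
import timeit

import numpy as np

N = 1_000  # hypothetical problem size


def kernel(out, a, b):
    # Hypothetical stand-in for the kernel under test: element-wise add into `out`.
    np.add(a, b, out=out)
    return out


def run_numpy(a, b):
    # NumPy allocates the result ndarray as part of evaluating the expression.
    return a + b


def run_kernel(a, b):
    # Allocate the output inside the timed call so the kernel path pays the
    # same ndarray-creation cost as the NumPy path.
    out = np.empty_like(a)
    return kernel(out, a, b)


def bench(fn, a, b, repeat=5, number=100):
    # Best-of-`repeat` average time per call, in seconds.
    timings = timeit.repeat(lambda: fn(a, b), repeat=repeat, number=number)
    return min(timings) / number


a = np.arange(N, dtype=np.float64)
b = np.ones(N, dtype=np.float64)

numpy_time = bench(run_numpy, a, b)
kernel_time = bench(run_kernel, a, b)

print(f"numpy:  {numpy_time:.3e} s/call")
print(f"kernel: {kernel_time:.3e} s/call")
```

Moving the `np.empty_like` call out of `run_kernel` would measure the kernel alone, but the two timings would then no longer cover the same work.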