Remove blocksize 64 for quant/dequant functions

This PR removes blocksize 64 from the quantize and dequantize functions, because the ROCm warp size does not support that case.

It also skips that case in the tests that use the quantize/dequantize functions. The following tests are enabled with this PR:

test_autograd.py::test_matmul_fp8
test_functional.py::test_dynamic_blockwise_quantization
test_functional.py::test_4bit_compressed_stats
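
For illustration only, here is a minimal sketch of how blocksize 64 could be excluded on ROCm, both in the functions and in the test parameters. It is not the code from this PR: `ALL_BLOCKSIZES`, `supported_blocksizes`, and `check_blocksize` are hypothetical names, and the list of blocksizes is an assumption.

```python
from __future__ import annotations

# Assumed set of blocksizes; the real list lives in the library and may differ.
ALL_BLOCKSIZES = [4096, 2048, 1024, 512, 256, 128, 64]


def supported_blocksizes(is_rocm: bool) -> list[int]:
    """Return the blocksizes to exercise, dropping 64 on ROCm builds."""
    if is_rocm:
        return [b for b in ALL_BLOCKSIZES if b != 64]
    return list(ALL_BLOCKSIZES)


def check_blocksize(blocksize: int, is_rocm: bool) -> None:
    """Raise early instead of running with an unsupported blocksize."""
    if blocksize not in supported_blocksizes(is_rocm):
        raise ValueError(f"blocksize {blocksize} is not supported on this backend")
```

In a test module, the result of `supported_blocksizes(torch.version.hip is not None)` could be passed to `pytest.mark.parametrize`, so that the 64 case is never generated on ROCm instead of being skipped at runtime. Whether to filter the parameters or skip inside the test is a design choice; filtering keeps the test report free of skipped entries.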