added enable_optimization and compare_optimization flags: enable_optimization switches to the optimized code path, and compare_optimization compares the results of the optimized code with those of the original code.
added embedding_concat_optimized that uses PyTorch and performs checkerboard pattern downscaling and concatenation.
added embedding_concat_numpy that uses NumPy and performs checkerboard pattern downscaling and concatenation.
added postprocess_optimized
added infer_optimized, which performs inference through vectorized operations
added infer_init_run, which runs PaDiM once on dummy data as a warm-up so that later inference calls run hot
On an RTX 4060 Laptop GPU, the optimized code infers the default ./bottle_000.png in 10.25 ms on average, while the original code takes 203.0 ms (roughly 20x faster).
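
For readers unfamiliar with the checkerboard idea behind embedding_concat_numpy, here is a minimal NumPy sketch. It is not the code from this change. The name embedding_concat_sketch and the argument layout are hypothetical, and it assumes the concatenation is equivalent to nearest-neighbour upsampling of the coarser feature map followed by a channel-wise concat.

```python
import numpy as np


def embedding_concat_sketch(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Concatenate a fine feature map x with a coarser feature map y.

    Hypothetical helper for illustration only.
    x has shape (B, C1, H1, W1) and y has shape (B, C2, H2, W2),
    where H1 = s * H2 and W1 = s * W2 for an integer stride s.
    """
    b, c1, h1, w1 = x.shape
    _, c2, h2, w2 = y.shape
    s = h1 // h2
    assert h1 == s * h2 and w1 == s * w2, "x must be an integer multiple of y"

    z = np.empty((b, c1 + c2, h1, w1), dtype=np.result_type(x, y))
    z[:, :c1] = x
    # Each (i, j) offset selects one checkerboard-like sub-grid of the fine
    # map. That sub-grid has exactly the resolution of y, so y can be written
    # into it directly with strided slicing, without unfold/fold.
    for i in range(s):
        for j in range(s):
            z[:, c1:, i::s, j::s] = y
    return z
```

The loop runs only s * s times (4 for a stride of 2) regardless of image size, which is where the saving over a per-pixel approach would come from under these assumptions.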
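
The kind of vectorization infer_optimized refers to can be illustrated as follows. PaDiM scores each patch position with a Mahalanobis distance. Assuming that per-position loop is what gets vectorized, the sketch below replaces it with a single einsum call. The function names and the array shapes (B samples, C channels, N patch positions) are assumptions made for this example, not taken from the change itself.

```python
import numpy as np


def mahalanobis_loop(emb, mean, cov_inv):
    """Reference version: one patch position and one sample at a time."""
    b, c, n = emb.shape
    out = np.empty((b, n))
    for k in range(n):
        for s in range(b):
            d = emb[s, :, k] - mean[:, k]
            out[s, k] = np.sqrt(d @ cov_inv[k] @ d)
    return out


def mahalanobis_vectorized(emb, mean, cov_inv):
    """Same distances for all samples and positions in one einsum call.

    emb: (B, C, N), mean: (C, N), cov_inv: (N, C, C) -> result: (B, N)
    """
    delta = emb - mean[None]  # broadcast the mean over the batch axis
    sq = np.einsum("bcn,ncd,bdn->bn", delta, cov_inv, delta)
    # Clamp tiny negative values caused by floating-point round-off.
    return np.sqrt(np.maximum(sq, 0.0))
```

A check in the spirit of compare_optimization would run both versions on the same input and compare them with np.allclose.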
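
For anyone who wants to reproduce an average-latency figure like the one above, here is a minimal timing sketch with a warm-up phase, in the spirit of infer_init_run. The helper average_latency_ms and its parameters are hypothetical, and the numbers reported above were not necessarily measured this way.

```python
import time
from typing import Callable


def average_latency_ms(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Return the mean wall-clock time of fn() in milliseconds.

    Hypothetical helper for illustration only.
    """
    # Warm-up calls are not timed: they absorb one-off costs such as lazy
    # initialization, memory allocation, and kernel selection.
    for _ in range(warmup):
        fn()

    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs
```

When timing GPU code, fn should wait for the device to finish (for example by calling torch.cuda.synchronize() at its end), since CUDA kernels launch asynchronously and the measurement would otherwise cover only the launch overhead.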