Describe the bug
By default, the accept type of the inference container appears to be application/json. The default encoder, which converts results to JSON, seems to add significantly to response latency. Is there a way to reduce the default response latency?
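For context on why the JSON encoder shows up in latency: text-encoding numeric results is both CPU-heavier and byte-heavier than sending raw binary. The sketch below (not tied to any particular container; the payload size is an arbitrary assumption) compares encoding the same float vector as JSON versus packed binary, roughly what a binary accept type such as application/x-npy would carry instead.

```python
import json
import struct

# Hypothetical payload: a large vector of float model outputs.
payload = [i * 0.001 for i in range(100_000)]

# Default path: encode the result as application/json (text).
json_bytes = json.dumps(payload).encode("utf-8")

# Binary alternative: pack the same floats as little-endian float64,
# similar in spirit to a NumPy/x-npy style response body.
bin_bytes = struct.pack(f"<{len(payload)}d", *payload)

# The binary body has a fixed 8 bytes per float and skips text
# formatting entirely; JSON must render each float as decimal text.
print(f"JSON body:   {len(json_bytes)} bytes")
print(f"binary body: {len(bin_bytes)} bytes")

# Both round-trip to the same values.
assert json.loads(json_bytes) == payload
assert list(struct.unpack(f"<{len(payload)}d", bin_bytes)) == payload
```

If the serving stack honors the Accept header, requesting a binary content type (and deserializing it client-side) is the usual way to avoid this encoding cost; the exact header values supported depend on the container.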