IBM / text-generation-inference

IBM development fork of https://github.com/huggingface/text-generation-inference
Apache License 2.0

Improve log messages around the max sequence length #103

Closed by maxdebayser 4 months ago

maxdebayser commented 4 months ago

Motivation

The existing log messages around the maximum sequence length were confusing to users.

Modifications

In the router, the error message was rephrased to make it more understandable for users who aren't familiar with the internals.

In the server, we now print the maximum possible sequence length, capped by the model's own sequence length limit. The previous message showed how many output tokens would fit into memory if you passed max_sequence_length input tokens, and vice versa. I don't know what I was thinking when I wrote that.
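For illustration, the corrected computation could look something like the sketch below. This is a minimal example assuming a memory budget already expressed in tokens; the function and parameter names are hypothetical and do not come from the actual server code.

```python
# Hypothetical sketch of the corrected server-side log message.
# Assumes the available memory budget has already been converted
# into a token count; none of these names are from the real codebase.

def log_max_sequence_length(memory_capacity_tokens: int,
                            model_max_sequence_length: int) -> None:
    """Print the longest sequence (input + output tokens) a request can use."""
    # The usable length is bounded both by available memory and by the
    # model's architectural limit, whichever is smaller.
    max_possible = min(memory_capacity_tokens, model_max_sequence_length)
    print(
        f"Maximum possible sequence length (input + output tokens): "
        f"{max_possible}"
    )

log_max_sequence_length(memory_capacity_tokens=6144,
                        model_max_sequence_length=4096)
# -> Maximum possible sequence length (input + output tokens): 4096
```

Reporting a single combined limit avoids the old message's implication that input and output budgets were independent quantities.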

Related Issues

https://github.ibm.com/ai-foundation/watson-fm-stack-tracker/issues/958