dense-analysis / neural

AI Vim/Neovim code generation plugin (OpenAI, ChatGPT, and more)
MIT License

Allow the maximum requested response size (tokens) to be specified in the command #18

Open w0rp opened 1 year ago

w0rp commented 1 year ago

We should permit the maximum number of tokens to be set at will when text is requested, in addition to being configurable for all prompts, so you can request smaller or larger responses in different contexts.
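A minimal sketch of what this could look like, purely for illustration: the names `NeuralConfig` and `build_request` are hypothetical and not the plugin's actual API, they just show a per-request limit falling back to the configured default.

```python
# Hypothetical sketch only: NeuralConfig and build_request are illustrative
# names, not part of the plugin. The idea is a per-request token limit that
# overrides a globally configured default.
from dataclasses import dataclass
from typing import Optional


@dataclass
class NeuralConfig:
    # Global default applied to every prompt unless overridden per request.
    max_tokens: int = 1024


def build_request(prompt: str, config: NeuralConfig,
                  max_tokens: Optional[int] = None) -> dict:
    """Build an API payload, preferring a per-request token limit if given."""
    return {
        'prompt': prompt,
        'max_tokens': max_tokens if max_tokens is not None else config.max_tokens,
    }


# e.g. a command could pass a count/argument straight through:
# build_request('Write a haiku', NeuralConfig(), max_tokens=256)
```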

Angelchev commented 9 months ago

I think with #41 implemented we should be able to dynamically adjust the request for a model source so that it never requests more tokens than the maximum the model allows.

The design decision I want to go with is that, from a UX perspective, the user shouldn't need to worry about token length unless they are going over the limit.

I would personally rather give a model the freedom to respond with as many tokens as it can instead of artificially limiting its response. The downside is a monetary cost for API sources or a computational cost for local sources (Coming Soon™).

As a side note, token limits might not be a concern in the future thanks to sliding window attention, but that's a different thing to contend with.


With all that said, this will need to be implemented anyway so that the max token request can be adjusted dynamically.
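As a rough illustration of the clamping idea, here is a minimal sketch assuming tiktoken is available for token counting; the context window sizes and function name are illustrative, not the plugin's actual implementation.

```python
# Minimal sketch of dynamically clamping the requested completion size so it
# never exceeds what the model's context window can still fit. Context sizes
# below are illustrative; real values depend on the model/version.
import tiktoken

MODEL_CONTEXT_LIMITS = {
    'gpt-3.5-turbo': 4096,
    'text-davinci-003': 4097,
}


def clamp_max_tokens(prompt: str, model: str, requested: int) -> int:
    """Never request more completion tokens than the model can still fit."""
    encoding = tiktoken.encoding_for_model(model)
    prompt_tokens = len(encoding.encode(prompt))
    available = MODEL_CONTEXT_LIMITS[model] - prompt_tokens

    return max(0, min(requested, available))
```

With something like this in the datasource, the user-facing setting (or per-command override) only matters when it would push the request over the model's limit, which matches the UX goal above.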