responses truncating with `finish_reason`: 'length' despite not being at the token limit

We seem to be hitting a consistent limit of around 1k tokens round trip (prompt plus completion), as seen in these example completions:

Nous-Hermes 13b:

```json
{
  "id": "cmpl-75e9ce09-6b85-4add-9adc-50df0c027f33",
  "object": "text_completion",
  "created": 1688947842,
  "model": "nous-hermes-13b",
  "choices": [
    {
      "text": "Special tax filing rules when deemed to have been made can be found in",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 907,
    "completion_tokens": 16,
    "total_tokens": 923
  }
}
```

OpenAI's text-davinci-003:

```json
{
  "id": "cmpl-7aYnzDuDAcwNAJpbqLBTQVQ6cByro",
  "object": "text_completion",
  "created": 1688947955,
  "model": "text-davinci-003",
  "choices": [
    {
      "text": " The applicant company is The Marigold Trust dated 7/16/15,",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 684,
    "completion_tokens": 16,
    "total_tokens": 700
  }
}
```

```json
{
  "id": "cmpl-7aYoAgKvZYHOBXmBR89omUqv0r4hb",
  "object": "text_completion",
  "created": 1688947966,
  "model": "text-davinci-003",
  "choices": [
    {
      "text": " 91,881 (2.10 ac) net square feet after required dedication",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 776,
    "completion_tokens": 16,
    "total_tokens": 792
  }
}
```

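Something is fishy here. Nous-Hermes and text-davinci-003 both have a context limit of 4096 tokens, yet these requests total only 700 to 923 tokens, and each one stops after exactly 16 completion tokens.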
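One way to narrow this down is to compare each response's `usage` block against the context limit. The sketch below does that for the Nous-Hermes example above. It is only a diagnostic aid, not a confirmed explanation: `diagnose_truncation`, `completion_budget` and `ASSUMED_DEFAULT_MAX_TOKENS` are hypothetical names introduced here, the 4096 figure is the limit stated in this report, and the value 16 is the documented default for `max_tokens` in the OpenAI Completions API. Whether the Nous-Hermes server applies the same default is an assumption.

```python
import json

# Assumption: 16 is the documented default for `max_tokens` in the OpenAI
# Completions API. Whether the Nous-Hermes server shares it is a guess.
ASSUMED_DEFAULT_MAX_TOKENS = 16


def completion_budget(response, context_limit):
    """Tokens left for the completion once the prompt is accounted for."""
    return context_limit - response["usage"]["prompt_tokens"]


def diagnose_truncation(response, context_limit):
    """Return one finding per choice that ended with finish_reason 'length'."""
    usage = response["usage"]
    headroom = context_limit - usage["total_tokens"]
    findings = []
    for choice in response["choices"]:
        if choice["finish_reason"] != "length":
            continue  # stopped for some other reason, nothing to explain
        if headroom <= 0:
            findings.append(f"choice {choice['index']}: context window exhausted")
        elif usage["completion_tokens"] == ASSUMED_DEFAULT_MAX_TOKENS:
            findings.append(
                f"choice {choice['index']}: stopped at {usage['completion_tokens']} "
                f"completion tokens with {headroom} tokens of headroom; "
                "possibly the default max_tokens cap"
            )
        else:
            findings.append(
                f"choice {choice['index']}: capped below the context limit "
                f"({headroom} tokens of headroom)"
            )
    return findings


# Trimmed copy of the Nous-Hermes response from this report.
sample = json.loads("""
{
  "model": "nous-hermes-13b",
  "choices": [{"index": 0, "finish_reason": "length"}],
  "usage": {"prompt_tokens": 907, "completion_tokens": 16, "total_tokens": 923}
}
""")

for finding in diagnose_truncation(sample, context_limit=4096):
    print(finding)
print("completion budget:", completion_budget(sample, context_limit=4096))
```

If the default cap does turn out to be the cause, passing an explicit `max_tokens` with each request (for instance something at or below `completion_budget(...)`) would be the obvious thing to try next.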

OoriData / OgbujiPT issue #14