In `lib.rs`, `cut_to_choices()` makes only a best-effort attempt to cut a tree down to at most `choices` tokens. Empirically, with `choices=64`, it fails to do so about one time in a million. I made this change so that the error raised in that case is much easier to track down. I did not change how `gen_model_answer_rest.py` handles this error, because I think crashing is the right behavior in that specific use case. In other situations (e.g. querying REST 100 million times to gather statistics about it), it is better to catch and ignore the error. With this PR, developers using the DraftReceiver wheel can choose how to handle the error, instead of always getting a crash when `cut_to_choices()` fails.
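As a rough illustration of the two handling strategies described above (crash, or catch and ignore), here is a minimal Python sketch. None of the names below come from this PR: `CutToChoicesError`, `fake_search`, and `collect_results` are hypothetical stand-ins, and the exception type the wheel actually raises may differ, so check it before reusing the `except` clause.

```python
from typing import Callable, Iterable, List, Tuple, Type


class CutToChoicesError(Exception):
    """Hypothetical stand-in for the error raised when the tree cannot be cut."""


def collect_results(
    queries: Iterable[List[int]],
    search: Callable[[List[int]], List[List[int]]],
    error_type: Type[BaseException] = CutToChoicesError,
    strict: bool = False,
) -> Tuple[List[List[List[int]]], int]:
    """Run `search` on every query.

    strict=True  re-raises the error (the "crash" behavior).
    strict=False skips the failing query and counts it (the "ignore" behavior).
    """
    results = []
    skipped = 0
    for tokens in queries:
        try:
            results.append(search(tokens))
        except error_type:
            if strict:
                raise
            skipped += 1
    return results, skipped


def fake_search(tokens: List[int]) -> List[List[int]]:
    """Hypothetical search that fails on one input to mimic the rare failure."""
    if tokens == [13]:
        raise CutToChoicesError("tree still has more than `choices` tokens")
    return [tokens + [0]]


if __name__ == "__main__":
    results, skipped = collect_results([[1], [13], [2]], fake_search)
    print(f"{len(results)} succeeded, {skipped} skipped")
```

For a long statistics-gathering run, `strict=False` keeps the loop going and reports how many queries were dropped. A benchmark script that should fail loudly would pass `strict=True`, or simply not wrap the call at all.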