Closed borzunov closed 10 months ago
Petals supports only `use_cache=True` for inference.
However, we should not reject `use_cache=False`, since it returns identical results (it just forces the slower O(n^3) inference algorithm instead of the O(n^2) one).
I allow `use_cache=False` because some models use this setting for reasons unclear to me (see https://huggingface.co/garage-bAInd/Platypus2-70B-instruct/discussions/8), and this led to an `AssertionError` before this PR.
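
To illustrate why both settings give identical results at different cost, here is a minimal sketch of greedy decoding with a toy single-layer attention model. It is not Petals or Transformers code: `embed`, `attend`, `next_token`, and `generate` are hypothetical helpers, the embedding is a fixed function standing in for learned weights, and keys and values are simply the embeddings. The counter tracks query-key dot products, which grow roughly as n^2 with the cache and n^3 without it.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def embed(token, dim=4):
    # Deterministic toy embedding (assumption: stands in for learned weights).
    return [math.sin(token * (i + 1)) for i in range(dim)]


def attend(query, keys, values, counter):
    # Softmax attention of one query over the given keys and values.
    scores = [dot(query, k) for k in keys]
    counter[0] += len(keys)  # one query-key dot product per key
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


def next_token(output, vocab_size=10):
    # Toy greedy "head": map the output vector to a token id deterministically.
    return int(abs(sum(output)) * 1000) % vocab_size


def generate(prompt, steps, use_cache):
    counter = [0]
    tokens = list(prompt)
    cache = []  # embeddings of positions already processed
    for _ in range(steps):
        if use_cache:
            # Process only the positions that are not in the cache yet.
            for tok in tokens[len(cache):]:
                cache.append(embed(tok))
                out = attend(cache[-1], cache, cache, counter)
        else:
            # Recompute every position from scratch on each step.
            states = [embed(tok) for tok in tokens]
            for i in range(len(states)):
                out = attend(states[i], states[: i + 1], states[: i + 1], counter)
        tokens.append(next_token(out))
    return tokens, counter[0]


cached_tokens, cached_ops = generate([1, 2, 3], steps=5, use_cache=True)
plain_tokens, plain_ops = generate([1, 2, 3], steps=5, use_cache=False)
assert cached_tokens == plain_tokens  # same output either way
print(cached_ops, plain_ops)  # 28 80
```

Both runs produce the same token sequence. Only the amount of work differs, which is why accepting `use_cache=False` is safe even though it is slower.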