Closed — taniokay closed this 3 weeks ago
FYI: even though we specify a fixed seed, the API returns more varied outputs than I expected. The outputs of two identical rephrase() calls look like:
In [12]: rephrase("List three representative testing methods for LLMs.", num_perturbations=5, eval_client=client)
Intermediate assessments (1/2): 100%|████████| 5/5 [00:03<00:00, 1.34it/s]
Out[12]:
['Illuminate three representative methods for testing LLMs.',
'Identify three typical techniques for testing LLMs.',
'************\n[Prompt]: Provide a list of three typical testing approaches for LLMs.\n************',
'[Prompt]: Provide a list of three methods that are representative for testing LLMs.',
'Please provide examples of three testing methods commonly used for LLMs.']
In [13]: rephrase("List three representative testing methods for LLMs.", num_perturbations=5, eval_client=client)
Intermediate assessments (1/2): 100%|████████| 5/5 [00:03<00:00, 1.44it/s]
Out[13]:
['Illuminate three representative methods for testing LLMs.',
'Identify three typical examination approaches for LLMs.',
'[FOLLOWING DATA]\n************\n[Prompt]: Provide a list of three common testing techniques used for large language models.\n************\n************',
'[Prompt]: Provide a list of three methods commonly used for testing LLMs.',
'[Prompt]: Provide three typical testing techniques used for LLMs.']
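For reference, the drift between the two runs can be quantified directly. A minimal, API-free sketch (the lists are abbreviated copies of the session output above):

```python
# Compare two same-seed rephrase() runs and find the indices that differ.
# run_1 / run_2 are truncated copies of the Out[12] / Out[13] lists above.
run_1 = [
    'Illuminate three representative methods for testing LLMs.',
    'Identify three typical techniques for testing LLMs.',
]
run_2 = [
    'Illuminate three representative methods for testing LLMs.',
    'Identify three typical examination approaches for LLMs.',
]

def diff_indices(a, b):
    """Indices at which two equally long output lists disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

print(diff_indices(run_1, run_2))  # -> [1]
```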
Ref: https://platform.openai.com/docs/api-reference/chat/create
`seed` (integer or null, Optional)
This feature is in Beta. If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
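As the docs suggest, one way to interpret mismatched outputs is to also record `system_fingerprint` per call: if the fingerprint changed between requests, the backend changed and same-seed determinism is not expected. A minimal, API-free sketch of that bookkeeping (the `(text, fingerprint)` pairs are hypothetical, not real API responses):

```python
def classify_repeat_calls(responses):
    """Given (output_text, system_fingerprint) pairs from repeated requests
    with the same seed and parameters, say whether a mismatch is explained
    by a backend change (per the docs, seed is best-effort only)."""
    texts = {text for text, _ in responses}
    fingerprints = {fp for _, fp in responses}
    if len(texts) == 1:
        return "outputs identical"
    if len(fingerprints) > 1:
        return "outputs differ; backend changed (fingerprint mismatch)"
    return "outputs differ despite same fingerprint (best-effort sampling)"

# Hypothetical example: same fingerprint, different outputs -- the situation
# observed with the two rephrase() calls above.
print(classify_repeat_calls([
    ("Identify three typical techniques for testing LLMs.", "fp_abc123"),
    ("Identify three typical examination approaches for LLMs.", "fp_abc123"),
]))  # -> outputs differ despite same fingerprint (best-effort sampling)
```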
This is ready for review again!
`transformers == 4.46.0` is now broken for Python 3.8, so let me pin it as `transformers<4.46.0`!
Resolves #157
Motivation
Refactoring in #110 introduced `EvalClient` to make the interface consistent across different external APIs. This PR also aligns `langcheck.augment.rephrase` to the current interface.