conan1024hao closed this 4 months ago
@liwii A small question: I noticed that double quotes (`"`) and single quotes (`'`) are mixed in the code. Is there a policy about which to use? For example, `'` is used in https://github.com/citadel-ai/langcheck/blob/c55681b08c33d6e6779a3b07f47ce86a0cc549cb/src/langcheck/metrics/eval_clients/_anthropic.py#L150, but `"` is used in https://github.com/citadel-ai/langcheck/blob/c55681b08c33d6e6779a3b07f47ce86a0cc549cb/src/langcheck/metrics/eval_clients/_anthropic.py#L110.
Ah yeah, it's just inconsistent haha.
Our linter & formatter don't handle quote style properly right now, but we'll fix that altogether in #125. You don't need to worry too much about it in this PR!
Note: if we want to ensure the outputs are parsable by sampling multiple times (i.e. retrying), we may need to adjust the temperature settings.
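For example (a minimal sketch, not the actual langcheck code; the model name is only a placeholder), a non-zero `temperature` on the Anthropic Messages API call is what makes retries actually produce different outputs:

```python
# Minimal sketch: sampling with a non-zero temperature via the Anthropic SDK,
# so that retrying the same prompt can yield a different (hopefully parsable)
# response. The model name below is only a placeholder.
from anthropic import Anthropic

client = Anthropic()
message = client.messages.create(
    model="claude-3-haiku-20240307",
    max_tokens=1024,
    temperature=0.7,  # 0.0 would make every retry return nearly the same text
    messages=[{"role": "user", "content": "Assess the following output..."}],
)
print(message.content[0].text)
```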
Since the function for generating the text response (`get_text_responses()`) and the one for getting the score (`get_float_score()`) are separated, it might be necessary to redefine the `get_score()` function like this:
```python
# Assumes `Iterable` is imported from collections.abc at module level.
def get_score(
    self,
    metric_name: str,
    language: str,
    prompts: str | Iterable[str],
    score_map: dict[str, float],
    *,
    intermediate_tqdm_description: str | None = None,
    score_tqdm_description: str | None = None,
) -> tuple[list[float | None], list[str | None]]:
    if isinstance(prompts, str):
        prompts = [prompts]
    unstructured_assessment_results = []
    scores = []
    for prompt in prompts:
        attempts = 0
        score = None
        # Retry until a parsable (non-None) score is obtained or
        # self._max_attempts is reached.
        while attempts < self._max_attempts and score is None:
            unstructured_assessment_result = self.get_text_responses(
                [prompt], tqdm_description=intermediate_tqdm_description)
            score = self.get_float_score(
                metric_name,
                language,
                unstructured_assessment_result,
                score_map,
                tqdm_description=score_tqdm_description)[0]
            attempts += 1
        unstructured_assessment_results.append(
            unstructured_assessment_result[0])
        scores.append(score)
    return scores, unstructured_assessment_results
```
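For reference, a hypothetical call might look like the following (the metric name, language, and score map are made up for illustration, not taken from the actual codebase):

```python
# Hypothetical usage of the get_score() sketch above, assuming `client` is an
# eval client instance exposing it; all argument values are illustrative.
scores, assessments = client.get_score(
    metric_name="toxicity",
    language="en",
    prompts=["Is the following output toxic? ..."],
    score_map={"Toxic": 1.0, "Not toxic": 0.0},
)
print(scores[0], assessments[0])
```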
@liwii Hi, I merged https://github.com/citadel-ai/langcheck/pull/126 into this one, and `load_prompt_template` in `/metrics/prometheus` now handles different prompts. Please take a review!
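For context, a rough sketch of what such a prompt-template loader could look like; the actual `load_prompt_template` in langcheck may have a different signature, directory layout, and templating library (the use of Jinja here is an assumption):

```python
# Rough sketch only: not the real langcheck implementation.
from pathlib import Path
from jinja2 import Template

PROMPT_DIR = Path(__file__).parent / "prompts"  # assumed location

def load_prompt_template(language: str, metric_name: str) -> Template:
    """Load the prompt template for the given language and metric."""
    template_path = PROMPT_DIR / language / f"{metric_name}.j2"
    return Template(template_path.read_text(encoding="utf-8"))
```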
Adds the Prometheus Eval Client.

The `similarity_scorer` function is not implemented. `torch.bfloat16` requires a CUDA environment with 24GB or more VRAM.
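As a rough illustration of the VRAM note (the checkpoint name and loading code are assumptions, not necessarily what the eval client does):

```python
# Sketch of loading a Prometheus-style evaluator in bfloat16 with transformers.
# The checkpoint name is assumed; the point is that bfloat16 weights for a
# model of this size need a CUDA GPU with roughly 24GB of VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prometheus-eval/prometheus-7b-v2.0"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # places the model on the available CUDA device(s)
)
```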