aws-samples / amazon-bedrock-workshop

This is a workshop designed for Amazon Bedrock, a foundation model service.
https://catalog.us-east-1.prod.workshops.aws/workshops/a4bdb007-5600-4368-81c5-ff5b4154f518/en-US/20-intro
MIT No Attribution

F1 score in 01_fine-tuning-titan-lite.ipynb #242

Open jicowan opened 2 months ago

jicowan commented 2 months ago

I had to run the following code block twice before it output the scores. The first time I ran it, the output was blank:

from bert_score import score

# bert_score expects lists of strings. Use a new name instead of reassigning
# reference_summary, so re-running this cell does not wrap the reference
# in a second, nested list.
references = [reference_summary]
fine_tuned_P, fine_tuned_R, fine_tuned_F1 = score(fine_tuned_generated_response, references, lang="en")
base_model_P, base_model_R, base_model_F1 = score(base_model_generated_response, references, lang="en")
print("F1 score: base model ", base_model_F1)
print("F1 score: fine-tuned model", fine_tuned_F1)

Output from the second run:

F1 score: base model  tensor([0.8868])
F1 score: fine-tuned model tensor([0.8532])
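For context on what these tensors hold: BERTScore, the metric behind `bert_score.score`, matches each token in one text to its most similar token in the other using cosine similarity of contextual embeddings. Precision averages the best matches over the candidate's tokens, recall averages them over the reference's tokens, and F1 is their harmonic mean. The sketch below reproduces only that greedy-matching step. It assumes NumPy and uses hand-made toy vectors instead of real model embeddings; the actual library also handles tokenization, the embedding model itself, and optional idf weighting and baseline rescaling.

```python
import numpy as np


def bertscore_from_embeddings(cand_emb, ref_emb):
    """Greedy-matching P/R/F1 from token embeddings (no idf, no rescaling).

    cand_emb: array of shape (num_candidate_tokens, dim)
    ref_emb:  array of shape (num_reference_tokens, dim)
    """
    # Normalize each token vector so dot products become cosine similarities.
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = cand @ ref.T  # sim[i, j] = cos(candidate token i, reference token j)

    precision = sim.max(axis=1).mean()  # best reference match per candidate token
    recall = sim.max(axis=0).mean()     # best candidate match per reference token
    if precision + recall == 0:
        return float(precision), float(recall), 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return float(precision), float(recall), float(f1)


# Hypothetical 2-D "embeddings", purely for illustration.
cand = np.array([[1.0, 0.0], [0.0, 1.0]])
ref = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
p, r, f = bertscore_from_embeddings(cand, ref)
print(p, r, f)
```

Here every candidate token has an exact match in the reference, so precision is 1.0, while the middle reference token only partly matches either candidate token, which pulls recall and therefore F1 below 1.0.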
jicowan commented 2 months ago

Consider using the Model Evaluation feature in Bedrock to compare the models instead of the bert_score `score` function.

w601sxs commented 2 months ago

Thanks @jicowan, we will get back to this; we are prioritizing other bugs for now.