Overview
In fine_tuning.py / construct_in_context_examples(), we generate labels of "worth checking", "maybe worth checking", or "not worth checking". These are derived from the model's multi-label predictions, which are based on fact-checker-provided data. Currently, the labelling is very crude: for example, if the predicted "harm" is "high", the claim is labelled "worth checking", with no other signals taken into account.
Requirements
Think about this more carefully and design a better scoring system. Consider which labels are available, how each should be interpreted, and how they might interact. One option would be to introduce a points system, where a claim gains "checkworthiness" points if it is predicted as high harm or as suggesting actions, but loses points if it is predicted as personal experience or vague. The final score would then be converted back to one of the three labels.
Before implementing a new scoring system, propose it and open up for discussion.
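As a starting point for that discussion, the points system above could be sketched as follows. This is a minimal, hypothetical sketch: the label names (`harm_high`, `suggests_action`, `personal_experience`, `vague`), the weights, and the thresholds are all illustrative assumptions, not the actual label set or values used in `construct_in_context_examples()`.

```python
# Hypothetical points-based checkworthiness scorer.
# Label names, weights, and thresholds are illustrative assumptions
# for discussion, not the project's actual multi-label outputs.

# Points awarded (positive) or deducted (negative) per predicted label.
LABEL_WEIGHTS = {
    "harm_high": 2,             # high predicted harm raises priority
    "suggests_action": 1,       # claims urging action carry more risk
    "personal_experience": -1,  # anecdotes are rarely fact-checkable
    "vague": -2,                # vague claims are hard to verify at all
}


def score_claim(predicted_labels):
    """Sum the weights of all predicted labels for one claim."""
    return sum(LABEL_WEIGHTS.get(label, 0) for label in predicted_labels)


def score_to_label(score):
    """Map a numeric score back to the three coarse output labels."""
    if score >= 2:
        return "worth checking"
    if score >= 1:
        return "maybe worth checking"
    return "not worth checking"


# Example: high harm plus a call to action scores 3 -> "worth checking",
# while high harm on a vague claim nets 0 -> "not worth checking".
print(score_to_label(score_claim(["harm_high", "suggests_action"])))
print(score_to_label(score_claim(["harm_high", "vague"])))
```

One design point worth discussing: with additive weights, a strong positive signal (high harm) can be fully cancelled by a strong negative one (vague), which may or may not be the desired interaction between labels.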