After reviewing and evaluating some sample outputs, it seems that many health-related claims got no response at all from Gemini. This is likely to reduce the recall (and potentially the usefulness) of the tool.
The first part of the prompt acts as a filter and includes instructions like:
Ignore sentences that are not directly about health.
You should only consider claims that are on topics like health, medicine ...
Any claim that passes that part of the prompt then gets 5 labels added, and we generate a score. As an experiment, it would be good to see what happens if we let more claims through to the 5-label step: if the rest of the model is working well, non-health claims should get a low score and so be filtered out later. Of course, it's possible that we improve recall but badly hurt precision by letting through lots of claims that aren't really about health.
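One way to check the recall/precision trade-off described above is to hand-label a batch of returned claims as health or non-health, then see what survives the score filter. The following is only a rough sketch: the `ScoredClaim` fields, the 0.5 threshold, and the sample claims are all assumptions made up for illustration, not the tool's actual data model or output.

```python
from dataclasses import dataclass


@dataclass
class ScoredClaim:
    text: str
    score: float     # hypothetical: the score generated for the claim
    is_health: bool  # hypothetical: hand-assigned label from manual review


def precision_recall(claims, threshold):
    """Keep claims scoring at or above `threshold`, then compare with the manual labels."""
    kept = [c for c in claims if c.score >= threshold]
    true_positives = sum(c.is_health for c in kept)
    total_health = sum(c.is_health for c in claims)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_health if total_health else 0.0
    return precision, recall


# Made-up examples purely for illustration; not real tool output.
sample = [
    ScoredClaim("Vitamin C cures colds", 0.9, True),
    ScoredClaim("This diet reverses diabetes", 0.8, True),
    ScoredClaim("The stadium seats 50,000 people", 0.1, False),
    ScoredClaim("My phone battery lasts two days", 0.6, False),
    ScoredClaim("Stretching helps sleep", 0.4, True),
]

precision, recall = precision_recall(sample, threshold=0.5)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Note that recall here is measured only against the claims that were returned and labelled; health claims that never come back from the model at all would need a separately annotated reference list to be counted.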
Requirements
generate some output for a video
edit the prompt so it is much less health-specific and simply selects/paraphrases any claim made
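To make the experiment easy to switch on and off, the filter text could be kept as two variants and chosen per run. This is a sketch under assumptions: only the two lines in `HEALTH_FILTER` come from the existing prompt (with its elision left as-is); the `GENERAL_FILTER` wording, the `build_prompt` helper, and the idea that the video is passed in as a transcript are hypothetical.

```python
# The two sentences below are quoted from the current prompt; the "..." is
# an elision in the issue text and is deliberately left unfilled.
HEALTH_FILTER = (
    "Ignore sentences that are not directly about health.\n"
    "You should only consider claims that are on topics like health, medicine ...\n"
)

# Hypothetical replacement wording for the experiment (not the real prompt).
GENERAL_FILTER = (
    "Select and paraphrase any factual claim made in the transcript, "
    "regardless of topic.\n"
)


def build_prompt(transcript: str, health_specific: bool) -> str:
    """Assemble the first part of the prompt with the chosen filter variant."""
    filter_text = HEALTH_FILTER if health_specific else GENERAL_FILTER
    return f"{filter_text}\nTranscript:\n{transcript}"


baseline_prompt = build_prompt("example transcript text", health_specific=True)
experiment_prompt = build_prompt("example transcript text", health_specific=False)
```

Running the same video through both variants would give two sets of claims to compare side by side.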