Question about F1/Recall calculation

alibaba / GraphTranslator

GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks

BSD 3-Clause "New" or "Revised" License

68 stars, 12 forks

Question about F1/Recall calculation #2

Closed: W-rudder closed this issue 5 months ago

W-rudder commented 5 months ago

Great job! How would F1 and recall metrics be calculated if there are invalid responses? Will the samples with invalid predictions be filtered out?

W-rudder commented 5 months ago

For example, in the Taobao (Lifestage) dataset, if the Legality Rate of LLM+s_v is 50.1%, does the F1 calculation for this model only consider the 50.1% of samples with legal responses?

fs302 commented 5 months ago

@W-rudder Thanks for your question! F1-score is only calculated for the legal responses.
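To make the answer concrete, here is a minimal Python sketch of scoring only the legal responses. It is not taken from the GraphTranslator repository. The function names, the example labels, the definition of a "legal" response (the prediction exactly matches one of the candidate labels), and the use of macro averaging are all assumptions made for illustration.

```python
from collections import Counter


def legality_rate(preds, label_set):
    # Share of responses that map to a valid label (assumed definition of "legal").
    return sum(p in label_set for p in preds) / len(preds) if preds else 0.0


def filtered_macro_recall_f1(golds, preds, label_set):
    # Keep only samples whose prediction is a legal label; illegal ones are dropped
    # before scoring, so they count neither as hits nor as misses.
    pairs = [(g, p) for g, p in zip(golds, preds) if p in label_set]
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in pairs:
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    # Macro-average over classes seen in the filtered golds or predictions (assumption).
    classes = {g for g, _ in pairs} | {p for _, p in pairs}
    if not classes:
        return 0.0, 0.0
    recalls, f1s = [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        recalls.append(rec)
        f1s.append(f1)
    return sum(recalls) / len(classes), sum(f1s) / len(classes)


# Hypothetical life-stage labels and model outputs, for illustration only.
golds = ["young", "adult", "senior", "adult"]
preds = ["young", "I am not sure", "senior", "young"]
labels = {"young", "adult", "senior"}
print(legality_rate(preds, labels))
print(filtered_macro_recall_f1(golds, preds, labels))
```

With this made-up data, 3 of the 4 responses are legal (legality rate 0.75), and recall and F1 are computed on those 3 samples only. If the illegal response were instead counted as a wrong prediction, it would add a false negative for its gold class, which cannot raise recall or F1 and typically lowers them. That is why the legality rate is worth reporting alongside the filtered scores.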