JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

Documentation for computing F-scores #34

Open lpietrobon opened 6 years ago

lpietrobon commented 6 years ago

Hi there,

I think there might be a mistake in the documentation. The "Understanding Scaled F-Score" section says:

The F-Score of these two values is defined as:

$$ \mathcal{F}_\beta(\mbox{prec}, \mbox{freq}) = (1 + \beta^2) \frac{\mbox{prec} \cdot \mbox{freq}}{\beta^2 \cdot \mbox{prec} + \mbox{freq}}. $$

$\beta \in \mathcal{R}^+$ is a scaling factor where frequency is favored if $\beta < 1$, precision if $\beta > 1$

I believe it should say:

$\beta \in \mathcal{R}^+$ is a scaling factor where frequency is favored if $\beta > 1$, precision if $\beta < 1$

For $\beta \gg 1$:

$$ (1 + \beta^2) \frac{\mbox{prec} \cdot \mbox{freq}}{\beta^2 \cdot \mbox{prec} + \mbox{freq}} \approx (\beta^2) \frac{\mbox{prec} \cdot \mbox{freq}}{\beta^2 \cdot \mbox{prec}} = \mbox{freq} $$

and for $\beta \to 0$:

$$ (1 + \beta^2) \frac{\mbox{prec} \cdot \mbox{freq}}{\beta^2 \cdot \mbox{prec} + \mbox{freq}} \approx (1) \frac{\mbox{prec} \cdot \mbox{freq}}{0 + \mbox{freq}} = \mbox{prec} $$
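The two limits can also be checked numerically. The snippet below is only a minimal sketch: `f_beta` is a hypothetical helper that transcribes the formula quoted above (it is not a scattertext function), and the `prec` and `freq` values are made up for illustration.

```python
def f_beta(prec: float, freq: float, beta: float) -> float:
    """F-score exactly as written in the quoted documentation formula.

    Hypothetical helper for illustration; not part of scattertext.
    """
    denom = beta ** 2 * prec + freq
    if denom == 0:
        # Both inputs are zero (or beta and freq are); define the score as 0.
        return 0.0
    return (1 + beta ** 2) * prec * freq / denom


# Illustrative values: high precision, low frequency.
prec, freq = 0.9, 0.2

for beta in (0.01, 0.5, 1.0, 2.0, 100.0):
    score = f_beta(prec, freq, beta)
    print(f"beta={beta:>6}: F={score:.4f}")
```

With these inputs the score should sit close to `prec` for the smallest `beta` and close to `freq` for the largest, which matches the limits derived above: large `beta` favors frequency, small `beta` favors precision.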