Yoast / YoastSEO.js

Analyze content on a page and give SEO feedback as well as render a snippet preview.
GNU General Public License v3.0

Use Gini coefficient to measure uniformness of keyword distribution. #1819

Closed hansjovis closed 6 years ago

hansjovis commented 6 years ago

Summary

We advise users to distribute their keywords evenly throughout the text. The Gini coefficient measures how uniform a distribution is. In our case, it can measure the uniformness of the distances between keyword instances, which would make it a more accurate measure of keyword distribution.
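To make the proposal concrete, here is a minimal sketch in Python (not the YoastSEO.js code) of a Gini coefficient computed over the distances between keyword instances. The function names `gini` and `keyword_gaps` are hypothetical, and the sketch assumes each keyword instance is represented by a single character offset and that only gaps between consecutive instances count.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    Returns 0.0 when all values are equal; values closer to 1 mean
    the total is concentrated in a few large entries.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # Assumption: treat an empty or all-zero input as perfectly even.
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def keyword_gaps(positions):
    """Distances between consecutive keyword positions (character offsets)."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Evenly spaced keywords: every gap is 100 characters.
even = gini(keyword_gaps([0, 100, 200, 300, 400]))

# Same number of keywords, but the last one is far away from the rest.
skewed = gini(keyword_gaps([0, 100, 200, 300, 1000]))
```

With these inputs, `even` comes out as 0 and `skewed` as roughly 0.45, so a lower value corresponds to a more even spread of keywords.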

Explanation

We currently measure uniformness by checking that the distance between any two keywords does not exceed a percentage of the total number of characters. This percentage is currently set to 40%. The disadvantage of this check is that it does not capture all cases where the distances between keywords are unevenly distributed. The Gini coefficient, by contrast, was developed to measure inequality in a distribution (specifically income inequality), so it would better reflect the notion of a uniform keyword distribution.
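The blind spot of the threshold check can be illustrated with a small Python sketch. It is not the actual assessment code: the name `passes_max_gap_check` is hypothetical, "distance between any two keywords" is assumed to mean the gap between consecutive instances, and a text with fewer than two instances is assumed to pass.

```python
MAX_GAP_FRACTION = 0.4  # the 40% threshold described above


def passes_max_gap_check(positions, text_length, max_fraction=MAX_GAP_FRACTION):
    """True if no gap between consecutive keywords exceeds the allowed
    fraction of the total text length (in characters)."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        # Assumption: with fewer than two instances there is no gap to check.
        return True
    return max(gaps) <= max_fraction * text_length


TEXT_LENGTH = 1000

# Keywords spread evenly through the text.
even = [100, 300, 500, 700, 900]

# Keywords bunched into clusters with long stretches in between.
clustered = [0, 10, 20, 30, 400, 410, 800]

even_ok = passes_max_gap_check(even, TEXT_LENGTH)
clustered_ok = passes_max_gap_check(clustered, TEXT_LENGTH)
```

Both `even_ok` and `clustered_ok` are `True`, because the check only looks at the single largest gap. The clustered text is treated the same as the evenly spread one, which is the kind of case a measure over all the gaps is meant to catch.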

E.g.:

1. Uniform distribution. No inequality. Gini coef. of 0.

[Screenshot for example 1]

2. One outlier. Triggers "okay" score on assessment. Gini coef. of 0.228.

[Screenshot for example 2]

3. Keywords non-uniformly distributed. Triggers "good" score. Gini coef. of 0.28.

[Screenshot for example 3]

hansjovis commented 6 years ago

Note: @nataliashitova raised a valid critique: this should play nicely when the keyphrase consists of multiple keywords. In that case the "distance between keywords" becomes ill-defined. If we implement this, we have to find a solution.

manuelaugustin commented 6 years ago

Closed as we actually adopted the Gini coefficient as a metric for keyword distribution (see https://github.com/Yoast/YoastSEO.js/pull/1789).