Open thobson88 opened 3 weeks ago
Linker algorithm using place of publication & combined scores
The first two steps of the pipeline are unchanged.

The third step, disambiguation via the Linker, is new. We assume the place of publication `wqid` and `latlon` are known.

For each identified toponym:

- If the `wqid` is found in the list of candidates and is not the prediction: `cross_cand_score`
- If `latlon` coordinates for the candidate are not known/available: set `combined_score` equal to the `cross_cand_score`
- `combined_score` equal to the `cross_cand_score`
- If `latlon` coords and popularity are computable: set `combined_score` = `cross_cand_score` * max(popularity, proximity)
- Rank the candidates by `combined score` and set the prediction to be the top one

See also: https://github.com/Living-with-machines/data-culture-newspapers/issues/17
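
A minimal Python sketch of the scoring rules in this issue, for discussion only. The `Candidate` class and the `link` function are hypothetical names, not the project's actual API. It assumes `popularity` and `proximity` are already computed and lie in [0, 1], with `None` meaning "not computable"; how proximity is derived from `latlon` is not specified here. The place-of-publication `wqid` case is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical container for one candidate of a toponym."""
    wqid: str
    cross_cand_score: float
    popularity: Optional[float] = None  # assumed in [0, 1]; None if unavailable
    proximity: Optional[float] = None   # assumed in [0, 1]; None if latlon unknown


def combined_score(cand: Candidate) -> float:
    """Fall back to cross_cand_score when popularity or proximity is missing."""
    if cand.popularity is None or cand.proximity is None:
        return cand.cross_cand_score
    return cand.cross_cand_score * max(cand.popularity, cand.proximity)


def link(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the candidate with the highest combined score, or None if empty."""
    if not candidates:
        return None
    return max(candidates, key=combined_score)


# Example usage with made-up values.
candidates = [
    Candidate("Q1", 0.8, popularity=0.5, proximity=0.25),
    Candidate("Q2", 0.5),  # no latlon/popularity: score stays at 0.5
    Candidate("Q3", 0.6, popularity=0.2, proximity=0.9),
]
prediction = link(candidates)
```

One consequence of this rule is that a candidate with no coordinates keeps its full `cross_cand_score`, whereas a candidate with known but low popularity and proximity is scaled down, so the fallback can favour candidates with missing data.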