dbpedia / mappings-tracker

This project is used for tracking mapping issues in mappings.dbpedia.org

Should superscripts be ignored? #80

Status: Closed (svick closed 8 years ago)

svick commented 8 years ago

I was looking at dbr:Czech_Republic and noticed that its dbp:cctld (.czc, should be .cz) and dbp:callingCode (+420b, should be +420) are wrong. The reason is that the Wikipedia source (wikitext) uses superscripts to mark footnotes:

| calling_code = [[Telephone numbers in the Czech Republic|+420]]<sup>b</sup>
| cctld = [[.cz]]<sup>c</sup>

In both cases, the letter inside the <sup> gets appended to the value, which produces the wrong data. Would it make sense to ignore superscripts completely? Or maybe to have a heuristic specifically for ignoring footnote references?
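
To make the footnote heuristic concrete, here is a minimal Python sketch. It is not how the DBpedia extraction framework works; the function names (`strip_footnote_sups`, `plain_text`) and the rule itself are assumptions made for illustration. The assumed rule: a <sup> containing only a single letter or a footnote symbol is a footnote marker and is dropped, while numeric superscripts such as the one in km<sup>2</sup> are kept because they usually carry meaning.

```python
import re

# Assumed heuristic (hypothetical, for illustration only): a <sup> element whose
# content is one letter or one footnote symbol is treated as a footnote marker.
# Numeric superscripts are left alone, since they are often exponents.
FOOTNOTE_SUP = re.compile(
    r"<sup\b[^>]*>\s*(?:[A-Za-z]|[*†‡§])\s*</sup>",
    re.IGNORECASE,
)

# Matches [[target]] and [[target|label]]; group 1 is the visible text.
WIKILINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]")


def strip_footnote_sups(value: str) -> str:
    """Remove <sup> elements that look like footnote markers."""
    return FOOTNOTE_SUP.sub("", value)


def plain_text(value: str) -> str:
    """Drop footnote markers, then reduce wiki links to their visible text."""
    value = strip_footnote_sups(value)
    value = WIKILINK.sub(r"\1", value)
    return value.strip()


if __name__ == "__main__":
    print(plain_text("[[Telephone numbers in the Czech Republic|+420]]<sup>b</sup>"))
    print(plain_text("[[.cz]]<sup>c</sup>"))
    print(plain_text("km<sup>2</sup>"))
```

With this rule the two Czech Republic values come out as +420 and .cz, and km<sup>2</sup> is left unchanged. A single-letter superscript that is not a footnote would also be dropped, so this is a trade-off rather than a complete answer.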

svick commented 8 years ago

Sorry, I meant to post this to dbpedia/extraction-framework; I think it's more appropriate there.