Wikidata / soweego

Link Wikidata items to large catalogs
GNU General Public License v3.0

Consider using URL tokens as a feature #312

tupini07 closed 5 years ago

tupini07 commented 5 years ago

Plain URL tokens may be a poor feature because they carry a lot of noise. Consider filtering them with the list of URL stop words in soweego.commons.resources.urls_stop_words.txt, so that only relevant tokens are compared.
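A minimal sketch of the idea, not soweego's actual implementation. The names `url_tokens`, `token_overlap`, and `SAMPLE_STOP_WORDS` are hypothetical, and the stop word set is an illustrative stand-in for the contents of the resource file, which would be loaded from disk in practice. The overlap score is a plain Jaccard similarity, chosen here only for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stand-in for the stop words shipped in urls_stop_words.txt.
SAMPLE_STOP_WORDS = frozenset(
    {'www', 'com', 'org', 'net', 'html', 'htm', 'php', 'index'}
)


def url_tokens(url, stop_words=SAMPLE_STOP_WORDS):
    """Split a URL into lowercase ASCII alphanumeric tokens, minus stop words."""
    parts = urlsplit(url.lower())
    # The scheme is ignored: only host, path, and query are tokenized.
    raw = ' '.join((parts.netloc, parts.path, parts.query))
    tokens = re.split(r'[^a-z0-9]+', raw)
    return {token for token in tokens if token and token not in stop_words}


def token_overlap(url_a, url_b, stop_words=SAMPLE_STOP_WORDS):
    """Jaccard similarity between the filtered token sets of two URLs."""
    tokens_a = url_tokens(url_a, stop_words)
    tokens_b = url_tokens(url_b, stop_words)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
```

With the stop words removed, `http://www.example.com/john-doe` and `https://example.org/john-doe.html` reduce to the same token set, so noise such as `www`, `com`, or `html` no longer lowers the score. Note that this sketch discards non-ASCII characters, which may or may not be acceptable for the catalogs involved.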