commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Index the public suffix part of domains #31

Closed by sylvinus 8 years ago

sylvinus commented 8 years ago

We do not seem to be indexing the public suffix part of domains. The intention may have been to avoid indexing "com" for every document, but this is too restrictive.

https://github.com/commonsearch/cosr-back/blob/master/cosrlib/document/__init__.py#L96

As a result, nord.gouv.fr is not found in https://uidemo.commonsearch.org/?g=fr&q=nord+gouv+fr
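
To illustrate the behaviour being asked for, here is a minimal sketch of splitting a hostname into indexable words with and without the public suffix. It is not the cosr-back implementation: `PUBLIC_SUFFIXES`, `split_public_suffix` and `domain_words` are hypothetical names, and the hardcoded suffix set is a tiny stand-in for the real Public Suffix List.

```python
# Hypothetical stand-in for the Public Suffix List; a real indexer would load the full list.
PUBLIC_SUFFIXES = {"com", "fr", "gouv.fr", "uk", "co.uk"}


def split_public_suffix(hostname):
    """Return (rest, suffix), using the longest suffix found in PUBLIC_SUFFIXES."""
    labels = hostname.lower().strip(".").split(".")
    # Scanning from the left means the longest candidate suffix is tried first.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in PUBLIC_SUFFIXES:
            return ".".join(labels[:i]), candidate
    return ".".join(labels), ""


def domain_words(hostname, include_suffix=True):
    """Return the words to index for a hostname (illustrative only)."""
    rest, suffix = split_public_suffix(hostname)
    words = [w for w in rest.split(".") if w]
    if include_suffix:
        # Indexing the suffix labels lets a query like "nord gouv fr" match.
        words += [w for w in suffix.split(".") if w]
    return words
```

With `include_suffix=False`, `nord.gouv.fr` yields only `nord`, so the words `gouv` and `fr` from the query above have nothing to match against. With the suffix included, all three words are available to the index.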