clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0

Ranking after facet selection #232

Open twagoo opened 5 years ago

twagoo commented 5 years ago

When a search consists only of a facet selection, for instance records with a specific language, the default ranking is applied. For some facets it would probably make more sense to rank according to the selection when there is no query. In those cases we could apply re-ranking based on a query that is just the facet value.

Should we do this only for certain facets? It makes sense for e.g. language and genre (values that are likely to appear in the title or description), but less so for e.g. resource type and availability.
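
The decision described above could be sketched roughly as follows. This is only an illustration, not VLO code: the `RERANK_FACETS` allow-list, the `genre` facet name and the `rerank_terms` helper are assumptions, and the facet values are assumed to already be display labels (e.g. "Afrikaans" rather than "code:afr").

```python
from typing import Mapping, Sequence

# Hypothetical allow-list: facets whose values plausibly occur in a record's
# title or description. Resource type and availability are left out on purpose.
RERANK_FACETS = frozenset({"languageCode", "genre"})


def rerank_terms(query: str, selections: Mapping[str, Sequence[str]]) -> list[str]:
    """Return the facet values to re-rank on, or [] to keep the default ranking."""
    if query.strip():
        # An explicit user query already drives the ranking.
        return []
    terms: list[str] = []
    for facet, values in selections.items():
        if facet in RERANK_FACETS:
            terms.extend(values)
    return terms
```

With no query and a selection such as `{"languageCode": ["Afrikaans"], "availability": ["Public"]}`, only the language value would be used for re-ranking; the availability selection still filters the results but does not influence their order.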

twagoo commented 5 years ago

Example: top results when selecting Afrikaans (languageCode:"code:afr") without a query:

[
      {
        "name":["EXMARaLDA Demo corpus"]},
      {
        "name":["Afrikaans Web corpus (South Africa) from 2018 (afr-za_web_2018_1M)"]},
      {
        "name":["Afrikaans Wikipedia corpus from 2018 (afr_wikipedia_2018_300K)"]},
      {
        "name":["eSpeak"]},
      {
        "name":["Concreteness and imageability lexicon MEGA.HR-Crossling"]},
      {
        "name":["NCHLT Text Web Services"]},
      {
        "name":["NCHLT-inlang Pronunciation Dictionaries"]},
      {
        "name":["NCHLT Part of Speech Taggers"]},
      {
        "name":["Lwazi Afrikaans ASR corpus"]},
      {
        "name":["South African Directory Enquiries (SADE) Name Corpus"]}]

Compared with selecting Afrikaans (languageCode:"code:afr") and also querying for 'Afrikaans':

[
      {
        "name":["African Speech Technology Afrikaans-Afrikaans Speech Corpus"]},
      {
        "name":["Afrikaans Web corpus (South Africa) from 2018 (afr-za_web_2018_1M)"]},
      {
        "name":["Afrikaans Wikipedia corpus from 2018 (afr_wikipedia_2018_300K)"]},
      {
        "name":["African Speech Technology Coloured-Afrikaans Speech Corpus"]},
      {
        "name":["Afribooms Afrikaans Dependency Treebank"]},
      {
        "name":["African Speech Technology Black-Afrikaans Speech Corpus"]},
      {
        "name":["Autshumato Afrikaans-English Translation Memory"]},
      {
        "name":["Autshumato English-Afrikaans Translation Memory"]},
      {
        "name":["Afrikaans Genre Classification Corpus"]},
      {
        "name":["TWb040612-01",
          "Elicitation !Xoon lexicon (Animals: birds, reptiles, invertebrates)"]}]

twagoo commented 2 years ago

A few open questions:

twagoo commented 2 years ago

It may be possible to do this with a re-ranking query that the back-end constructs from a set of rules tied to the selected facets. For instance: language facet selection -> re-rank on the occurrence of any of the selected languages (by their English name) in the title or, with lower priority, the description.
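
A minimal sketch of what such a rule could produce, using Solr's ReRank query parser (`rq={!rerank ...}`). Everything specific here is an assumption for illustration: the `LANGUAGE_NAMES` lookup, the `build_rerank_params` helper, the `description` field name and the boost values are hypothetical. Only `name` and `languageCode` appear in the examples above.

```python
# Hypothetical lookup from facet values to English language names.
LANGUAGE_NAMES = {"code:afr": "Afrikaans", "code:nld": "Dutch"}


def build_rerank_params(selected_codes, title_boost=3.0, description_boost=1.0,
                        rerank_docs=200):
    """Build extra Solr request parameters that re-rank the top documents by
    occurrence of the selected language names in title or description."""
    names = [LANGUAGE_NAMES[c] for c in selected_codes if c in LANGUAGE_NAMES]
    if not names:
        return {}  # nothing to re-rank on; keep the default ranking

    def clause(field, boost):
        # Quote each name as a phrase and escape embedded double quotes.
        phrases = " OR ".join('"{}"'.format(n.replace('"', '\\"')) for n in names)
        return f"{field}:({phrases})^{boost}"

    return {
        # ReRank parser: re-score the top `rerank_docs` hits using $rqq.
        "rq": f"{{!rerank reRankQuery=$rqq reRankDocs={rerank_docs} reRankWeight=1}}",
        # Title matches (field `name`) get a higher boost than description matches.
        "rqq": f"{clause('name', title_boost)} OR {clause('description', description_boost)}",
    }
```

These parameters would be added to the request that already carries the facet selection as a filter (for example `fq=languageCode:"code:afr"`), so the result set stays the same and only the order of the top `reRankDocs` records changes.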