geolexica / isotc211.geolexica.org

ISO/TC 211 online version of the Multi-Lingual Glossary of Terms
https://isotc211.geolexica.org

Search not finding some terms #147

Closed skalee closed 4 years ago

skalee commented 4 years ago

On the TC 211 site, searching for the "Thiessen polygon" concept (fra: "polygone de Thiessen", pol: "wielobok Thiessena") gives unpredictable results.

How to reproduce

  1. Go to https://isotc211.geolexica.org/
  2. In the "find a concept" field, try typing the following queries:
    • thiessen -> there are results
    • thies -> there are results
    • Thiessen -> no results (extracted to geolexica/geolexica-server#124)
    • polygon -> there are results
    • polygone (i.e. in French) -> no results
    • wielobok (i.e. in Polish) -> no results

What is expected

All of the above queries should return results. The results should include (not necessarily exclusively) links to concept 460. @ronaldtse please confirm if I'm correct.

Underlying data

For reference, concept 460 is defined as follows (some languages are skipped):

```yaml
---
term: Thiessen polygon
termid: 460
eng:
  id: 460
  definition: polygon that encloses one of a set of points on a plane so as to include all direct positions that are closer to that point than to any other point in the set
  language_code: eng
  notes: []
  examples: []
  entry_status: valid
  review_indicator: ''
  authoritative_source:
    ref: ISO 19123:2005
    link: https://www.iso.org/standard/40121.html
  date_accepted: 2005-08-15 00:00:00.000000000 +08:00
  release: '1'
  terms:
  - type: expression
    designation: Thiessen polygon
    normative_status: preferred
fra:
  id: 460
  definition: polygone comportant un ensemble de points (parmi tous les ensembles de points) sur un plan de façon à inclure toutes les positions directes qui sont plus proches de ce point que de tout autre point de l'ensemble
  language_code: fra
  notes: []
  examples: []
  entry_status: valid
  authoritative_source:
    ref: ISO 19123
    link: https://www.iso.org/standard/40121.html
  lineage_source: ''
  date_accepted: 2005-08-15 00:00:00.000000000 +08:00
  terms:
  - type: expression
    designation: polygone de Thiessen
    normative_status: preferred
pol:
  id: 460
  definition: wielobok, który otacza jeden z punktów zbioru na płaszczyźnie w taki sposób, że obejmuje wszystkie położenia bezpośrednie, które są bliższe temu punktowi niż dowolnemu innemu punktowi z tego zbioru
  language_code: pol
  notes: []
  examples: []
  authoritative_source:
    ref: ISO 19123:2005
    link: https://www.iso.org/standard/40121.html
  lineage_source: PN-EN ISO 19123
  date_accepted: 2005-08-15 00:00:00.000000000 +08:00
  release: '1'
  terms:
  - type: expression
    designation: wielobok Thiessena
    normative_status: preferred
```

The JSON file that contains the search index is a valid JSON array and contains the following data (some languages and most terms are skipped):

```json
[
  {
    "termid": 460,
    "term": "Thiessen polygon",
    "eng": {
      "term": "Thiessen polygon",
      "id": 460,
      "entry_status": "valid",
      "language_code": "eng",
      "review_decision": null
    },
    "fra": {
      "term": "polygone de Thiessen",
      "id": 460,
      "entry_status": "valid",
      "language_code": "fra",
      "review_decision": null
    },
    "pol": {
      "term": "wielobok Thiessena",
      "id": 460,
      "entry_status": "valid",
      "language_code": "pol",
      "review_decision": null
    }
  }
]
```
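
To make the expected behaviour concrete, here is a minimal sketch in Python of a lookup over the index entry above. It is not the site's actual search code (which is not shown in this issue); the function name `find_concepts` and the matching rule (case-insensitive substring match against the `term` of every language section) are assumptions chosen only to illustrate what "all queries should return concept 460" means.

```python
import json

# Abbreviated copy of the index entry from this issue; the real index has many more terms.
INDEX_JSON = """
[
  {
    "termid": 460,
    "term": "Thiessen polygon",
    "eng": {"term": "Thiessen polygon", "language_code": "eng"},
    "fra": {"term": "polygone de Thiessen", "language_code": "fra"},
    "pol": {"term": "wielobok Thiessena", "language_code": "pol"}
  }
]
"""


def find_concepts(index, query):
    """Return termids of concepts whose term in any language contains the query.

    Hypothetical helper: matching is a case-insensitive substring test.
    """
    needle = query.casefold()
    hits = []
    for concept in index:
        # Language sections are the dict-valued entries (eng, fra, pol, ...).
        localized = [value for value in concept.values() if isinstance(value, dict)]
        if any(needle in (loc.get("term") or "").casefold() for loc in localized):
            hits.append(concept["termid"])
    return hits


index = json.loads(INDEX_JSON)
for query in ["thiessen", "thies", "Thiessen", "polygon", "polygone", "wielobok"]:
    print(query, "->", find_concepts(index, query))
```

Under these assumptions every query from the reproduction steps matches concept 460, including the capitalised "Thiessen" and the French and Polish terms, because the comparison folds case and looks at every language section instead of only the top-level `term`.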
skalee commented 4 years ago

This issue describes a problem reported on the ISO/TC 211 site, hence I'm transferring it here. Other sites are not affected, except for the upper/lower case problem, which has been extracted to https://github.com/geolexica/geolexica-server/issues/124.