cessda / cessda.cvs.two

Apache License 2.0

Alphabetical ordering in results list incorrect #424

Closed cessda-bitbucket-importer closed 1 year ago

cessda-bitbucket-importer commented 1 year ago

Original report on BitBucket by Taina Jääskeläinen.


It seems that the alphabetical ordering in the results list is now incorrect in most languages.

For instance, the Slovenian list is otherwise in alphabetical order, but Časovna metoda (TimeMethod) comes last in the production version. It should not.

The list in English in the Editor in staging contains vocabularies

This is not a design issue, so perhaps it can wait until the versioning and workflow issues for the different roles have been resolved.

cessda-bitbucket-importer commented 1 year ago

Original comment by Stefan Dlugolinsky (GitHub: Stifo).


It looks like the sort can't handle the character encoding in use.

cessda-bitbucket-importer commented 1 year ago

Original comment by Taina Jääskeläinen.


Using UTF-8 might solve this, what do you think?
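Switching to UTF-8 alone would probably not change the order. A plain sort compares Unicode code points, and UTF-8 preserves code-point order under byte comparison. As a result, "Č" (U+010C) sorts after every ASCII letter, including "Z". The minimal Python sketch below uses illustrative titles; only "Časovna metoda" comes from this issue.

```python
# Illustrative Slovenian titles; only "Časovna metoda" comes from this issue.
titles = ["Časovna metoda", "Analiza", "Cilj", "Zbiranje podatkov"]

# Default string sort compares Unicode code points: "Č" is U+010C,
# which is greater than "Z" (U+005A), so it ends up last.
by_code_point = sorted(titles)

# Sorting the UTF-8 bytes gives the same order, because UTF-8 encoding
# preserves code-point order under bytewise comparison.
by_utf8_bytes = sorted(titles, key=lambda s: s.encode("utf-8"))

print(by_code_point)
# ['Analiza', 'Cilj', 'Zbiranje podatkov', 'Časovna metoda']
print(by_code_point == by_utf8_bytes)
# True
```

In Slovenian, Č follows C directly, so the expected order is Analiza, Cilj, Časovna metoda, Zbiranje podatkov. Getting that order requires a language-aware collation, not a different encoding.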

cessda-bitbucket-importer commented 1 year ago

Original comment by Stefan Dlugolinsky (GitHub: Stifo).


My temporary notes:

# get field mapping - https://www.elastic.co/guide/en/elasticsearch/reference/6.8/indices-get-field-mapping.html
GET vocabularypublish/_mapping/field/titleSl

# put field mapping (in general, the mapping for existing fields cannot be updated) - https://www.elastic.co/guide/en/elasticsearch/reference/6.8/indices-put-mapping.html
PUT vocabularypublish
{
  "mappings": {
    "properties": {
      "name": {   
        "type": "text",
        "fields": {
          "sort_titleSl_Key": {  
            "type": "icu_collation_keyword",
            "index": false,
            "language": "sl",
            "country": "SI",
            "variant": "@collation=phonebook"
          }
        }
      }
    }
  }
}
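The `icu_collation_keyword` subfield in the notes above stores a binary collation key for each value in doc values. The byte order of that key follows the chosen language's collation rules, so Elasticsearch can sort on it with a plain byte comparison. The Python sketch below approximates the idea with a simplified Slovenian alphabet. It is a toy stand-in for ICU, not what Elasticsearch actually computes. It has no contractions and no multi-level accent or case handling, and it sorts characters outside the alphabet (such as accented letters from other languages) before all letters.

```python
# Simplified Slovenian alphabet, with q, w, x, y added at their Latin
# positions for foreign words. This is an assumption for illustration;
# real ICU collation rules are far more complete.
SL_ALPHABET = "abcčdefghijklmnopqrsštuvwxyzž"
RANK = {ch: i for i, ch in enumerate(SL_ALPHABET)}


def sl_sort_key(text: str) -> tuple:
    """Return a key whose natural ordering approximates Slovenian collation.

    Letters are ranked by their position in SL_ALPHABET (case-insensitive);
    any other character (space, digit, punctuation) sorts before all letters,
    ordered by code point. The original string is appended as a final
    tie-breaker so that case-only differences still sort deterministically.
    """
    primary = tuple(
        (1, RANK[ch]) if ch in RANK else (0, ord(ch))
        for ch in text.lower()
    )
    return (primary, text)


titles = ["Časovna metoda", "Analiza", "Cilj", "Zbiranje podatkov"]
print(sorted(titles, key=sl_sort_key))
# ['Analiza', 'Cilj', 'Časovna metoda', 'Zbiranje podatkov']
```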

  1. Before deploying the new version that fixes this issue, either:

    1. delete the indexes in the running Elasticsearch service, or

    2. stop the cvs-elasticsearch service, remove it, delete its Docker volume cvs-es, and start the service again using the updated Docker Compose YAML (app.yml + elasticsearch.yml).

  2. Reindex the indexes in CVS → Admin → Maintenance →

    • Indexing Vocabulary Publish → Index Published CVs
    • Indexing Vocabulary Editor → Indexing

cessda-bitbucket-importer commented 1 year ago

Original comment by Taina Jääskeläinen.


I forgot to tell you that if the user has chosen (or left) the language as English, the vocabularies that are only in Finnish or German (e.g. in the Editor) should not be in the English list at all. They should only appear if the user has chosen ‘All languages’ which is at the bottom of the list.

If ‘All languages’ is not chosen, the list is ordered by the vocabulary long name in the selected language and follows that language's alphabetical order. For ‘All languages’, however, your notes are very relevant.

Specifying the problem in staging (in both the Home and the Editor list)

So there are two issues: vocabularies in the wrong language list and vocabularies not in alphabetical order.
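For a single-language list, both issues could in principle be handled in one search request. An `exists` filter on the language's title field keeps out vocabularies that have no version in that language. A sort on the title's collation subfield then orders the results. The minimal Python sketch below builds such a request body. The field names `title<Lang>` and `title<Lang>.sort_title<Lang>_Key` are hypothetical, extrapolated from the mapping notes earlier in this thread rather than taken from the CVS code.

```python
import json


def build_language_list_query(lang: str) -> dict:
    """Build a hypothetical Elasticsearch search body for one language list.

    lang is a two-letter code such as "en" or "sl". The field naming scheme
    (titleEn, titleEn.sort_titleEn_Key, ...) is an assumption for illustration.
    """
    suffix = lang.capitalize()          # "sl" -> "Sl"
    title_field = f"title{suffix}"
    sort_field = f"{title_field}.sort_title{suffix}_Key"
    return {
        "query": {
            "bool": {
                # Keep only vocabularies that have a title in this language,
                # e.g. drop Finnish-only vocabularies from the English list.
                "filter": [{"exists": {"field": title_field}}]
            }
        },
        # Sort on the ICU collation subfield instead of the analysed text field.
        "sort": [{sort_field: {"order": "asc"}}],
    }


print(json.dumps(build_language_list_query("sl"), indent=2))
```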

Please let me know if you need further clarification.

We decided to put ‘All languages’ at the bottom of the list because we noticed that, in many cases, the all-languages search did not find the terms or definitions. It found them only when the search terms were in exactly the same form as in the text (so a search term in the singular did not find plural or other forms). Language analyzers do not seem to work well across languages.

cessda-bitbucket-importer commented 1 year ago

Original comment by Stefan Dlugolinsky (GitHub: Stifo).


Update:

cessda-bitbucket-importer commented 1 year ago

Original comment by Taina Jääskeläinen.


I think I have found one reason for alphabetical ordering not being right in the English list:

Currently, the Editor in staging has these vocabularies on page 2 in the English list. None of them have an English version.

Would there be any defaults that would lead to them being included in the English list?

I think all the problems with vocabularies appearing in a language list when there is no version in that language are in the English list.

Therefore, these two English-list problems need solving:

  1. Ensure that in the English list, even if the SL is something else, the EN version is always displayed (if available).
  2. If there is no English version available, the vocabulary should not be in the English list at all.
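Taken together, the two rules mean two things. The displayed title is chosen by the selected language rather than by the source language (SL). Vocabularies with no version in that language are dropped from the list. The minimal in-memory Python sketch below illustrates this. The `Vocabulary` class, its fields, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Vocabulary:
    # Hypothetical model: short name, source language and a title per language.
    short_name: str
    source_language: str
    titles: Dict[str, str] = field(default_factory=dict)


def language_list(vocabs: List[Vocabulary], lang: str) -> List[Tuple[str, str]]:
    """Return (short_name, displayed_title) pairs for the chosen language.

    Rule 1: show the title in the chosen language even if the SL differs.
    Rule 2: skip vocabularies with no version in the chosen language.
    """
    return [
        (v.short_name, v.titles[lang])
        for v in vocabs
        if lang in v.titles
    ]


# Hypothetical sample data.
vocabs = [
    Vocabulary("TimeMethod", "en", {"en": "Time Method", "sl": "Časovna metoda"}),
    Vocabulary("FiOnly", "fi", {"fi": "Vain suomeksi"}),
    Vocabulary("DeSource", "de", {"de": "Nur Deutsch", "en": "German source"}),
]
print(language_list(vocabs, "en"))
# [('TimeMethod', 'Time Method'), ('DeSource', 'German source')]
```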

Then we would see what sorting issues, if any, remain after this.

In ‘All languages’ list I see three options:

  1. Sorting is done by SL title. This requires some kind of sorting across languages, as SLs are not always English.
  2. We use the short name for sorting instead of the language-specific long name, as it is the same for all languages. The long name displayed is the SL one. This will look a bit odd to users, as the long name comes first in the view, but they would probably eventually realize that the sorting is by short name.
  3. We drop the ‘All languages’ selection for now from view and I’ll make an issue for some later consideration.
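Option 2 is probably the simplest to implement, because the short name is language-independent. The list is ordered by short name, and each row shows the long name in the vocabulary's source language. The minimal Python sketch below uses hypothetical records. It compares short names case-insensitively (`str.casefold`), which is an assumption about how short names should be compared.

```python
from typing import Dict, List, NamedTuple, Tuple


class Vocab(NamedTuple):
    # Hypothetical record: short name, source language (SL), titles per language.
    short_name: str
    source_language: str
    titles: Dict[str, str]


def all_languages_list(vocabs: List[Vocab]) -> List[Tuple[str, str]]:
    """Order by short name (same in every language) and show the SL title."""
    ordered = sorted(vocabs, key=lambda v: v.short_name.casefold())
    return [(v.short_name, v.titles[v.source_language]) for v in ordered]


# Hypothetical sample data.
vocabs = [
    Vocab("TimeMethod", "en", {"en": "Time Method", "sl": "Časovna metoda"}),
    Vocab("AnalysisUnit", "en", {"en": "Analysis Unit"}),
    Vocab("DataSource", "fi", {"fi": "Aineiston lähde"}),
]
print(all_languages_list(vocabs))
# [('AnalysisUnit', 'Analysis Unit'), ('DataSource', 'Aineiston lähde'), ('TimeMethod', 'Time Method')]
```

The drawback mentioned under option 2 remains: the long names in the visible column will not appear to be in alphabetical order.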

cessda-bitbucket-importer commented 1 year ago

Original comment by Stefan Dlugolinsky (GitHub: Stifo).


This fix requires reindexing the indexes:

CVS → Admin → Maintenance →

cessda-bitbucket-importer commented 1 year ago

Original comment by Stefan Dlugolinsky (GitHub: Stifo).


will be ready to test in dev and staging after the new version is deployed

cessda-bitbucket-importer commented 1 year ago

Original comment by Stefan Dlugolinsky (GitHub: Stifo).


There are some problems after the deployment, e.g. an “Internal server error” on “All languages”.

@Joshocan Q: When a deployment (dev + staging) is made and the Elasticsearch container command in elasticsearch.yml has been updated, is Elasticsearch (ES) also redeployed? If not, we need to redeploy all the ES nodes so that they install the collation plugin that this fix uses for correct alphabetical sorting. Update: it looks like the master deployment uses a different elasticsearch.yml from the one in src/main/docker. The Docker image build installs repository-gcs, but that installation is missing from the elasticsearch.yml in the master branch.
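Elasticsearch plugins are installed per node, so the collation sort can only work once every node has `analysis-icu`. One way to check a deployment is `GET _cat/plugins?h=name,component`, which prints one node-name/plugin pair per line. A node with no plugins does not appear in that output at all. For that reason, the full node list (for example from `GET _cat/nodes?h=name`) is passed in separately. The minimal Python sketch below parses that text and reports nodes missing the plugin. The sample output and node names are hypothetical.

```python
from typing import Iterable, Set


def nodes_missing_plugin(cat_output: str, all_nodes: Iterable[str],
                         plugin: str = "analysis-icu") -> Set[str]:
    """Return the nodes that do not report `plugin`.

    cat_output is the text of `GET _cat/plugins?h=name,component`,
    i.e. one "<node name> <plugin component>" pair per line.
    """
    nodes_with_plugin = set()
    for line in cat_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == plugin:
            nodes_with_plugin.add(parts[0])
    return set(all_nodes) - nodes_with_plugin


# Hypothetical output: es-node-2 only has repository-gcs installed.
sample = """\
es-node-1 analysis-icu
es-node-1 repository-gcs
es-node-2 repository-gcs
"""
print(nodes_missing_plugin(sample, ["es-node-1", "es-node-2"]))
# {'es-node-2'}
```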

cessda-bitbucket-importer commented 1 year ago

Original comment by Joshua Tetteh Ocansey (GitHub: Joshocan).


@Stifo See issue #447 (analysis-icu plugin for ES installed). Please test on dev or staging and confirm whether the issue is resolved.

cessda-bitbucket-importer commented 1 year ago

Original comment by Stefan Dlugolinsky (GitHub: Stifo).


I made a comment in #447; an additional step is required before testing.

cessda-bitbucket-importer commented 1 year ago

Original comment by Stefan Dlugolinsky (GitHub: Stifo).


@TainaFSD you can now test this issue in both dev and staging

cessda-bitbucket-importer commented 1 year ago

Original comment by Stefan Dlugolinsky (GitHub: Stifo).


The ICU analysis plugin was installed in Elasticsearch for both dev and staging in #447.

cessda-bitbucket-importer commented 1 year ago

Original comment by Taina Jääskeläinen.


Tested in staging both in Chrome and Firefox. The results are good: