Original comment by Stefan Dlugolinsky (GitHub: Stifo).
it looks like the sort can't deal with the used character encoding
Original comment by Taina Jääskeläinen.
Using UTF-8 might solve this, what do you think?
Original comment by Stefan Dlugolinsky (GitHub: Stifo).
my temporary notes:
# get field mapping - https://www.elastic.co/guide/en/elasticsearch/reference/6.8/indices-get-field-mapping.html
GET vocabularypublish/_mapping/field/titleSl
# put field mapping (in general, the mapping for existing fields cannot be updated) - https://www.elastic.co/guide/en/elasticsearch/reference/6.8/indices-put-mapping.html
PUT vocabularypublish
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "sort_titleSl_Key": {
            "type": "icu_collation_keyword",
            "index": false,
            "language": "sl",
            "country": "SI",
            "variant": "@collation=phonebook"
          }
        }
      }
    }
  }
}
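As a rough illustration of how the same kind of mapping could be generated for several languages, here is a minimal Python sketch. It is not the CVS code: the helper names, the `title<Lang>` field naming, and the `sort` sub-field name are assumptions made for the example. Only the `icu_collation_keyword` type and its `index`, `language` and `country` parameters are taken from the notes above.

```python
import json


def collation_subfield(language: str, country: str) -> dict:
    """Build one icu_collation_keyword sub-field definition (hypothetical helper)."""
    return {
        "type": "icu_collation_keyword",
        "index": False,          # used for sorting only, not for searching
        "language": language,
        "country": country,
    }


def title_mapping(languages: dict) -> dict:
    """Build a 'properties' mapping with one title field per language.

    The field naming (titleSl, titleFi, ...) and the 'sort' sub-field name
    are assumptions for this sketch.
    """
    properties = {}
    for lang, country in languages.items():
        field = "title" + lang.capitalize()
        properties[field] = {
            "type": "text",
            "fields": {"sort": collation_subfield(lang, country)},
        }
    return {"properties": properties}


mapping = title_mapping({"sl": "SI", "fi": "FI", "de": "DE"})
print(json.dumps(mapping, indent=2))
```

The printed JSON would still have to be sent when the index is created, since the mapping of existing fields generally cannot be changed.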
Before deploying the new version, which fixes this issue, do one of the following:
delete the indexes in the running elasticsearch service:
curl -X DELETE http://cvs-elasticsearch:9203/vocabularypublish,vocabularyeditor
or stop the cvs-elasticsearch service, remove it, delete its docker volume cvs-es, and start the service again using the updated docker compose yaml (app.yml + elasticsearch.yml).
Reindex the indexes in CVS → Admin → Maintenance →
Original comment by Taina Jääskeläinen.
I forgot to tell you that if the user has chosen (or left) the language as English, the vocabularies that are only in Finnish or German (e.g. in the Editor) should not be in the English list at all. They should only appear if the user has chosen ‘All languages’ which is at the bottom of the list.
If ‘All languages’ is not chosen, the alphabetical ordering is by the vocabulary long name in the specific language and follows the alphabetical order of that language. But for ‘all languages’ your notes are more than relevant.
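The list behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data layout, the `vocabularies_for` helper, the `"all"` marker, and the example names are hypothetical. Plain `sorted()` stands in for the language-specific alphabetical order, which a real implementation would need a collation library for.

```python
ALL_LANGUAGES = "all"  # hypothetical marker for the 'All languages' option


def vocabularies_for(vocabularies: list, language: str) -> list:
    """Return the long names to list for the chosen language."""
    if language == ALL_LANGUAGES:
        # Every vocabulary appears; showing the source-language name is an assumption.
        return sorted(v["names"][v["sourceLanguage"]] for v in vocabularies)
    # Otherwise keep only vocabularies that have a version in that language.
    # sorted() uses code-point order here, as a stand-in for real collation.
    return sorted(v["names"][language] for v in vocabularies if language in v["names"])


# Illustrative data, not taken from the CVS.
vocabs = [
    {"sourceLanguage": "en", "names": {"en": "Time method", "de": "Zeitmethode"}},
    {"sourceLanguage": "fi", "names": {"fi": "Aineistotyyppi"}},
    {"sourceLanguage": "en", "names": {"en": "Analysis unit"}},
]

print(vocabularies_for(vocabs, "en"))   # the Finnish-only vocabulary is absent
print(vocabularies_for(vocabs, "all"))  # all three vocabularies are listed
```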
Specifying the problem in the staging (both in Home and Editor list)
The German or Finnish lists do not contain any vocabularies that are not available in those languages. They are mostly in alphabetical order, but not completely.
So there are two issues: vocabularies in the wrong language list and vocabularies not in alphabetical order.
Please let me know if you need further clarification.
We made the decision to put ‘All languages’ at the bottom of the list because we noticed that the all-languages search in many cases did not find the terms or definitions. It found them only when the search terms were in exactly the same form as in the text (so a search term in the singular did not find plural or other forms). Language analyzers do not seem to work well across languages.
Original comment by Stefan Dlugolinsky (GitHub: Stifo).
Update:
I have fixed the issue with alphabetical ordering of vocabularies when a particular language is selected. It uses an ICU analysis plugin with collation support for different languages.
However, ordering when “All languages” is selected is still open.
I can’t reproduce vocabularies in the wrong language list.
Original comment by Taina Jääskeläinen.
I think I have found one reason for alphabetical ordering not being right in the English list:
Currently, the Editor in staging has these vocabularies on page 2 in the English list. None of them have an English version.
Would there be any defaults that would lead to them being included in the English list?
I think all the problems with vocabularies appearing in a language list when there is no version in that language are in the English list.
Therefore, these two English list problems would need solving:
Then we would see what potential sorting issues remain after this.
In ‘All languages’ list I see three options:
Original comment by Stefan Dlugolinsky (GitHub: Stifo).
All languages
A titleAll field in vocabularies is used for correct sorting when “All languages” is selected. This field is filled with the source-language (SL) title (e.g., titleAll=titleFi if sourceLanguage is Finnish), and when “All languages” is selected, it is used to sort the results using the language-neutral DUCET collation.
Search results are sorted by the selected language and are displayed in that language in the result list.
Only versions in the selected language are displayed in the result list.
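The titleAll rule could be sketched as follows. This is a hedged Python illustration, not the actual CVS indexing code: the dictionary layout, the `with_title_all` helper, the example titles, and the assumption that `sourceLanguage` holds a two-letter code matching the `title<Lang>` field suffix are all hypothetical.

```python
def with_title_all(vocabulary: dict) -> dict:
    """Return a copy of the document with titleAll set to the source-language title."""
    lang = vocabulary["sourceLanguage"]        # assumed to be a code such as "fi"
    title_field = "title" + lang.capitalize()  # e.g. "fi" -> "titleFi"
    doc = dict(vocabulary)
    doc["titleAll"] = vocabulary.get(title_field)
    return doc


# Illustrative document; the titles are made up for the example.
example = {
    "sourceLanguage": "fi",
    "titleFi": "Aikaulottuvuus",
    "titleEn": "Time method",
}

print(with_title_all(example)["titleAll"])
```

Sorting the “All languages” list would then use this single field with a language-neutral collation instead of a per-language one.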
This fix requires reindexing the indexes:
CVS → Admin → Maintenance →
Original comment by Stefan Dlugolinsky (GitHub: Stifo).
will be ready to test in dev and staging after the new version is deployed
Original comment by Stefan Dlugolinsky (GitHub: Stifo).
there are some problems after the deployment; i.e., “Internal server error” on “All languages”
@Joshocan Q: When a deployment (dev+staging) is made and the elasticsearch container command in elasticsearch.yml has been updated, is elasticsearch (ES) also redeployed? If not, we need to redeploy all the ES shards so that they install the collation plugin, which this fix uses for correct alphabetical sorting. An update: It looks like the elasticsearch.yml used in the master deployment differs from the one in src/main/docker, because repository-gcs is installed at the docker image build, while this is missing from the elasticsearch.yml of the master branch.
Original comment by Joshua Tetteh Ocansey (GitHub: Joshocan).
@Stifo Issue #447 (Analysis-icu plugin for ES installed). Please test on dev or staging and confirm if the issue is resolved.
Original comment by Stefan Dlugolinsky (GitHub: Stifo).
Made a comment in #447, additional step is required before testing.
Original comment by Stefan Dlugolinsky (GitHub: Stifo).
@TainaFSD you can now test this issue in both dev and staging
Original comment by Stefan Dlugolinsky (GitHub: Stifo).
icu analysis plugin was installed in elasticsearch for both dev and staging in #447
Original comment by Taina Jääskeläinen.
Tested in staging both in Chrome and Firefox. The results are good:
Original report on BitBucket by Taina Jääskeläinen.
It seems that the alphabetical ordering in the results list is now incorrect in most languages.
For instance, Slovenian is otherwise in alphabetical order but Časovna metoda (TimeMethod) comes last in the production version. It should not.
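To illustrate the symptom, here is a minimal Python sketch (not the CVS code) comparing plain code-point ordering with an ordering keyed on the Slovenian alphabet. Only Časovna metoda comes from this report; the other titles are made-up examples, and `slovenian_key` is a hypothetical helper. A real fix would rely on a proper collation library such as ICU rather than a hand-written alphabet.

```python
# Slovenian alphabet in dictionary order; other characters sort after it.
SLOVENIAN_ALPHABET = "abcčdefghijklmnoprsštuvzž"
RANK = {ch: i for i, ch in enumerate(SLOVENIAN_ALPHABET)}


def slovenian_key(title: str) -> list:
    """Hypothetical sort key: rank each letter by its alphabet position."""
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in title.lower()]


# Only "Časovna metoda" comes from the report; the rest are illustrative.
titles = ["Vrsta podatkov", "Časovna metoda", "Država", "Cilj raziskave"]

by_code_point = sorted(titles)                   # compares Unicode code points
by_alphabet = sorted(titles, key=slovenian_key)  # follows the Slovenian alphabet

print(by_code_point)  # "Časovna metoda" ends up last (Č is U+010C, after Z)
print(by_alphabet)    # "Časovna metoda" sits between the C and D entries
```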
The list in English in the Editor in staging contains vocabularies
This is not a design issue so perhaps wait till the versioning and workflow issues for different roles have been resolved.