cessda / cessda.cdc.versions

Issue tracker and wiki for the CESSDA Data Catalogue
https://datacatalogue.cessda.eu/
Apache License 2.0

Topics/keywords in sentence case; Finnish alphabet not followed #77

Closed: cessda-bitbucket-importer closed this issue 5 years ago

cessda-bitbucket-importer commented 5 years ago

Original report on BitBucket by Taina Jääskeläinen.


Topic terms and keywords come out wrong in the Finnish study descriptions. Some letters are changed: 'ä' becomes 'a' and 'ö' becomes 'o'. Is this because these fields are not UTF-8 or Unicode?

The terms should be in sentence case (a capital letter only at the beginning of the first word of the term). Examples:
Kestava Kehitys should be Kestävä kehitys
Kansainvalinen yhteistyo should be Kansainvälinen yhteistyö

You can find these examples by entering the study number FSD3187 into the search box and choosing 'suomi' as the language once you are in the study description.

In fact, even the English topics and keywords should be in sentence case, as they are in the UKDS Data Catalogue. Otherwise it is sometimes difficult to tell where one term ends and the next one begins. Moreover, all these capital letters make for hard reading.

International Politics And Organisations should be International politics and organisations

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


I believe the problems with 'ä' changing to 'a' and 'ö' changing to 'o' are fixed now.

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Looking at the raw JSON, the use of title case rather than sentence case appears to be introduced at our end: for example, "id":"social_sciences","term":"social sciences" is displayed as ‘Social Sciences’.

cessda-bitbucket-importer commented 5 years ago

Original comment by Taina Jääskeläinen.


Is there some dev version where I can check the issue with the Finnish letters? No changes yet in the published database.

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


The changes have been implemented in the development instance: [https://datacatalogue-dev.cessda.eu/](https://datacatalogue-dev.cessda.eu/)

cessda-bitbucket-importer commented 5 years ago

Original comment by Taina Jääskeläinen.


The 'ä' to 'a' and 'ö' to 'o' changes are not fixed yet in dev.

For example, in https://datacatalogue-dev.cessda.eu/detail?q="Finish-Data-Services__oai:fsd.uta.fi:FSD3250"

Asiasanat (last element): Should have “Työelämä”, not “Tyoelama”.

BTW: Should it be “Finnish-Data-Services” in the URL?

cessda-bitbucket-importer commented 5 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Looks fine from the back end:

  1. Accents and accented characters are preserved.
  2. The original letter case of the harvested records is preserved. See the attached Elasticsearch JSON for the example record “FSD3187”: FSD3187_Record.json

@john-shepherdson, one for the front end to check:

  1. Accented characters

Some parts of the site display these characters correctly (see screenshot below).

  2. Word case

This only seems to happen for topics and classifications, and would need to be changed in the UI (see below).

cessda-bitbucket-importer commented 5 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


@AshleyFox Please take a look when you can.


cessda-bitbucket-importer commented 5 years ago

Original comment by Ashley Fox.


@doraVentures Using startCase() was my attempt to make the topics and classifications more presentable. Apologies if this caused confusion and took up your time. I will remove case conversion from the UI.

cessda-bitbucket-importer commented 5 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Hey @AshleyFox, the raw data is neither pretty nor consistent. Maybe instead of startCase() we should use sentenceCase(), as suggested in the issue description here?

cessda-bitbucket-importer commented 5 years ago

Original comment by Taina Jääskeläinen.


Confirming that the 'ä' to 'a' and 'ö' to 'o' changes seem to happen only in topic and keyword elements, not in other elements.

Not sure if this is relevant, but ä and ö are not accented letters but separate letters in the Finnish alphabet: http://users.jyu.fi/~pamakine/kieli/suomi/aanneoppi/aakkoseten.html

cessda-bitbucket-importer commented 5 years ago

Original comment by Ashley Fox.


Should be fixed with [pull request](link to pull request removed). Topics and keywords now use sentence case. Fixed the deburring of accented characters.
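
The pull request itself is not shown in this thread, so the following is only a minimal sketch of what a sentence-case helper of this kind could look like. The name `toSentenceCase` is hypothetical, and the choice to leave everything after the first character untouched is an assumption, not a description of the actual fix.

```typescript
// Hypothetical helper; not the Data Catalogue's actual implementation.
function toSentenceCase(term: string): string {
  const trimmed = term.trim();
  if (trimmed.length === 0) {
    return trimmed;
  }
  // Upper-case only the first character. The rest of the term is returned
  // exactly as harvested, so letters such as 'ä' and 'ö', hyphens and commas survive.
  return trimmed.charAt(0).toUpperCase() + trimmed.slice(1);
}

const english = toSentenceCase("social sciences");
// "Social sciences"
const finnish = toSentenceCase("tasa-arvo, eriarvoisuus ja syrjäytyminen");
// "Tasa-arvo, eriarvoisuus ja syrjäytyminen"
```

A helper like this does not repair terms that are already harvested in title case. Lower-casing the remainder would handle those, but at the cost of damaging acronyms and proper nouns.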

cessda-bitbucket-importer commented 5 years ago

Original comment by Ashley Fox.


Pending pull request approval.

cessda-bitbucket-importer commented 5 years ago

Original comment by Taina Jääskeläinen.


Not yet working in dev.

Another issue I just noticed in Topic and Keyword in the detailed study view: hyphenated words lose their hyphen, and commas are dropped.

For example, topic ‘tasa-arvo, eriarvoisuus ja syrjäytyminen’ turns into ‘Tasa Arvo Eriarvoisuus Ja Syrjaytyminen’. Keyword ‘Hazards, accidents and disasters’ drops the comma.

This does not happen in the topic filter; the terms are fine there.
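
These symptoms are consistent with a start-case helper that folds letters to ASCII and splits on punctuation before capitalising each word. The sketch below is a rough, hypothetical stand-in for such a helper (the names `foldLetters` and `startCaseLike` and the small fold table are assumptions made for illustration). It is not the catalogue's code and not the library's implementation.

```typescript
// Tiny fold table standing in for a full "deburr" step (illustrative only).
const FOLD: Record<string, string> = {
  "ä": "a", "ö": "o", "å": "a",
  "Ä": "A", "Ö": "O", "Å": "A",
};

function foldLetters(text: string): string {
  // Replace each listed letter with its ASCII look-alike.
  return text.replace(/[äöåÄÖÅ]/g, (ch) => FOLD[ch] ?? ch);
}

function startCaseLike(text: string): string {
  // Split on anything that is not an ASCII letter or digit,
  // which silently discards hyphens and commas.
  const words = foldLetters(text)
    .split(/[^A-Za-z0-9]+/)
    .filter((w) => w.length > 0);
  // Capitalise the first character of every word and join with single spaces.
  return words.map((w) => w.charAt(0).toUpperCase() + w.slice(1)).join(" ");
}

const topic = startCaseLike("tasa-arvo, eriarvoisuus ja syrjäytyminen");
// "Tasa Arvo Eriarvoisuus Ja Syrjaytyminen"
const keyword = startCaseLike("Hazards, accidents and disasters");
// "Hazards Accidents And Disasters"
```

Because the letters are replaced cleanly rather than turned into garbled byte sequences, the behaviour points to a case-conversion step in the UI rather than to a character-encoding problem in the data.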

cessda-bitbucket-importer commented 5 years ago

Original comment by Moses Mansaray (GitHub: doraVentures).


Looks fine in the record JSON coming from Elasticsearch. Here is an example I found by searching for the term “Hazards + accidents + disasters”: https://datacatalogue-dev.cessda.eu/api/json/cmmstudy_en/NSD-(Nesstar)__http%3A%2F%2Fnsddata.nsd.uib.no%3A80%2Fobj%2FfStudy%2FNSD1982

Hey @AshleyFox, when you can, please have a look at the text case function in the UI.

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


@TainaFSD A build implementing this change was only made available late last night. Please check the dev instance again and let me know if this is now fixed. Thanks.

cessda-bitbucket-importer commented 5 years ago

Original comment by Ashley Fox.


@TainaFSD The recent changes should also allow hyphens and commas. Please let me know if this is not the case. Thanks.

cessda-bitbucket-importer commented 5 years ago

Original comment by Taina Jääskeläinen.


Things look fine now: hyphens and commas in topic terms, as well as the Finnish letters. You can resolve the issue.

cessda-bitbucket-importer commented 5 years ago

Original comment by John Shepherdson (GitHub: john-shepherdson).


Based on feedback from Taina
