INSPIRE-MIF / helpdesk-validator

Community discussion forum for INSPIRE validation issues

Error raised for missing initial capital letter in the keywords for spatial scope #467

Closed AntoRot closed 3 years ago

AntoRot commented 3 years ago

When testing a dataset metadata record using the Conformance Class 2b (INSPIRE data sets and data set series metadata for Monitoring), I noticed that an error is raised as the keyword for spatial scope is expressed in lowercase, without the initial capital letter (i.e. "regional" instead of "Regional" as in the Spatial Scope Register).

The metadata record tested: https://geodati.gov.it/RNDT/rest/document?id=r_umbria%3A00009%3A20141117%3A073628

The Test Report: https://inspire.ec.europa.eu/validator//v2/TestRuns/EIDb6efa820-afc0-4d0a-881a-ecab689cf27a.html

The Assertion URI: https://inspire.ec.europa.eu/validator//v2/TestRuns/EIDb6efa820-afc0-4d0a-881a-ecab689cf27a.html?lang=it#EID7562c826-a541-4774-b3a0-d5da055433f6

I'm wondering if the test for those keywords could be relaxed by also accepting the keywords expressed in lowercase.

iuriemaxim commented 3 years ago

I think that this is the role of the registry: to ensure standardisation. We had a similar issue, codelist vs codeList, due to an inconsistency in the TG, as can be seen in issue #407.

However, it is important to stick to the rules; otherwise those who want to use the services and metadata provided will find it very difficult to implement a tool for aggregating data (and metadata).

In the database that stores all the metadata accessible and indexed in the EC Geoportal, there will then be some records with “Regional” and others with “regional”. Assuming it is a relational database, and depending on how the database/table/field collation is set, “Regional” could be treated as different from “regional” or as the same.
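To illustrate the point about collation, here is a minimal sketch using SQLite from Python. The table names and the choice of SQLite are my assumptions for illustration only; I do not know which database or collation the EC Geoportal actually uses.

```python
import sqlite3

# In-memory database; nothing here reflects the real Geoportal schema.
conn = sqlite3.connect(":memory:")

# Hypothetical tables: one with SQLite's default BINARY collation
# (case sensitive), one with NOCASE (case insensitive for ASCII letters).
conn.execute("CREATE TABLE keywords_cs (label TEXT)")
conn.execute("CREATE TABLE keywords_ci (label TEXT COLLATE NOCASE)")

for table in ("keywords_cs", "keywords_ci"):
    conn.executemany(
        f"INSERT INTO {table} VALUES (?)",
        [("Regional",), ("regional",)],
    )


def count_matches(table, value):
    """Count rows whose label equals value under the column's collation."""
    row = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE label = ?", (value,)
    ).fetchone()
    return row[0]


cs = count_matches("keywords_cs", "Regional")  # only the exact spelling matches
ci = count_matches("keywords_ci", "Regional")  # both spellings match
print(cs, ci)
```

The same query returns a different number of records depending only on the collation of the column, which is why an aggregator cannot safely assume that both spellings will be found together.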

INSPIRE infrastructure is mainly designed for machine to machine communication, therefore standardisation is critical.

As “codeList” is different from “codelist”, and an error is triggered if it is not provided correctly (even though, in this case, records exist in the INSPIRE Registry for both), the same should apply to “Regional” being different from “regional” (in this case the INSPIRE Registry does not keep both records, so it is even stricter).

If case insensitivity were to be accepted for some terms, then the same rule should be applied to all terms. This is because databases/tables/fields can be set to be case sensitive or case insensitive, and the difference is huge. It is not possible in a database to have some records in a field treated as case sensitive and other records in the same field treated as case insensitive (e.g. keywords), and writing code to mimic this is far too complicated. It is much simpler to correct the data.
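As a rough sketch of what “correcting the data” could look like in practice: the check below stays strict, but reports the registry spelling when the only difference is letter case. The function name and the label set are hypothetical; only “Regional” is taken from this thread, the other labels are placeholders, and this is not how the validator is implemented.

```python
# Illustrative subset of labels; only "Regional" comes from this thread,
# the other two are placeholders assumed for the example.
REGISTRY_LABELS = {"National", "Regional", "Local"}


def check_keyword(keyword):
    """Strict check against the registry labels.

    Returns (is_valid, hint). The hint is filled in only when the keyword
    differs from a registry label by letter case alone.
    """
    if keyword in REGISTRY_LABELS:
        return True, None

    # Case-folded lookup, used only to build a helpful message.
    by_folded = {label.casefold(): label for label in REGISTRY_LABELS}
    suggestion = by_folded.get(keyword.casefold())
    if suggestion is not None:
        return False, f'did you mean "{suggestion}"?'
    return False, None


print(check_keyword("Regional"))
print(check_keyword("regional"))
print(check_keyword("county"))
```

With this approach a single rule applies to all terms (exact match), and the data provider still gets enough information to fix the record.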

Databases can also be accent sensitive or accent insensitive (i.e. making a difference between: [image]).

INSPIRE is also Accent Sensitive.

Issue https://github.com/inspire-eu-validation/community/issues/402 can be consulted, as we faced this problem with some Romanian special characters. Even if they look similar to people, for machines they are different.
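A small sketch of why look-alike characters differ for machines. The pair below (s with cedilla vs s with comma below) is chosen by me for illustration; I am assuming the characters discussed in the linked issue were of this kind.

```python
import unicodedata

cedilla = "\u015f"  # s with cedilla
comma = "\u0219"    # s with comma below, visually very close

# The two strings are different code points, so they do not compare equal.
print(cedilla == comma)

for ch in (cedilla, comma):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Unicode normalization does not merge them either: they decompose to
# different combining marks.
same_after_nfc = (
    unicodedata.normalize("NFC", cedilla) == unicodedata.normalize("NFC", comma)
)
same_after_nfkc = (
    unicodedata.normalize("NFKC", cedilla) == unicodedata.normalize("NFKC", comma)
)
print(same_after_nfc, same_after_nfkc)
```

So a keyword typed with one of these characters will not match a registry value written with the other unless the comparison is explicitly made accent insensitive.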

image

These are very important aspects from the IT perspective. Some details about collations in databases: https://database.guide/what-is-collation-in-databases/

“In database systems, Collation specifies how data is sorted and compared in a database. Collation provides the sorting rules, case, and accent sensitivity properties for the data in the database.

For example, when you run a query using the ORDER BY clause, collation determines whether or not uppercase letters and lowercase letters are treated the same.

Collation is also used to determine how accents are treated, as well as character width and Japanese kana characters. Collation can also be used to distinguish between various ideographic variation selectors in certain collations.

Different database management systems will provide different collation options. Depending on the DBMS, collation can be specified at the server level, the database level, the table level, and the column level. Collations can also be specified at the expression level (so you can specify which collation to use when you run a query), and at the identifier level”

carlospzurita commented 3 years ago

Dear @AntoRot

Thank you for opening this issue. This is an interesting discussion, and it may open the way to other code value checks such as the one @iuriemaxim has referenced. However, a convention must be followed, and the general rule should be to stick to the value as it is declared in the source (in this case, the INSPIRE registry).

However, we are going to discuss this issue internally and share any feedback that may come.

dperezBM commented 3 years ago

Dear all,

Since this issue has had no activity for some time, we have decided to close it. Please feel free to open a new one if needed.

Thank you and best regards.