INSPIRE-MIF / gp-geopackage-encodings

Good practice for GeoPackage encodings of INSPIRE datasets

[END] Case of table names, column names, etc. #11

Open heidivanparys opened 3 years ago

heidivanparys commented 3 years ago

The resulting GeoPackage files contain tables and columns with camel case identifiers, but the GeoPackage specification seems to require lower case identifiers (see https://github.com/opengeospatial/geopackage/issues/603 for a question regarding that).

So, if lowercase identifiers are indeed required, table MajorAirportSource, for example, should be named majorairportsource instead.

Even if it were not a requirement in the GeoPackage specification, I still think that names should be lowercase. Camel case is really not common for databases (although SQLite doesn't actually care, see also https://www.alberton.info/dbms_identifiers_and_case_sensitivity.html).

heidivanparys commented 3 years ago

Note also the following informational message in the GeoPackage specification:

[...] For maximum interoperability, all GeoPackage table, view, column, trigger, and constraint name values SHOULD start with a lowercase character and only include lowercase characters, numbers 0-9, and underscores (_) [...]
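A minimal sketch of how this recommendation could be checked, assuming Python's standard `re` module. The function name `follows_gpkg_name_recommendation` is hypothetical and not part of any GeoPackage tooling. Because the quoted text is a SHOULD, a name that fails this check is arguably a style issue rather than a conformance failure.

```python
import re

# Start with a lowercase ASCII letter, then only lowercase letters,
# digits 0-9 and underscores, as in the quoted recommendation.
RECOMMENDED_NAME = re.compile(r"[a-z][a-z0-9_]*")


def follows_gpkg_name_recommendation(name: str) -> bool:
    """Return True if the whole name matches the recommended identifier style."""
    return RECOMMENDED_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["MajorAirportSource", "majorairportsource", "waterway_link", "cemt-class"]:
        print(f"{name!r}: {follows_gpkg_name_recommendation(name)}")
```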

thorsten-reitz commented 3 years ago

Result of discussion on 30.06.2021:

HerzovanderWal commented 2 years ago

I don't know much about flattening. What I do know is that keeping the underscores in the requirement does not block the process(es) of delivering data. I think that leaving out the underscores makes it difficult to reuse the value, for instance to tell which character should become a capital.

Can you agree with that, Thorsten?

Yesterday I (RWS) had a chat with PDOK. We are key users of INSPIRE in the Netherlands. The chat was about issues on the Dutch work floor around INSPIRE services. One of them is that we have to produce every dataset twice for EU purposes: once with snake_case content and once with CamelCase content.

The issue is: OGC recommends snake_case as good practice for delivering data in, for instance, GeoPackages, while CamelCase is required by the theme specialists.

The locally gathered data is combined at several levels, from municipalities, provinces, Länder and countries up to EU level. At each level we need to deliver, and thus transform, the data from snake_case to CamelCase. That’s weird, isn’t it?

Technically, and from a sustainability point of view, it is a waste of effort and energy.

This could be solved by changing the specifications that require the data to be delivered in a form other than snake_case.

The interfaces can then transform it into many other representations of the data.
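To illustrate the point that underscores keep word boundaries recoverable, here is a minimal sketch of a snake_case to CamelCase conversion such an interface might perform. The function `snake_to_camel` is hypothetical and only assumes that words are separated by single underscores.

```python
def snake_to_camel(name: str, upper_first: bool = True) -> str:
    """Convert snake_case to UpperCamelCase (or lowerCamelCase).

    Word boundaries come only from the underscores, so this works only
    if the underscores were kept when the snake_case name was produced.
    """
    words = [w for w in name.split("_") if w]  # ignore empty parts from doubled underscores
    if not words:
        return ""
    camel = "".join(w[:1].upper() + w[1:] for w in words)
    if not upper_first:
        camel = camel[:1].lower() + camel[1:]
    return camel


if __name__ == "__main__":
    print(snake_to_camel("waterway_link"))                     # WaterwayLink
    print(snake_to_camel("waterway_link", upper_first=False))  # waterwayLink
    print(snake_to_camel("cemt_class"))                        # CemtClass, not CEMTClass
```

Note the limits: acronyms cannot be restored from snake_case alone (`cemt_class` gives `CemtClass`), so a lookup table would be needed for names like `CEMTClass`, and a fully lowercase name such as `waterwaylink` cannot be split at all.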

This is a small step for us and a big step for mankind.

How can we get this done?

KathiSchleidt commented 2 years ago

A general question pertaining to the wider (alternative) encoding landscape - is there a central list showing the various encoding approaches by technology? (From what I've seen, the discrepancies come from requirements/good practices underlying the individual encoding formats, e.g., JSON or GeoPackage) Such an overview list would show what alternatives are being proposed, as well as providing guidance in how to transform names when shifting between encodings. This should ideally also take conventions being specified in the OGC into account. :? Kathi

heidivanparys commented 2 years ago

I don't think there is an official central list of that. These kinds of approaches are usually described in “style guides”, often authored by companies, not standardisation organisations.

Unofficial but useful lists may be:

General:

Company-specific:

HerzovanderWal commented 2 years ago

What is the consequence? Does UseCase become use_case? For names of feature classes and attributes in snake_case, we can solve this with ETL software: check each name for capitals and, if there are any, replace each one with an underscore followed by the lowercase letter. A capital at position 1 of the name only becomes lowercase. Example: ‘WaterwayLink’ becomes ‘waterway_link’ and ‘CEMTClass’ becomes ‘cemt_class’. The last example does not completely fit the description above. Is this the right expectation?
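The mismatch noted for ‘CEMTClass’ can be seen by comparing the literal rule with an acronym-aware variant. This is a minimal sketch; both function names are hypothetical, and the two-step regular expression is a commonly used heuristic, not a rule from the GeoPackage or INSPIRE specifications.

```python
import re


def camel_to_snake_naive(name: str) -> str:
    """Literal rule: lowercase the first capital, prefix every other capital with "_"."""
    out = []
    for i, ch in enumerate(name):
        if ch.isupper():
            out.append(ch.lower() if i == 0 else "_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def camel_to_snake(name: str) -> str:
    """Acronym-aware variant that keeps runs of capitals together."""
    # Split an acronym from a following capitalised word: "CEMTClass" -> "CEMT_Class"
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split a lowercase letter or digit from a following capital: "WaterwayLink" -> "Waterway_Link"
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.lower()


if __name__ == "__main__":
    for name in ["UseCase", "WaterwayLink", "CEMTClass"]:
        print(name, camel_to_snake_naive(name), camel_to_snake(name))
```

With the literal rule, ‘CEMTClass’ becomes `c_e_m_t_class`; the acronym-aware variant gives the expected `cemt_class`, while both agree on `use_case` and `waterway_link`.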

CorMelse commented 2 years ago


I support your point @heidivanparys, as I also replied to @HerzovanderWal by email: always use lowercase for databases, tables and columns. KISS is the best option to guarantee interoperability!

KathiSchleidt commented 2 years ago

Getting back to the core of this thread: should Kebabs (hyphen-separated names) be left in when used in the flattening context, despite the contrary GeoPackage recommendation, or should all Kebabs also be converted to Snakes?

HerzovanderWal commented 2 years ago

What if other characters are used as separators, for instance #, ~ or ?? That will bring lots of technical issues, probably in relation to the declared language. My suggestion: keep it simple. Kebab case is not snake_case, so let's convert it to snake_case in the ETL process.
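A minimal sketch of such an ETL step, assuming that any character outside `a-z` and `0-9` should be treated as a separator. The function name `normalise_separators` is hypothetical.

```python
import re


def normalise_separators(name: str) -> str:
    """Lowercase the name and turn each run of characters outside [a-z0-9] into one "_"."""
    s = re.sub(r"[^a-z0-9]+", "_", name.lower())
    return s.strip("_")  # drop separators left at the start or end


if __name__ == "__main__":
    for name in ["cemt-class", "noise#level~day?", "waterway__link"]:
        print(name, "->", normalise_separators(name))
```

Because the name is lowercased first, camelCase boundaries are lost, so a camelCase-to-snake_case step would have to run before this one. This simple rule also replaces non-ASCII letters (`"Länder"` becomes `"l_nder"`), so names in other languages would need a transliteration step first.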

thorsten-reitz commented 2 years ago

Hi @HerzovanderWal

Today the EEA team and I had an opportunity to discuss this issue. It is of course unfortunate that there are problems with the PDOK validator. We did test the END templates in QGIS, ArcGIS, and pretty much all standard libraries such as GDAL, GeoTools and others, without any issues.

For the END templates, we decided that we can't make such a breaking change now, because it would affect the ongoing 2022 reporting. The process is fully underway and has already been implemented in many countries. However, we agree that for future reporting cycles (noise source in 2025, etc.) we would apply the lowercase recommendation.

For the generic INSPIRE GeoPackage specification, as mentioned at the beginning of this ticket, we will use underscores to separate levels of hierarchy. We do not intend to use other special characters to indicate word boundaries, due to incompatibility issues, so word boundaries would be lost. Table and property names would simply be entirely lowercase.
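A minimal sketch of this naming rule, assuming a single underscore between hierarchy levels and plain lowercasing within each level. The function `flatten_name` and the example paths are illustrative only; the exact flattening rules are those defined in the specification, not this code.

```python
from typing import Sequence


def flatten_name(path: Sequence[str]) -> str:
    """Join the levels of a property path with "_" and lowercase each level.

    Capitals inside a level are simply lowercased, so word boundaries within
    a level are lost; only the underscores mark the hierarchy.
    """
    return "_".join(level.lower() for level in path)


if __name__ == "__main__":
    print(flatten_name(["MajorAirportSource"]))     # majorairportsource
    print(flatten_name(["inspireId", "localId"]))   # inspireid_localid
```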

HerzovanderWal commented 2 years ago

Thank you! We understand the choice that had to be made. We will set up the ETL process to produce lowercase names with underscore separators. This will not affect the current Environmental Noise Directive (END) reporting.