Open matself opened 3 years ago
I can see the special characters in our test instance running GN 3.12.x (https://vanilla.geocat.net/geonetwork)
Please feel free to explore the contents of my metadata repository at http://geodatatorget.se:8080/geonetwork/srv/eng/catalog.search#/home. Maybe you can spot some other cause for the error. All of it is harvested data.
Just to make sure, I compiled a local instance from the current code. After erasing the previous H2 database, I ran it in Jetty. I noticed that the GEMET thesaurus is preloaded with all current languages, so it does not have to be downloaded. Then I harvested two sources, one GeoNetwork node and one CSW service, with 8050 records in total. After enabling INSPIRE in the UI, I got exactly the same result as in the original bug post.
Checking your instance, I see that some of the metadata records are indexed using the INSPIRE URI and others using the text label:
[
  { "doc_count": 54, "key": "geografiska meteorologiska förhållanden" },
  { "doc_count": 37, "key": "http://inspire.ec.europa.eu/theme/ge" }
]
For example, a request to http://geodatatorget.se:8080/geonetwork/srv/api/search/records/_search with this payload:
{
  "size": 0,
  "track_total_hits": true,
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "+isTemplate:n"
        }
      }
    }
  },
  "aggs": {
    "inspireThemeUri": {
      "terms": {
        "field": "inspireThemeUri",
        "size": 34
      }
    },
    "cl_topic.key": {
      "terms": {
        "field": "cl_topic.key",
        "size": 20
      }
    },
    "cl_hierarchyLevel.key": {
      "terms": {
        "field": "cl_hierarchyLevel.key",
        "size": 10
      }
    }
  }
}
returns a mix of URLs and text labels in the key field:
{
  "inspireThemeUri": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 1,
    "buckets": [
      { "key": "transportnät", "doc_count": 59 },
      { "key": "geografiska meteorologiska förhållanden", "doc_count": 54 },
      { "key": "naturliga riskområden", "doc_count": 44 },
      { "key": "anläggningar för miljöövervakning", "doc_count": 41 },
      { "key": "skyddade områden", "doc_count": 39 },
      { "key": "http://inspire.ec.europa.eu/theme/ge", "doc_count": 37 },
      { "key": "områden med särskild förvaltning/begränsningar/reglering samt enheter för rapportering", "doc_count": 33 },
      { "key": "http://inspire.ec.europa.eu/theme/hy", "doc_count": 32 },
      { "key": "allmännyttiga och offentliga tjänster", "doc_count": 31 },
      { "key": "geografiska oceanografiska förhållanden", "doc_count": 30 },
      { "key": "http://inspire.ec.europa.eu/theme/sd", "doc_count": 28 },
      { "key": "människors hälsa och säkerhet", "doc_count": 22 },
      { "key": "http://inspire.ec.europa.eu/theme/br", "doc_count": 17 },
      { "key": "produktions- och industrianläggningar", "doc_count": 17 },
      { "key": "http://inspire.ec.europa.eu/theme/hb", "doc_count": 15 },
      { "key": "--- inspire tema", "doc_count": 13 },
      { "key": "http://inspire.ec.europa.eu/theme/au", "doc_count": 13 },
      { "key": "havsområden", "doc_count": 12 },
      { "key": "http://inspire.ec.europa.eu/theme/bu", "doc_count": 12 },
      { "key": "höjd", "doc_count": 10 },
      { "key": "befolkningsfördelning – demografi", "doc_count": 8 },
      { "key": "landtäcke", "doc_count": 8 },
      { "key": "http://inspire.ec.europa.eu/theme/er", "doc_count": 7 },
      { "key": "http://inspire.ec.europa.eu/theme/oi", "doc_count": 6 },
      { "key": "markanvändning", "doc_count": 6 },
      { "key": "http://inspire.ec.europa.eu/theme/ad", "doc_count": 5 },
      { "key": "fastighetsområden", "doc_count": 4 },
      { "key": "http://inspire.ec.europa.eu/theme/gn", "doc_count": 4 },
      { "key": "http://inspire.ec.europa.eu/theme/so", "doc_count": 4 },
      { "key": "jordbruks- och vattenbruksanläggningar", "doc_count": 4 },
      { "key": "http://inspire.ec.europa.eu/theme/mr", "doc_count": 3 },
      { "key": "http://inspire.ec.europa.eu/theme/su", "doc_count": 3 },
      { "key": "atmosfäriska förhållanden", "doc_count": 2 },
      { "key": "ii.1 höjd", "doc_count": 1 }
    ]
  }
}
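To make the mix easier to see, the buckets of such a response can be split into URI keys and free-text keys. The following is only a minimal sketch: the helper name split_buckets is hypothetical, and the sample list is a small excerpt of the buckets shown above, not the full response.

```python
def split_buckets(buckets):
    """Split aggregation buckets into (URI-keyed, label-keyed) lists."""
    uris, labels = [], []
    for bucket in buckets:
        # Keys indexed as INSPIRE theme URIs start with "http://";
        # everything else is treated as a free-text label.
        target = uris if bucket["key"].startswith("http://") else labels
        target.append(bucket)
    return uris, labels


# Excerpt of the buckets returned by the request above.
sample = [
    {"key": "transportnät", "doc_count": 59},
    {"key": "geografiska meteorologiska förhållanden", "doc_count": 54},
    {"key": "http://inspire.ec.europa.eu/theme/ge", "doc_count": 37},
    {"key": "--- inspire tema", "doc_count": 13},
]

uris, labels = split_buckets(sample)
print("URI keys:  ", [b["key"] for b in uris])
print("label keys:", [b["key"] for b in labels])
```

Summing doc_count per group gives a quick idea of how many records were indexed each way.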
Yes, I know. I explained that in my bug report. There is no reason why the records should be referenced differently in that way in the source. So the bug must be where the records are classified during harvest, and where those with Swedish characters in the title get referenced differently, because that is the apparent difference.
inspireThemeUri is the result of an analysis process (see https://github.com/geonetwork/core-geonetwork/blob/main/schemas/iso19139/src/main/plugin/iso19139/index-fields/index.xsl#L464-L468), so it contains "synonyms" of the value.
You have several alternatives here:
1) You can do something like https://github.com/eea/geonetwork-eea/blob/eea-4.0.6/web-ui/src/main/resources/catalog/js/CatController.js#L115-L119 to include only URIs, which will be translated client side.
2) Or use the th_... field for INSPIRE and use the .default property if your catalogue is in one language only.
As indicated by Juan, your facet returns a mix of URIs and labels.
Can you try suggestion 1 of the previous comment in the admin > settings > UI > home facet config, and adapt the inspireThemeUri entry to:
'inspireThemeUri': {
  'terms': {
    'field': 'inspireThemeUri',
    'size': 34,
    'include': 'http://.*'
  }
}
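The effect of this include pattern can be previewed offline. In an Elasticsearch terms aggregation, an include regular expression has to match the whole term, which Python's re.fullmatch approximates for a simple pattern like this one. The sketch below is an emulation under that assumption, not a call to Elasticsearch; apply_include is a hypothetical helper and the keys are an excerpt of the buckets listed earlier in the thread.

```python
import re


def apply_include(keys, pattern):
    """Keep only keys that the pattern matches in full (anchored match)."""
    rx = re.compile(pattern)
    return [key for key in keys if rx.fullmatch(key)]


# Excerpt of the bucket keys returned for inspireThemeUri.
keys = [
    "transportnät",
    "skyddade områden",
    "http://inspire.ec.europa.eu/theme/ge",
    "http://inspire.ec.europa.eu/theme/hy",
    "--- inspire tema",
]

kept = apply_include(keys, r"http://.*")
dropped = [key for key in keys if key not in kept]
print("kept:   ", kept)
print("dropped:", dropped)
```

With this filter, every bucket whose key is a text label is left out of the facet, so records indexed only under a label are no longer counted there.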
I added 'include': 'http://.*' following your suggestion. The only result was that the INSPIRE themes with Swedish characters disappeared entirely from the view. Previously, they were at least represented with empty tags or bricks.
I fail to see how the suggestion would affect records in the database, but I ran a new harvest on the GeoNetwork node just to make sure. No change.
Removing the include statement brought everything back to the state at the beginning of this thread.
I need some advice on how to configure the GeoNetwork harvester to correctly import INSPIRE theme designations with Swedish characters. All the others are imported just fine, and translated according to the GEMET list.
This is the source: https://www.geodata.se/geodataportalen
> I added 'include': 'http://.*' following your suggestion. This only had the result that the INSPIRE themes with Swedish characters disappeared entirely from the view.
Hmm, it should not. Looking at the badges, the ones with a URI are correct.
So including only the http keys should be fine. Otherwise, other values add a CSS class such as badge-text pull-left inspire-människors hälsa och säkerhet-sv, which creates the blank badges.
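A small sketch of why such a class string cannot select an icon: an HTML class attribute is a space-separated list of tokens, so a label containing spaces is broken into several unrelated class names. The template inspire-{key}-{lang} is an assumption inferred from the class string quoted above, and class_tokens is a hypothetical helper, not GeoNetwork code.

```python
def class_tokens(key, lang="sv"):
    """Return the class tokens a browser would see for a given facet key."""
    # Assumed template, inferred from the class string quoted in the thread.
    class_attr = f"badge-text pull-left inspire-{key}-{lang}"
    # Browsers split the class attribute on whitespace.
    return class_attr.split()


tokens = class_tokens("människors hälsa och säkerhet")
print(tokens)
```

None of the resulting tokens is a complete theme class, which is consistent with the blank badges described above.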
Describe the bug: Theme labels or bricks with Swedish characters in the string are left without icon and label.
Expected behavior: All themes should have labels according to the GEMET file.
Additional context: I have downloaded the thesaurus (an XML file, not RDF, by the way). This is a comparison of the sections for a theme with, and a theme without, title and icon. (The snippets are from Notepad; the Chinese characters below are displayed as xE5 and xE4 and represent Swedish diacritic characters.)
With title and icon - Geology
With no title and icon - Protected Sites
The problem seems to be Swedish characters in the skos:prefLabel tag. This parameter matches the lack of titles in the GeoNetwork layout. I am not sure what to do about that, other than to file this bug report. Here are the search strings from the UI for the two samples above.
http://localhost:8080/geonetwork/srv/swe/catalog.search#/search?query_string={%22inspireThemeUri%22:%20{%22http://inspire.ec.europa.eu/theme/ge%22:%20true}%20}
http://localhost:8080/geonetwork/srv/swe/catalog.search#/search?query_string={%22inspireThemeUri%22:%20{%22skyddade%20omr%C3%A5den%22:%20true}%20}
Note that Geology references the GE theme URI, while Protected Sites uses a local label as the search string. Both produce a selection from the database, though, and all records are seemingly tagged with the correct INSPIRE theme.
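The two search strings can be decoded to show exactly which key the UI filters on. This is a minimal sketch using only the Python standard library; selected_theme_keys is a hypothetical helper. It also illustrates that %C3%A5 is the UTF-8 encoding of å, whereas the xE5 seen in Notepad is the single-byte Latin-1 value of the same character.

```python
import json
from urllib.parse import unquote


def selected_theme_keys(url):
    """Return the inspireThemeUri keys switched on in a catalog.search URL."""
    # Everything after "query_string=" is percent-encoded JSON.
    _, _, raw = url.partition("query_string=")
    state = json.loads(unquote(raw))
    return [key for key, on in state.get("inspireThemeUri", {}).items() if on]


geology = ("http://localhost:8080/geonetwork/srv/swe/catalog.search#/search"
           "?query_string={%22inspireThemeUri%22:%20"
           "{%22http://inspire.ec.europa.eu/theme/ge%22:%20true}%20}")
protected = ("http://localhost:8080/geonetwork/srv/swe/catalog.search#/search"
             "?query_string={%22inspireThemeUri%22:%20"
             "{%22skyddade%20omr%C3%A5den%22:%20true}%20}")

print(selected_theme_keys(geology))
print(selected_theme_keys(protected))

# The same character in two encodings: UTF-8 (as in the URL) and Latin-1.
print("å".encode("utf-8"), "å".encode("latin-1"))
```

The first URL filters on a theme URI and the second on a Swedish label, matching the two kinds of keys seen in the aggregation.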