geonetwork / core-geonetwork

GeoNetwork is a catalog application to manage spatially referenced resources. It provides powerful metadata editing and search functions as well as an interactive web map viewer. It is currently used in numerous Spatial Data Infrastructure initiatives across the world.
http://geonetwork-opensource.org/
GNU General Public License v2.0

INSPIRE theme browsing fails with Swedish characters in the theme label #5862

Open matself opened 3 years ago

matself commented 3 years ago

Describe the bug Theme labels (bricks) whose label string contains Swedish characters are shown without an icon and without a label.

To Reproduce Steps to reproduce the behavior:

  1. Load some INSPIRE coded metadata from Sweden. Install GEMET INSPIRE theme for en and sv.
  2. Switch to Svenska UI language (not necessary)
  3. Open the INSPIRE theme browsing layout
  4. See that only a few bricks have labels.
  5. The common denominator is that the bricks with labels have no Swedish characters in the label string.

Expected behavior All themes should have labels according to the GEMET file.

Screenshots INSPIRE_bricks

Additional context I have downloaded the thesaurus (an XML file, not RDF, by the way). Below is a comparison of the sections for a theme with a title and icon and for a theme without them. (The snippets are copied from Notepad, where the Chinese characters below are displayed as xE5 and xE4; they represent Swedish diacritic characters.) With Title and Icon - Geology

<skos:Concept xmlns:skos="http://www.w3.org/2004/02/skos/core#" rdf:about="http://inspire.ec.europa.eu/theme/ge">
    <skos:prefLabel xml:lang="en">Geology</skos:prefLabel>
    <skos:scopeNote xml:lang="en">Geology characterised according to composition and structure. Includes bedrock, aquifers and geomorphology.</skos:scopeNote>
    <skos:prefLabel xml:lang="sv">Geologi</skos:prefLabel>
    <skos:scopeNote xml:lang="sv">Geologiska förh嬬anden indelade efter sammans䴴ning och struktur. Innefattar berggrund, akviferer och geomorfologi.</skos:scopeNote>
    <skos:altLabel>ge</skos:altLabel>
    <skos:inScheme rdf:resource="http://inspire.ec.europa.eu/theme" />
  </skos:Concept>

With no Title and Icon - Protected Sites

<skos:Concept xmlns:skos="http://www.w3.org/2004/02/skos/core#" rdf:about="http://inspire.ec.europa.eu/theme/ps">
    <skos:prefLabel xml:lang="en">Protected sites</skos:prefLabel>
    <skos:scopeNote xml:lang="en">Area designated or managed within a framework of international, Community and Member States' legislation to achieve specific conservation objectives.</skos:scopeNote>
    <skos:prefLabel xml:lang="sv">Skyddade omr夥n</skos:prefLabel>
    <skos:scopeNote xml:lang="sv">Omr夥n som 䲠utsedda eller förvaltas inom ramen för internationell lagstiftning, gemenskapslagstiftning och medlemsstaternas lagstiftning för att uppn堳pecifika miljöv岤sm嬮</skos:scopeNote>
    <skos:altLabel>ps</skos:altLabel>
    <skos:inScheme rdf:resource="http://inspire.ec.europa.eu/theme" />
  </skos:Concept>
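
One plausible reading of the garbled characters in the Swedish values above, offered as a sketch rather than a diagnosis: the thesaurus file is stored as ISO-8859-1 (one byte per character, å = 0xE5, ä = 0xE4), and the viewer treated each 0xE0-0xEF byte as the start of a three-byte UTF-8 sequence without validating the two bytes that follow. The function `lenient_utf8_view` below is hypothetical and only imitates that assumed behaviour; it is not taken from Notepad or GeoNetwork.

```python
def lenient_utf8_view(raw: bytes) -> str:
    """Imitate a viewer that reads any byte 0xE0-0xEF as the lead of a
    3-byte UTF-8 sequence and does not validate the two following bytes.
    Every other byte is shown as its ISO-8859-1 character.
    (Assumed behaviour, for illustration only.)"""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if 0xE0 <= b <= 0xEF and i + 2 < len(raw):
            # Combine 4 + 6 + 6 payload bits exactly as UTF-8 would.
            code = ((b & 0x0F) << 12) | ((raw[i + 1] & 0x3F) << 6) | (raw[i + 2] & 0x3F)
            out.append(chr(code))
            i += 3
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)


# The Swedish prefLabel as it would be stored in an ISO-8859-1 file.
raw_label = "Skyddade områden".encode("latin-1")

garbled = lenient_utf8_view(raw_label)   # what the lenient viewer would show
repaired = raw_label.decode("latin-1")   # decoding with the matching charset

print(garbled)
print(repaired)
```

If this assumption holds, the file content itself is intact and only the encoding used to read it differs, so opening the file explicitly as ISO-8859-1 should show the Swedish letters correctly.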

The problem seems to be the Swedish characters in the skos:prefLabel tag: the themes whose labels contain them are the ones that lack titles in the GeoNetwork layout. I am not sure what to do about that, other than to file this bug report. Here are the search strings from the UI for the two samples above.

http://localhost:8080/geonetwork/srv/swe/catalog.search#/search?query_string={%22inspireThemeUri%22:%20{%22http://inspire.ec.europa.eu/theme/ge%22:%20true}%20}

http://localhost:8080/geonetwork/srv/swe/catalog.search#/search?query_string={%22inspireThemeUri%22:%20{%22skyddade%20omr%C3%A5den%22:%20true}%20}

Note that Geology references the theme URI (http://inspire.ec.europa.eu/theme/ge), while Protected Sites uses a local search string. Both produce a selection from the database, though, and all the selected records seem to be tagged with the correct INSPIRE theme.
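
To make the difference between the two search strings easier to see, here is a minimal sketch that decodes the `query_string` parameter from each URL and lists the keys used for `inspireThemeUri`. The helper name `theme_keys` is hypothetical, and the sketch assumes the parameter always sits in the URL fragment as percent-encoded JSON, as it does in the two URLs above.

```python
import json
from urllib.parse import unquote, urlsplit


def theme_keys(url: str) -> list:
    """Return the keys selected under inspireThemeUri in a search URL."""
    fragment = urlsplit(url).fragment            # '/search?query_string={...}'
    _, _, encoded = fragment.partition("query_string=")
    state = json.loads(unquote(encoded))         # percent-decoding assumes UTF-8
    return list(state["inspireThemeUri"])


GEOLOGY_URL = (
    "http://localhost:8080/geonetwork/srv/swe/catalog.search#/search"
    "?query_string={%22inspireThemeUri%22:%20"
    "{%22http://inspire.ec.europa.eu/theme/ge%22:%20true}%20}"
)
PROTECTED_SITES_URL = (
    "http://localhost:8080/geonetwork/srv/swe/catalog.search#/search"
    "?query_string={%22inspireThemeUri%22:%20"
    "{%22skyddade%20omr%C3%A5den%22:%20true}%20}"
)

for url in (GEOLOGY_URL, PROTECTED_SITES_URL):
    print(theme_keys(url))
```

The first URL filters on a theme URI and the second on a lower-cased Swedish label, so the two records are indexed under different kinds of keys even though both searches return results.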

juanluisrp commented 3 years ago

I can see the special characters in our test instance running GN 3.12.x (https://vanilla.geocat.net/geonetwork)

image

matself commented 3 years ago

Please feel free to explore the contents of my metadata repository at http://geodatatorget.se:8080/geonetwork/srv/eng/catalog.search#/home. Maybe you can see some other cause for the error. All of it is harvested data.

matself commented 3 years ago

Just to make sure, I compiled a local instance from the current code. After erasing the previous H2 database, I ran it in Jetty. I noticed that the GEMET thesaurus is preloaded with all current languages; it does not have to be downloaded. Then I harvested two sources, one GeoNetwork node and one CSW service, with 8050 records in all. After enabling INSPIRE in the UI, I got the exact same result as in the original bug post.

juanluisrp commented 3 years ago

Checking your instance, I see that some of the metadata records are indexed using the INSPIRE URI and others using the text of the label:

[{
    "doc_count": 54,
    "key": "geografiska meteorologiska förhållanden"
  }, {
    "doc_count": 37,
    "key": "http://inspire.ec.europa.eu/theme/ge"
  }
]

For example, this request to http://geodatatorget.se:8080/geonetwork/srv/api/search/records/_search with this payload:

{
    "size": 0,
    "track_total_hits": true,
    "query": {
        "bool": {
            "must": {
                "query_string": {
                    "query": "+isTemplate:n"
                }
            }
        }
    },
    "aggs": {
        "inspireThemeUri": {
            "terms": {
                "field": "inspireThemeUri",
                "size": 34
            }
        },
        "cl_topic.key": {
            "terms": {
                "field": "cl_topic.key",
                "size": 20
            }
        },
        "cl_hierarchyLevel.key": {
            "terms": {
                "field": "cl_hierarchyLevel.key",
                "size": 10
            }
        }
    }
}

returns a mix of URIs and text labels as bucket keys:

{
        "inspireThemeUri": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1,
            "buckets": [
                {
                    "key": "transportnät",
                    "doc_count": 59
                },
                {
                    "key": "geografiska meteorologiska förhållanden",
                    "doc_count": 54
                },
                {
                    "key": "naturliga riskområden",
                    "doc_count": 44
                },
                {
                    "key": "anläggningar för miljöövervakning",
                    "doc_count": 41
                },
                {
                    "key": "skyddade områden",
                    "doc_count": 39
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/ge",
                    "doc_count": 37
                },
                {
                    "key": "områden med särskild förvaltning/begränsningar/reglering samt enheter för rapportering",
                    "doc_count": 33
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/hy",
                    "doc_count": 32
                },
                {
                    "key": "allmännyttiga och offentliga tjänster",
                    "doc_count": 31
                },
                {
                    "key": "geografiska oceanografiska förhållanden",
                    "doc_count": 30
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/sd",
                    "doc_count": 28
                },
                {
                    "key": "människors hälsa och säkerhet",
                    "doc_count": 22
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/br",
                    "doc_count": 17
                },
                {
                    "key": "produktions- och industrianläggningar",
                    "doc_count": 17
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/hb",
                    "doc_count": 15
                },
                {
                    "key": "--- inspire tema",
                    "doc_count": 13
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/au",
                    "doc_count": 13
                },
                {
                    "key": "havsområden",
                    "doc_count": 12
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/bu",
                    "doc_count": 12
                },
                {
                    "key": "höjd",
                    "doc_count": 10
                },
                {
                    "key": "befolkningsfördelning – demografi",
                    "doc_count": 8
                },
                {
                    "key": "landtäcke",
                    "doc_count": 8
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/er",
                    "doc_count": 7
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/oi",
                    "doc_count": 6
                },
                {
                    "key": "markanvändning",
                    "doc_count": 6
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/ad",
                    "doc_count": 5
                },
                {
                    "key": "fastighetsområden",
                    "doc_count": 4
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/gn",
                    "doc_count": 4
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/so",
                    "doc_count": 4
                },
                {
                    "key": "jordbruks- och vattenbruksanläggningar",
                    "doc_count": 4
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/mr",
                    "doc_count": 3
                },
                {
                    "key": "http://inspire.ec.europa.eu/theme/su",
                    "doc_count": 3
                },
                {
                    "key": "atmosfäriska förhållanden",
                    "doc_count": 2
                },
                {
                    "key": "ii.1 höjd",
                    "doc_count": 1
                }
            ]
        }
}

matself commented 3 years ago

Yes, I know. I explained that in my bug report. And there is no reason why the records should be referenced differently in that way in the source. So the bug must be in the step where the records are classified during harvest, where those with Swedish characters in the title get referenced differently. That is the apparent difference.

fxprunayre commented 3 years ago

inspireThemeUri is the result of an analysis process (see https://github.com/geonetwork/core-geonetwork/blob/main/schemas/iso19139/src/main/plugin/iso19139/index-fields/index.xsl#L464-L468), so it contains "synonyms" of the value.

You have various alternatives here:

  1. You can do something like https://github.com/eea/geonetwork-eea/blob/eea-4.0.6/web-ui/src/main/resources/catalog/js/CatController.js#L115-L119 to include only URIs, which will be translated client side.
  2. Or use the th_... field for INSPIRE and use the .default property if your catalogue is in one language only.

fxprunayre commented 3 years ago

As indicated by Juan, your facet returns a mix of URIs and labels.

image

Can you try suggestion 1 from the previous comment? In admin > settings > UI > home facet config, adapt the inspireThemeUri entry to:

            'inspireThemeUri': {
              'terms': {
                'field': 'inspireThemeUri',
                'size': 34,
                'include': 'http://.*'
              }
            }

matself commented 3 years ago

I added 'include': 'http://.*' following your suggestion. This only had the result that the INSPIRE themes with Swedish characters disappeared entirely from the view. Previously, they were at least represented with empty tags or bricks. I fail to see how the suggestion would affect records in the database, but I ran a new harvest on the GeoNetwork node just to make sure. No change. Removing the include statement brought everything back to the state at the beginning of this thread.

I need some advice on how to configure the GeoNetwork harvester to correctly import INSPIRE theme designations with Swedish characters. All the others are imported just fine and translated according to the GEMET list. This is the source: https://www.geodata.se/geodataportalen

fxprunayre commented 3 years ago

I added 'include': 'http://.*' following your suggestion. This only had the result that the INSPIRE themes with Swedish characters disappeared entirely from the view.

Hmm, it should not. Looking at the badges, the ones with a URI are correct.

image

eg.

So including only the http keys should be fine. Otherwise, the other values add a CSS class such as badge-text pull-left inspire-människors hälsa och säkerhet-sv, which creates the blank badges.
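
As a rough illustration of what the suggested include setting does to the facet, the sketch below applies the same pattern to a handful of bucket keys copied from the aggregation response earlier in this thread. It is only a simulation in Python: the helper `apply_include` is hypothetical, and it assumes that the include pattern has to match the whole key, which is how Elasticsearch treats regular expressions in a terms aggregation.

```python
import re

# A few bucket keys copied from the aggregation response above.
BUCKET_KEYS = [
    "transportnät",
    "skyddade områden",
    "http://inspire.ec.europa.eu/theme/ge",
    "http://inspire.ec.europa.eu/theme/hy",
    "människors hälsa och säkerhet",
]


def apply_include(keys, pattern):
    """Keep only the keys that the pattern matches in full."""
    regex = re.compile(pattern)
    return [key for key in keys if regex.fullmatch(key)]


kept = apply_include(BUCKET_KEYS, "http://.*")
dropped = [key for key in BUCKET_KEYS if key not in kept]

print("kept:", kept)
print("dropped:", dropped)
```

Under these assumptions the label-only buckets are filtered out of the facet rather than translated, which is consistent with the report that the themes with Swedish characters disappeared from the view once the include was added.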