INSPIRE-MIF / helpdesk-geoportal

Community discussion for INSPIRE geoportal topics

GeoNetwork Harvesting: DE Catalog #147

Closed mouad-aissa closed 4 months ago

mouad-aissa commented 1 year ago

Dear Geoportal Helpdesk,

is there any news regarding the harvesting of our catalog? There was an issue in the past and we were in contact with Jordi Escriu.

We created the following instance for the harvesting: http://gdk-inspire-1.ffm.gdi-de.org/geonetwork/srv/ger/catalog.search#/home

Was the harvesting successful? Are the data now visible in the INSPIRE Geoportal?

For further information see: https://github.com/INSPIRE-MIF/helpdesk-geoportal/issues/121

jescriu commented 11 months ago

Dear @mouad-aissa, Yesterday we started testing the new (unfiltered) endpoint on a separate cloned server. Once this test is successful, we will schedule a harvest in production, taking into account the tasks from the rollout of the INSPIRE Geoportal. I will keep you posted. Thank you for your patience.

jescriu commented 11 months ago

Dear @mouad-aissa, Yesterday we finished the above-mentioned test on the cloned server, which unfortunately was not successful - the harvesting and link-checking phases went well, but the last phase, which deals with the ingestion of data into the database, crashed.

However, next week - if it does not interfere with the ongoing work to update the software on the servers - we will try to start a new harvesting process in the production instance. We will also explore options for better managing the Java memory usage on the servers.

jescriu commented 11 months ago

Dear @mouad-aissa, GeoCat updated the configuration and version of the software (GeoNetwork) on the revamped Geoportal server last week. Unfortunately they did not implement any improvements for better managing memory in the ingestion phase, as JRC had requested. Our ICT Team will try to cover this gap during August, which means that the attempt to harvest your national endpoint will have to be postponed to 21 August onwards. Apologies again for the inconvenience.

hallinpihlatie commented 11 months ago

Does this mean that all countries have to wait until 21 August? It's just around the corner, but it would be nice to know anyway.

I'd like to harvest once it's possible again, as my harvest today wasn't successful.

jescriu commented 10 months ago

Dear @hallinpihlatie, This thread only refers to DE, which is a large catalogue for which we still need to agree on harvesting timeframes for performance reasons, avoiding overlaps with harvesting processes from other countries. I saw that you tried to harvest the FI catalogue on 28 August but it was not successful - I just checked with another endpoint and the harvesting was successful. Please start another harvesting attempt while having a look at the logs from your server and CSW endpoint. Please open a new issue in the helpdesk to continue with the feedback and support.

mouad-aissa commented 9 months ago

Dear @jescriu, is there any news about the harvesting of our data? As you already informed us, a new attempt was planned for 21 August. How did the harvesting go? Thank you for your efforts.

jescriu commented 9 months ago

Dear @mouad-aissa, The intention was to resume the attempts on 21 August, but in the end this was not possible due to the activities for rolling out the INSPIRE Geoportal, which are still ongoing. We will try to squeeze this activity in between, using a cloned server.

hogredan commented 9 months ago

Dear @jescriu,

we are coming close to the annual INSPIRE monitoring and due to the unsuccessful harvesting it is currently not possible for us to check whether the resources are discoverable in the INSPIRE geoportal and whether accessibility to the data is ensured. This makes a monitoring prediction and corresponding quality management extremely difficult. It would therefore be very helpful if we could make progress here.

Best regards, Daniela (on behalf of the German National Contact Point)

jescriu commented 9 months ago

Dear @hogredan, I was going to update your team on this matter. We managed to successfully harvest your endpoint last week (in a cloned instance) - from Wed 27 to Fri 29.
See below: image

During the process we kept track of the relevant logs and are now analysing them (with GeoCat, our contractor) to better understand the areas in which the system could be improved, with the twofold objective to 1) avoid this kind of issue in the future and 2) obtain the most objective indicators.

We hope to be able to reproduce this successful harvest in the GeoNetwork harvesting console instance soon. Our efforts are on it.

Thank you for your patience and understanding.

hogredan commented 9 months ago

Dear @jescriu,

thanks a lot for the quick update and your efforts!

jescriu commented 9 months ago

Dear @hogredan, May I ask you if the number of metadata records successfully harvested in the previous test is representative of the content of the German national catalogue?

GDIAnja commented 9 months ago

In order not to let the tension rise too much, here is the hyperlink with the approximate number of all relevant German data sets reported for INSPIRE: https://gdk.gdi-de.org/gdi-de/srv/ger/catalog.search#/search?resultType=details&sortBy=relevance&fast=index&_content_type=json&from=1&to=20&keyword=inspireidentifiziert (showing 291381 today; another count gives 291248 records, with at least 103528 dataset plus 184587 service metadata records, giving a total of 288115)

267570 records harvested on the EU side is quite a good result, given that the harvesting dates from 29th of September and the numbers rise constantly within GDI-DE before INSPIRE Monitoring. Log files on your side may show how many metadata records were rejected due to schema errors or other problems. I am sure @hogredan and @mouad-aissa will know the exact numbers in the German data catalogue on that day.

jescriu commented 9 months ago

Thanks for the info, @GDIAnja.

bayerfa commented 8 months ago

Dear @jescriu, together with @mouad-aissa I had a look at the available data on the server. The data you harvested from [http://gdk-inspire-1.ffm.gdi-de.org/geonetwork/srv/ger/catalog.search#/home] contains the records available in April 2023, with a total count of 286743. As mentioned before, 267570 successfully harvested records is quite good. Next week we are planning to create a new dataset and will let you know so that a new harvesting round can be started from our server.

Kind regards, Fabian Bayer (responsible process manager, GDK - German catalogue)
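
As a rough check on the figures quoted in the comment above, the share of records that were harvested can be worked out with a few lines of Python. This is only a minimal sketch: the variable names are illustrative and not taken from the thread, and it assumes that the April 2023 snapshot of 286743 records is the right denominator for the 267570 harvested records.

```python
# Figures quoted in the thread; the variable names are illustrative only.
records_in_source = 286_743   # records on the harvested instance (April 2023 snapshot)
records_harvested = 267_570   # records successfully harvested in the cloned-server test

# Assumption: the April 2023 snapshot is the right denominator for this harvest.
records_missing = records_in_source - records_harvested
coverage_pct = 100 * records_harvested / records_in_source

print(f"missing: {records_missing}")      # missing: 19173
print(f"coverage: {coverage_pct:.1f}%")   # coverage: 93.3%
```

The shortfall of about 19000 records may include records rejected for schema errors, as suggested earlier in the thread, but only the harvesting logs could confirm that.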

bayerfa commented 8 months ago

Dear @jescriu, I want to inform you about the updated records on the platform so you can start a new harvesting run. The instance currently provides 289399 records. Have you already integrated the previously harvested data into the INSPIRE geoportal?

jescriu commented 8 months ago

Dear @bayerfa, The records cannot be integrated in production. A new harvest has to be performed in the INSPIRE Geoportal production instance, so that this information is updated together with that of the other MS / EFTA countries. Since we are now about to start the INSPIRE Geoportal rollout, we will plan this after the necessary security assessment process. We will keep you informed.

jrc-inspire commented 4 months ago

A new harvest of the German metadata catalogue was successfully completed on 9 January 2024. This harvest has been used to calculate the 2023 Monitoring and Reporting indicators for this country.

As part of the analysis of the revamped INSPIRE Geoportal which was launched on 24 November 2023, based on GeoNetwork Open Source version 4.2.6, the JRC also noticed the low accessibility indicators obtained by the newly delivered system.

To prevent this error from affecting Member States and EFTA countries during the INSPIRE Monitoring and Reporting 2023, an alternative approach for evaluating accessibility of datasets through view and download services was developed. It implements Part A of the Data-services linking simplification good practice. The approach will be explained in detail during the 77th MIG-T Meeting.

In addition, we are working with the GeoNetwork community to improve the evaluation and processing of dataset accessibility in this software tool, capturing all lessons learnt and Member States’ feedback during the mentioned monitoring process in order to enhance the INSPIRE Geoportal.