UMR-PASSAGES / csw_harvester

GNU General Public License v3.0

CSW Harvester

Synopsis

This is a Python script that harvests metadata from CSW (Catalogue Service for the Web) services and saves selected information from these metadata in a PostgreSQL database. The script was created by Mathias Rouan and Julie Pierson, and completely refactored by Laurent Bouquin, Jeniffer Ortiz Lozano, Julien Massonneau and Abdelwahid Hadj Zoubir.

Motivation

This script is used to analyze Spatial Data Infrastructures for the GEOBS research project: https://www-iuem.univ-brest.fr/pops/projects/geobs.

Dependencies

To run the program, you must first install the following dependencies:

To run the profiling test, you need to install:

To run the coverage test, you need to install:

How to setup and run the program

Database

The PostgreSQL database must be created first. A database dump is provided in database/csw_harvester.sql.

psql -f csw_harvester.sql -U postgres

Physical data model of the database: [figure: Physical Data Model]

For the program to interact with the database, you will need to specify the following fields in the file config_database.cfg:

Sources file

The list of CSW services is read from a CSV file. Each line has the following structure:

IDG number, name of the IDG, beginning of the recording, end of the recording, step of the recording, IDG URL, CSW URL

An example is provided in sources_test.csv. For each CSW, you can set the record at which harvesting begins, the record at which it ends, and the step (for example, with a step of 30, records are extracted 30 at a time). Lines can be commented out with '#'.
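The sources file described above can be read with a few lines of Python. The following is only a minimal sketch, not the code used in Main.py: the field names, the helper names read_sources and windows, and the example URLs are all hypothetical. It also assumes that the beginning and end of the recording are 1-based, inclusive record positions, which the description above does not state.

```python
import csv
from typing import Iterator, List, NamedTuple, Tuple


class Source(NamedTuple):
    # Field names are illustrative; only the column order comes from the README.
    idg_id: int
    idg_name: str
    start: int
    end: int
    step: int
    idg_url: str
    csw_url: str


def read_sources(text: str) -> List[Source]:
    """Parse the sources CSV, skipping blank lines and lines starting with '#'."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    sources = []
    for row in csv.reader(lines, skipinitialspace=True):
        idg_id, name, start, end, step, idg_url, csw_url = (c.strip() for c in row)
        sources.append(Source(int(idg_id), name, int(start), int(end),
                              int(step), idg_url, csw_url))
    return sources


def windows(src: Source) -> Iterator[Tuple[int, int]]:
    """Yield (start_position, record_count) pairs covering start..end.

    Assumption: 'end' is inclusive and positions are 1-based.
    """
    if src.step <= 0:
        raise ValueError("step must be a positive integer")
    pos = src.start
    while pos <= src.end:
        yield pos, min(src.step, src.end - pos + 1)
        pos += src.step


# Made-up sample content in the column order described above.
SAMPLE = """\
# id, name, start, end, step, IDG URL, CSW URL
1, Example IDG, 1, 70, 30, https://idg.example.org, https://idg.example.org/csw
#2, Disabled IDG, 1, 100, 50, https://other.example.org, https://other.example.org/csw
"""

sources = read_sources(SAMPLE)
print(list(windows(sources[0])))  # [(1, 30), (31, 30), (61, 10)]
```

With a step of 30 over records 1 to 70, the sketch yields two full batches and a final batch of 10 records; the commented-out second line is ignored.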

Program

You can then run the program:

python Main.py

You can specify the following options:

The date option is used to force the extraction date stored in the database.

Tests

Documentation

This project is published under the General Public License v3.