Esri / geoportal-server-harvester

Metadata Harvester for Esri Geoportal Server
http://esri.github.io/geoportal-server/
Apache License 2.0

Harvest .gdb, .shp or .mdb as a GIS mapping file #232

Open · tejesri opened this issue 2 weeks ago

tejesri commented 2 weeks ago

Hi Team,

Geoportal Server has the capability to harvest data using a UNC path. However, we encountered an issue: when we harvest .gdb, .mdb, and .shp files, Geoportal Server only harvests them as file system entries. We want to harvest these datasets as GIS layers rather than just files. I’ve attached a snapshot below for your reference.

Thanks & Regards, Tejpal

mhogeweg commented 2 weeks ago

hi, that is expected behavior. In order to harvest metadata from file geodatabases or personal geodatabases, you will need to write an ArcPy script that crawls the folders. Something like this:

Build a list of folders (do not include file geodatabase folders, to avoid what you experienced above):

    import os

    # build the list of folders, skipping the file geodatabase folders
    workspaces = [x[0] for x in os.walk(start_dir) if not x[0].endswith('.gdb')]

    # crawl each of the folders as a workspace
    for workspace in workspaces:
        parse_workspace(workspace)
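
Note that the filter above skips the file geodatabase folders themselves. If you also want the feature classes inside those geodatabases harvested, one option (just a sketch, untested) is a second pass that treats the .gdb folders as ArcPy workspaces:

    # second pass: the .gdb folders themselves are valid ArcPy workspaces
    gdb_workspaces = [x[0] for x in os.walk(start_dir) if x[0].endswith('.gdb')]
    for workspace in gdb_workspaces:
        parse_workspace(workspace)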

parse_workspace would be a function that does what you want with the datasets in the workspace. For example, loop over all the datasets in the workspace and produce a metadata document for each one:

    # loop over the ArcGIS compatible datasets in the workspace
    # then create metadata for each dataset
    # then publish the metadata of the dataset to the geoportal
    arcpy.env.workspace = workspace
    datasets = arcpy.ListFeatureClasses()
    for dataset in datasets:
        metadata = generate_metadata(workspace, dataset)
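
generate_metadata is whatever builds the document you want to publish. A minimal sketch (the JSON fields are placeholders; the exact structure your geoportal expects may differ), using arcpy.Describe to pull a few basic properties:

    import os

    import arcpy

    def generate_metadata(workspace, dataset):
        # describe the dataset and pull a few basic properties into a dict
        desc = arcpy.Describe(os.path.join(workspace, dataset))
        return {
            'title': desc.name,
            'description': f"{desc.dataType} harvested from {workspace}",
            'spatialReference': desc.spatialReference.name
        }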

Then you can publish to Geoportal Server through its REST API:

    import json

    import requests
    from requests.auth import HTTPBasicAuth

    ...

    server = 'http://localhost:8080/geoportal/rest/metadata/item/'
    auth = HTTPBasicAuth(username, password)
    headers = {'Content-Type': 'application/json'}

    def publish_metadata(metadata, item_id):
        the_url = server + item_id
        print(f"the_url - {the_url}")
        result = requests.put(url=the_url, data=json.dumps(metadata), auth=auth, headers=headers)
        print(f"{item_id} - {result.text}")

Here, 'the_url' is the URL of your geoportal's item REST endpoint, 'auth' carries the username/password for your geoportal instance, and 'headers' sets the content type. Effectively, Python does an HTTP PUT similar to the curl request below:

    curl -X 'PUT' \
      'http://localhost:8080/geoportal/rest/metadata/item/abc123' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/xml' \
      -d '<?xml version="1.0" encoding="UTF-8"?>
    <metadata>goes here</metadata>'
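
To tie the pieces together, parse_workspace can generate and publish metadata for every feature class it finds. A rough sketch under the same assumptions as above (generate_metadata and publish_metadata as sketched earlier; hashing the dataset path is just one way to get a stable, URL-safe item id):

    import hashlib
    import os

    import arcpy

    def parse_workspace(workspace):
        # point ArcPy at the workspace and walk its feature classes
        arcpy.env.workspace = workspace
        for dataset in arcpy.ListFeatureClasses():
            metadata = generate_metadata(workspace, dataset)
            # derive a stable, URL-safe item id from the dataset path
            item_id = hashlib.md5(os.path.join(workspace, dataset).encode('utf-8')).hexdigest()
            publish_metadata(metadata, item_id)
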
tejesri commented 2 days ago

Hi Marten,

I tried to crawl/harvest the .gdb and .shp files using the Python script, but I am getting the error below:

"Workspaces to process: ['D:\shapefile'] Processing workspace: D:\shapefile Feature classes found in D:\shapefile: ['lewis.shp'] Rasters found in D:\shapefile: [] Publishing to http://presalesgeoportal.esritech.in:8080/geoportal/rest/metadata/item/lewis.shp HTTP error occurred for lewis.shp: 400 Client Error: for url: http://presalesgeoportal.esritech.in:8080/geoportal/rest/metadata/item/lewis.shp"

I have attached the python script (geoportalserver.py) here for your reference.

Could you please check and help me harvest the .gdb and .shp files?

Thanks & Regards, Tejpal

geoportal_server.zip