Esri / geoportal-server

Geoportal Server is a standards-based, open source product that enables discovery and use of geospatial resources including data and services.
https://gptogc.esri.com/geoportal
Apache License 2.0

GeoPortal should produce ATOM output expected by FGDC Status Checker #234

Open torrin47 opened 8 years ago

torrin47 commented 8 years ago

The FGDC Service Status Checker expects an ATOM service feed with the following mandatory fields: id, title, serviceType and serviceURL. The standard GeoPortal ATOM feed includes id and title, but omits serviceType and serviceURL. Although many metadata records (particularly FGDC CSDGM) don't include explicit serviceType elements, it's usually possible to infer the type of a generic onlink element with pretty good accuracy (this already seems to occur in the standard search results). If the standard GeoPortal ATOM feed could be configured to include these elements, it would greatly streamline the integration with the powerful FGDC Status Checker tool.
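For illustration, the four mandatory fields can be checked against a feed before attempting registration. This is only a minimal sketch: the sample feed, its `sc` namespace prefix and URN, and the helper names are hypothetical, and elements are matched by local name because the exact namespaces used by the real feed are not stated here.

```python
import xml.etree.ElementTree as ET

# Mandatory fields named in this issue.
REQUIRED = ("id", "title", "serviceType", "serviceURL")

# Hypothetical sample feed; the "sc" namespace is an assumption for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:sc="urn:example:status-checker">
  <entry>
    <id>urn:example:1</id>
    <title>Roads WMS</title>
    <sc:serviceType>wms</sc:serviceType>
    <sc:serviceURL>http://example.org/wms</sc:serviceURL>
  </entry>
  <entry>
    <id>urn:example:2</id>
    <title>Plain download link</title>
    <sc:serviceType></sc:serviceType>
    <sc:serviceURL>http://example.org/data.zip</sc:serviceURL>
  </entry>
</feed>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def missing_fields(feed_xml):
    """Return (entry id, [missing or empty required fields]) for each bad entry."""
    root = ET.fromstring(feed_xml)
    problems = []
    for entry in root.iter():
        if local_name(entry.tag) != "entry":
            continue
        # Map each direct child element to its trimmed text.
        values = {local_name(child.tag): (child.text or "").strip() for child in entry}
        missing = [name for name in REQUIRED if not values.get(name)]
        if missing:
            problems.append((values.get("id", ""), missing))
    return problems


print(missing_fields(SAMPLE_FEED))
```

An entry whose serviceType element is present but empty is reported the same way as one where the element is absent.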

mhogeweg commented 8 years ago

Integration with the service checker works out of the box: https://github.com/Esri/geoportal-server/wiki/FGDC-Service-Checker-Integration. There is more to it than just providing an ATOM feed (on both sides).

torrin47 commented 8 years ago

Goldurnit, I searched for that page in this repo, and must've been using the wrong terms, wasn't able to find it. That does appear to produce the desired output when logged in, but brings us to a different issue - the authentication does not appear to be supporting our LDAP integrated SSO login. Is there a way to configure hybrid authentication similar to ArcGIS Server? Is there a particular rationale behind securing this endpoint, or could we simply open it to the public?

zguo commented 7 years ago

This endpoint expects Basic authentication credentials in the HTTP header.
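As a reminder of what that means on the wire, here is a minimal sketch of building a Basic `Authorization` header. The function name and the example URL are hypothetical, the credentials are the well-known sample pair from the HTTP Basic authentication RFC, and no request is actually sent.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the Authorization header value used by HTTP Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


headers = basic_auth_header("Aladdin", "open sesame")

# Hypothetical endpoint URL; the request object is built but never sent.
request = urllib.request.Request("http://server/geoportal/Eros", headers=headers)
print(request.get_header("Authorization"))
```

A client that authenticates only through an SSO redirect would not send this header, which is the mismatch raised in the previous comment.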

torrin47 commented 7 years ago

Right, but we're configured with full LDAP integration, which wouldn't play with the FGDC service checker. Can we disable the basic authentication requirement for this endpoint?

zguo commented 7 years ago

Yes, authentication can be disabled with a code change.

mhogeweg commented 7 years ago

Torrin, I'm not sure what the issue is. The endpoint is registered with the FGDC service checker, and that is where you provide account information so that the service checker can connect to your Eros feed. That's a one-time registration.

torrin47 commented 7 years ago

Use of credentials on the Eros side is optional: [Image of Eros Registration]

It's the GeoPortal endpoint that's protected. From https://github.com/Esri/geoportal-server/wiki/FGDC-Service-Checker-Integration:

You will need to access your geoportal's ATOM feed service to register for your API Key. Check that the ATOM feed service in your geoportal site is working - it is accessible at http://{base_URL}/{context URL}/Eros. For example, http://server/geoportal/Eros. Note, this URL is protected and only users in the geoportal administrator group can access to the URL.

We'd like to be able to unprotect this URL: https://edg.epa.gov/metadata/Eros

mhogeweg commented 7 years ago

OK. This will require a modification to the ErosQueryServlet class to remove the authentication requirement. In Geoportal Server 2 we did decide to make the Eros endpoint open.

mhogeweg commented 7 years ago

We will provide a servlet that does not require authentication.

torrin47 commented 6 years ago

Excellent, we now have a public feed! https://edg.epa.gov/metadata/ErosPublic?max=500 Unfortunately, when we attempt to register this feed with the FGDC Service Status Checker, we receive the following error:

Missing/Empty "serviceType" tag in the XML feed
Please make corrections to your feed and try again

It's certainly true that the vast majority of the entries in our feed contain empty service types. Most of these are simple URLs that have been placed into the resource.url element, but even resources such as waf, which should have a valid serviceType value, seem to be missing that value. Is this something we should raise with the EROS folks, or can it be addressed in our feed?

zguo commented 6 years ago

Geoportal analyzes URL patterns to "guess" the service types. In some cases it might not be able to figure out the service type of the URL and thus leaves it empty (including for WAF). The logic is in the file /geoportal-server/geoportal/src/com/esri/gpt/catalog/search/ResourceIdentifier.java. We think the FGDC service checker may have ignored or skipped entries with an empty service type earlier.
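To show the general idea of guessing from URL patterns, here is a minimal sketch. It is not the logic in ResourceIdentifier.java: the function name, the patterns, and the returned labels are all assumptions chosen for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical set of OGC service names recognised from a "service" query parameter.
OGC_SERVICES = {"wms", "wfs", "wcs", "csw"}


def guess_service_type(url):
    """Guess a service type from the URL alone; return "" when nothing matches."""
    parsed = urlparse(url)
    # Lower-case parameter names and values so SERVICE=WMS and service=wms match alike.
    query = {key.lower(): values[0].lower() for key, values in parse_qs(parsed.query).items()}
    service = query.get("service")
    if service in OGC_SERVICES:
        return service
    path = parsed.path.lower()
    if "/rest/services" in path:
        return "ags"
    if path.endswith((".kml", ".kmz")):
        return "kml"
    # A plain folder URL (such as a WAF) or a file download carries no hint, so stay empty.
    return ""


print(guess_service_type("http://example.org/ows?SERVICE=WMS&REQUEST=GetCapabilities"))
print(guess_service_type("http://example.org/metadata/"))
```

The second call shows why a WAF can come out empty under this kind of approach: a folder URL looks like any other plain link.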

torrin47 commented 6 years ago

Thanks @zguo, I suspect you're right, and I did run a test where I set all otherwise empty service types to WAF, and the registration and status checker succeeded. I also submitted a request to the FGDC folks to permit otherwise uncategorized URLs in the feed and perform only basic HTTP checks on them, which would help a great deal - they said they'll add it to the list and evaluate it in their development pipeline.

That said, I'm still puzzled by the presence of actual WAFs in our feed. Most of the WAFs we harvest are on our intranet, so we exclude them from the catalog listing. Yet they're still showing up in the ErosPublic feed, which means they're being pulled directly from the gpt_resource table in the database rather than the REST endpoint - and according to the logic in geoportal/src/com/esri/gpt/server/erosfeed/ErosQueryServlet.java they shouldn't be retrieved if PROTOCOL_TYPE is null, which means there shouldn't be any need to analyze the URL pattern - type is already there as an attribute, yet it's coming out empty. We can probably customize this class without much difficulty to exclude unlisted items, but I'm concerned that there is some bug in the code that's affecting the service type. Thoughts?

zguo commented 6 years ago

It might be because a combination of DB records and index records is used for the full output. Some of the records (e.g. registered sites) come from the gpt_resources table, which has a protocol_type value such as waf or csw assigned when a site is registered. Other records come from the index, where only the URL is known and the type could be empty.
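To make the two sources concrete, here is a minimal sketch of resolving a type for either kind of record. It does not reproduce the ErosQueryServlet code: the record layout, the function name, and the `guess` callback are hypothetical.

```python
def resolve_service_type(record, guess=lambda url: ""):
    """Prefer a stored protocol_type; otherwise fall back to guessing from the URL.

    `record` is a hypothetical dict with optional "protocol_type" and "url" keys.
    `guess` is a hypothetical callback that maps a URL to a type, or "" if unknown.
    """
    stored = (record.get("protocol_type") or "").strip().lower()
    if stored:
        # Registered sites: the type was assigned at registration time.
        return stored
    # Index-only records: only the URL is known, so the result may be empty.
    return guess(record.get("url", ""))


registered_site = {"protocol_type": "WAF", "url": "http://example.org/metadata/"}
index_only = {"protocol_type": None, "url": "http://example.org/data.zip"}

print(resolve_service_type(registered_site))
print(resolve_service_type(index_only))
```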

torrin47 commented 6 years ago

Understood - but we don't list any WAFs in our "index where only the URL is known and type could be empty." So 100% of the WAFs listed in our ErosPublic feed come from the gpt_resources table, where we can see that they do have a protocol_type value like waf - yet they do not have a serviceType in the ErosPublic feed. That's why we suspect there might be a bug.

zguo commented 6 years ago

Piotr has checked in a fix that should address the issue. Please try it and see if it works for you. Thanks!

torrin47 commented 6 years ago

Looks good, thanks. Hoping the EROS folks entertain our suggestion of accepting generic URLs.