Esri / geoportal-server-harvester

Metadata Harvester for Esri Geoportal Server
http://esri.github.io/geoportal-server/
Apache License 2.0

Cannot Harvest FAA FeatureService (Maybe Due to Spaces in the Name) #133

Open rhodges opened 3 years ago

rhodges commented 3 years ago

We are trying to harvest this service: https://services6.arcgis.com/ssFJjBXIUyZDrSYZ/ArcGIS/rest/services

We use "https://services6.arcgis.com/ssFJjBXIUyZDrSYZ/ArcGIS" for the endpoint, but if you open that in a browser, you are not redirected to the services directory as you normally would be.

When we try to harvest this AGS endpoint, we see the following error in the logs:

Exception in thread "HARVESTING" java.lang.IllegalArgumentException: Illegal character in path at index 6: Buffer of Runways/FeatureServer

It appears the service "Buffer of Runways" has spaces in its name.

Is this a valid service name, or is there another reason we cannot harvest?

mhogeweg commented 3 years ago

I'm marking this as a bug. When navigating the services directory, this service shows up with the spaces in the URL replaced with %20:

https://services6.arcgis.com/ssFJjBXIUyZDrSYZ/ArcGIS/rest/services/Buffer%20of%20Runways/FeatureServer

If this is valid for ArcGIS Server service names, the harvester should handle this case.
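
For reference, here is a minimal Java sketch of the failure and one possible way to handle it. This is not the harvester's actual code: the class name `ServiceUrlSketch` and the helper `serviceUri` are hypothetical, and the sketch assumes the exception comes from parsing the raw service name with `java.net.URI`. The multi-argument `URI` constructor percent-encodes characters that are illegal in a path, so the space becomes %20.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ServiceUrlSketch {

    // Hypothetical helper: builds an encoded URI for a service below a REST services root.
    // The multi-argument URI constructor quotes illegal path characters such as spaces.
    static URI serviceUri(String host, String servicesPath, String serviceName, String serviceType)
            throws URISyntaxException {
        String path = servicesPath + "/" + serviceName + "/" + serviceType;
        return new URI("https", host, path, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Parsing the raw name fails because a space is not allowed in a URI path.
        try {
            URI.create("Buffer of Runways/FeatureServer");
        } catch (IllegalArgumentException e) {
            System.out.println("Raw name rejected: " + e.getMessage());
        }

        // Building the URI from its components encodes the spaces instead of failing.
        URI encoded = serviceUri(
                "services6.arcgis.com",
                "/ssFJjBXIUyZDrSYZ/ArcGIS/rest/services",
                "Buffer of Runways",
                "FeatureServer");
        System.out.println(encoded);
    }
}
```

Building the URI from components rather than from one concatenated string leaves the encoding to the JDK, which avoids hand-rolled replacement of spaces.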

mhogeweg commented 2 years ago

@zguo - please confirm this is resolved with Harvester 2.6.5

rhodges commented 2 years ago

Hi @mhogeweg and @zguo -- I have updated to v2.6.5 (master branch for both catalog and harvester as of yesterday) and am trying again to harvest this service, but still no success. Test/Result cases:

I don't see anything of interest in the logs (at least not under /opt/tomcat/logs) -- the only logs getting updated from these tests are 'localhost_access_log.YYYY-MM-DD.txt':

10.0.2.2 - - [18/Nov/2021:19:10:35 +0000] "POST /harvester/rest/harvester/tasks/6251e3e0-e3ff-4116-8e54-077cbfa8f488/execute? HTTP/1.1" 200 1015
10.0.2.2 - - [18/Nov/2021:19:10:35 +0000] "GET /harvester/rest/harvester/triggers HTTP/1.1" 200 12
127.0.0.1 - - [18/Nov/2021:19:10:35 +0000] "POST /geoportal/oauth/token HTTP/1.1" 200 453
127.0.0.1 - - [18/Nov/2021:19:10:35 +0000] "POST /geoportal/elastic/metadata/item/_search?access_token=[ACCESS_TOKEN]&access_token=[ACCESS_TOKEN] HTTP/1.1" 200 139
10.0.2.2 - - [18/Nov/2021:19:10:35 +0000] "GET /harvester/rest/harvester/processes HTTP/1.1" 200 44984
10.0.2.2 - - [18/Nov/2021:19:10:35 +0000] "GET /harvester/rest/harvester/processes/fc38e4b6-399d-481d-92c5-067d09b57f4a HTTP/1.1" 200 1221
10.0.2.2 - - [18/Nov/2021:19:10:35 +0000] "GET /harvester/rest/harvester/processes/fc38e4b6-399d-481d-92c5-067d09b57f4a HTTP/1.1" 200 1221

mhogeweg commented 2 years ago

It looks as if the server has directory browsing disabled, or perhaps this happens because it is an ArcGIS Online hosted services server. Going to https://services6.arcgis.com/ssFJjBXIUyZDrSYZ/ArcGIS returns an invalid URL response, and this is what causes the harvester to fail. However, https://services6.arcgis.com/ssFJjBXIUyZDrSYZ/ArcGIS/rest/services does list the services.
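
One way to sidestep the missing redirect is to address the services catalog directly instead of the server root. The sketch below is only an illustration under that assumption, not what the harvester does today: `CatalogUrlSketch` and `catalogUrl` are hypothetical names. It appends `/rest/services` when the configured endpoint does not already end with it and requests the JSON form of the catalog with the ArcGIS REST `f=json` parameter.

```java
import java.util.Locale;

public class CatalogUrlSketch {

    // Hypothetical helper: derives the services catalog URL from a configured endpoint,
    // so the client does not depend on the server root redirecting to /rest/services.
    static String catalogUrl(String endpoint) {
        // Drop trailing slashes so the suffix check and concatenation are predictable.
        String base = endpoint.replaceAll("/+$", "");
        if (!base.toLowerCase(Locale.ROOT).endsWith("/rest/services")) {
            base = base + "/rest/services";
        }
        // f=json asks the ArcGIS REST API for a machine-readable catalog listing.
        return base + "?f=json";
    }

    public static void main(String[] args) {
        System.out.println(catalogUrl("https://services6.arcgis.com/ssFJjBXIUyZDrSYZ/ArcGIS"));
        System.out.println(catalogUrl("https://services6.arcgis.com/ssFJjBXIUyZDrSYZ/ArcGIS/rest/services/"));
    }
}
```

Both calls in `main` resolve to the same catalog URL, so either form of the endpoint could be entered in the harvester task.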