OpenWIS / openwis

http://openwis.github.io/openwis
GNU General Public License v3.0
Various bugs in the OAI-PMH harvester #239

Open lmika-bom opened 7 years ago

lmika-bom commented 7 years ago

There are various bugs in the OAI-PMH harvesting client in OpenWIS 3.x which make it impossible to harvest metadata from an HTTPS endpoint. This came to light with the switch-over to HTTPS-only harvesting from GISC Exeter.

The bugs are located in the "oaipmh" project:

  1. In "Transport.java", the HTTP client is configured to form a cleartext HTTP connection regardless of the URL scheme taken from the harvesting configuration.
  2. In the same file, the HTTP client is not configured to follow redirects.
  3. The OAI-PMH harvester does not take into account the default HTTPS port number. When a harvester is configured with a "https" URL, the HTTP client is still configured to connect via port 80, instead of 443.

lmika-bom commented 7 years ago

The hint for bug 1 is "Transport.java" line 244. The call to config.setHost() hard-codes the scheme to "http" instead of picking up the scheme from the URL. Bug 2, regarding redirects, could probably be fixed by consulting the Javadoc of Apache Commons HttpClient version 3.0.1 (which differs from the latest version).
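
For illustration, a minimal sketch of how the scheme and default port (bugs 1 and 3) could be derived from the configured harvesting URL rather than hard-coded. The class `HarvestEndpoint` and its method are hypothetical names and not part of the "oaipmh" project; the sketch uses only `java.net.URI` and assumes the resulting values would then be handed to whichever config.setHost() overload "Transport.java" uses.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper, not taken from the OpenWIS code base. */
final class HarvestEndpoint {
    final String scheme;
    final String host;
    final int port;

    private HarvestEndpoint(String scheme, String host, int port) {
        this.scheme = scheme;
        this.host = host;
        this.port = port;
    }

    /** Derives scheme, host and port from the harvesting URL. */
    static HarvestEndpoint fromUrl(String url) {
        URI uri = URI.create(url);
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Not an absolute URL: " + url);
        }
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        int defaultPort;
        if ("https".equals(scheme)) {
            defaultPort = 443;
        } else if ("http".equals(scheme)) {
            defaultPort = 80;
        } else {
            throw new IllegalArgumentException("Unsupported scheme: " + scheme);
        }
        // URI.getPort() returns -1 when the URL carries no explicit port.
        int port = uri.getPort() == -1 ? defaultPort : uri.getPort();
        return new HarvestEndpoint(scheme, uri.getHost(), port);
    }
}
```

With something like this, `HarvestEndpoint.fromUrl("https://example.org/oai")` would yield scheme "https" and port 443, and an explicit port in the URL would still take precedence over the default.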

lmika-bom commented 7 years ago

After investigating bug 2 a little more, not following redirects for POST messages is actually a design decision by the HTTP client used (RFC 2616 is referenced). Therefore, POST redirects may need to be handled manually (or not at all).
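
If redirects were to be handled manually, the decision logic might look roughly like the sketch below. It is independent of the HTTP client: `RedirectResolver` and its methods are hypothetical names, the status code and Location header are assumed to come from the client's response, and the choice to switch to GET only on 303 is one possible policy rather than a statement of what the harvester should do.

```java
import java.net.URI;

/** Hypothetical helper for manual redirect handling; not from the OpenWIS code base. */
final class RedirectResolver {

    /** True for the status codes that carry a redirect target in the Location header. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /**
     * Resolves the Location header against the request URI.
     * Returns null when the response is not a usable redirect.
     */
    static URI resolve(URI requestUri, int status, String location) {
        if (!isRedirect(status) || location == null || location.trim().isEmpty()) {
            return null;
        }
        // Location may be relative, so resolve it against the original request.
        return requestUri.resolve(location.trim());
    }

    /** A 303 response is retrieved with GET; otherwise the original method is kept. */
    static String methodAfterRedirect(String originalMethod, int status) {
        return status == 303 ? "GET" : originalMethod;
    }
}
```

A caller would repeat the request against the resolved URI, ideally with a small cap on the number of hops so that a redirect loop cannot stall the harvest.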

ywang-bom commented 7 years ago

Or maybe the OAI-PMH requests should use GET instead of POST. Though the OAI-PMH Spec does not dictate the HTTP method, it feels more appropriate to use GET.

lmika-bom commented 7 years ago

The Transport class does support either GET or POST, with POST being the default. There is also a public method to change the HTTP method, but it doesn't appear to be used. I can change the default to GET and see if this would cause any problems.
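
As a rough illustration of what the GET form of a request involves, the sketch below builds an OAI-PMH request URL by appending URL-encoded key=value arguments to the base URL. `OaiPmhGetUrl` is a hypothetical name and does not reflect how the Transport class builds its requests; the base URL and argument values in the usage note are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/** Hypothetical helper; not how the Transport class builds its requests. */
final class OaiPmhGetUrl {

    /** Appends the OAI-PMH arguments to the base URL as an encoded query string. */
    static String build(String baseUrl, Map<String, String> arguments) {
        StringBuilder url = new StringBuilder(baseUrl);
        // Start the query with '?' unless the base URL already has one.
        char separator = baseUrl.indexOf('?') >= 0 ? '&' : '?';
        for (Map.Entry<String, String> argument : arguments.entrySet()) {
            url.append(separator)
               .append(encode(argument.getKey()))
               .append('=')
               .append(encode(argument.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Passing a `LinkedHashMap` with `verb=ListRecords`, `metadataPrefix=oai_dc` and `from=2018-01-01T00:00:00Z` for the base URL `https://example.org/oai` gives `https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2018-01-01T00%3A00%3A00Z`. Because a GET request carries no body, a standard client can follow a redirect for it without the restrictions that apply to POST.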

lmika-bom commented 7 years ago

Initial PR containing fixes for 1 and 3. Will wait for a discussion of the use of GET and POST before fixing 2.

lmika-bom commented 7 years ago

Point 2 is the only thing keeping this issue open. If we don't want to do this, this issue can be closed.

jude2018 commented 6 years ago

@tg4444, a query from the PMC meeting in Helsinki: has this been resolved in v3.14?

woollattd commented 6 years ago

I'm not aware that 'Point 2' has been addressed.

tg4444 commented 6 years ago

@jude2018, no relevant pull requests have been merged since @lmika-bom's fix was merged. As a result, 'Point 2' has not been addressed.