gbif / crawler

The crawling pieces - ws, cli, coordinator
Apache License 2.0

DwCA: HTTP Redirect 308 (Permanent Redirect) fails. #38

snsb-seifert closed 4 years ago

snsb-seifert commented 4 years ago

The crawler fails on datasets whose endpoints redirect it with HTTP status 308 (Permanent Redirect, https://tools.ietf.org/html/rfc7538).

Cause: The Apache HttpClient used in gbif-httputils does not recognize status 308. The crawler uses it for downloads in https://github.com/gbif/crawler/blob/a2ebaaac77448c0045948f0d8c0a0ee84b642bd6/crawler-cli/src/main/java/org/gbif/crawler/common/DownloadCrawlConsumer.java#L55

Solution: Implement a new redirect strategy in HttpUtil that supports HTTP status 308: https://github.com/gbif/gbif-httputils/blob/3af4fc6d6670f2e6507c245a3c11662e6ac04815/src/main/java/org/gbif/utils/HttpUtil.java#L204

or switch to Apache HttpComponents Core 5, which knows about status 308:

https://hc.apache.org/httpcomponents-core-5.0.x/httpcore5/apidocs/org/apache/hc/core5/http/HttpStatus.html
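For the first option, here is a minimal sketch of the decision logic a 308-aware redirect strategy would need. It uses plain Java with no HttpClient dependency. The class `RedirectRules` and its method names are hypothetical and do not exist in gbif-httputils or Apache HttpClient. In HttpClient 4.x this logic would presumably go into an override of `DefaultRedirectStrategy.isRedirected`, but that wiring is an assumption and is not shown here. The key point from RFC 7538 is that 308, like 307, must not change the request method.

```java
// Hypothetical helper for illustration only; not part of gbif-httputils
// or Apache HttpClient.
final class RedirectRules {

    private RedirectRules() {}

    /** Returns true for the redirect statuses that carry a Location to follow, including 308. */
    static boolean isRedirect(int status) {
        switch (status) {
            case 301: // Moved Permanently
            case 302: // Found
            case 303: // See Other
            case 307: // Temporary Redirect
            case 308: // Permanent Redirect (RFC 7538)
                return true;
            default:
                return false;
        }
    }

    /** Returns the HTTP method to use for the follow-up request. */
    static String methodAfterRedirect(String method, int status) {
        if (!isRedirect(status)) {
            throw new IllegalArgumentException("Not a redirect status: " + status);
        }
        if (status == 307 || status == 308) {
            // 307 and 308 require the original method to be preserved.
            return method;
        }
        if (status == 303) {
            // 303 redirects to a GET (HEAD stays HEAD).
            return "HEAD".equals(method) ? "HEAD" : "GET";
        }
        // 301/302: clients are allowed to rewrite POST to GET, and commonly do.
        return "POST".equals(method) ? "GET" : method;
    }
}
```

For the crawler's case, which is a GET download of a DwC-A archive, `RedirectRules.methodAfterRedirect("GET", 308)` keeps the method as `GET`, so the only missing piece is treating 308 as a redirect at all.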