code4lib / ruby-oai

a Ruby library for building OAI-PMH clients and servers
MIT License

Client doesn't respect 503 with Retry-After #45

Closed dmolesUC closed 2 years ago

dmolesUC commented 9 years ago

I first noticed this when attempting to retrieve several thousand records from arXiv (which imposes a 20-second delay between requests) with .full and resumption tokens, but you can see the same thing if you just try to retrieve one page of records twice in quick succession.

Steps to reproduce:

    require 'oai'
    client = OAI::Client.new 'http://export.arxiv.org/oai2'
    opts = {from:'2012-03-01', until:'2012-04-01', metadata_prefix:'arXiv'}
    puts client.list_records(opts).full.count

Expected:

6603   # after about 2 minutes

Actual:

1000   # immediately

I tried monkey-patching OAI::Client#get to print some debug info:

    # Debugging patch only: log each request's URI, status, and headers.
    class OAI::Client
      def get(uri)
        print "getting #{uri}... "
        response = @http_client.get uri
        puts response.status
        puts "     #{response.headers}"
        response.body
      end
    end

and reran the second case above, confirming that we're getting a 503 with Retry-After, as per the OAI-PMH spec:

getting http://export.arxiv.org/oai2?verb=ListRecords&from=2012-03-01&until=2012-04-01&metadataPrefix=arXiv... 200
     {"date"=>"Thu, 09 Apr 2015 00:07:19 GMT", "server"=>"Apache", "vary"=>"Accept-Encoding,User-Agent", "connection"=>"close", "transfer-encoding"=>"chunked", "content-type"=>"text/xml"}
getting http://export.arxiv.org/oai2?verb=ListRecords&resumptionToken=798053%7C1001... 503
     {"date"=>"Thu, 09 Apr 2015 00:08:06 GMT", "server"=>"Apache", "retry-after"=>"20", "vary"=>"Accept-Encoding,User-Agent", "content-length"=>"72", "connection"=>"close", "content-type"=>"text/html"}
1000

It looks like there's a related pull request #29, although it would be nice if both the number of retries and max retry wait time were configurable.
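For illustration, here is a rough sketch of what configurable retries might look like. Everything in it is hypothetical and not part of ruby-oai: the `fetch_with_retry` name, the `max_retries` and `max_wait` parameters, the `Response` struct standing in for an HTTP response, and the injectable `sleeper`. It also assumes Retry-After is given in whole seconds, as arXiv does.

```ruby
# Hypothetical stand-in for an HTTP response (status, headers, body).
Response = Struct.new(:status, :headers, :body)

# Hypothetical helper, not part of ruby-oai. Calls the block until it returns
# a non-503 response, sleeping for the Retry-After interval between attempts.
def fetch_with_retry(max_retries: 5, max_wait: 60, sleeper: ->(s) { sleep(s) })
  attempts = 0
  loop do
    response = yield
    return response unless response.status == 503

    attempts += 1
    raise "gave up after #{max_retries} retries" if attempts > max_retries

    # Assumes whole seconds (e.g. "20"); a missing header means no wait.
    wait = Integer(response.headers.fetch('retry-after', '0'), 10)
    raise "server asked for #{wait}s, above max_wait (#{max_wait}s)" if wait > max_wait

    sleeper.call(wait)
  end
end
```

In the debugging patch above, the request could then be wrapped as `fetch_with_retry { @http_client.get(uri) }`, since that response also exposes `status` and `headers`.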

dmolesUC commented 5 years ago

It looks like recent versions of Faraday support the Retry-After header, and the following workaround seems to do it:

    require 'oai'
    require 'faraday_middleware'

    base_url = 'http://export.arxiv.org/oai2'

    # Retry up to 5 times on 503; recent Faraday versions honor Retry-After.
    http_client = Faraday.new do |conn|
      conn.request(:retry, max: 5, retry_statuses: [503])
      conn.response(:follow_redirects, limit: 5)
      conn.adapter :net_http
    end

    client = OAI::Client.new(base_url, http: http_client)
    opts = {from: '2012-03-01', until: '2012-04-01', metadata_prefix: 'arXiv'}
    puts client.list_records(opts).full.count

I haven't tested whether this behaves correctly in the event of a 503 without Retry-After, however.
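To make the untested case concrete, here is a small sketch of how a client could pick a delay when Retry-After is missing or malformed. It is not how Faraday or ruby-oai behaves; the `retry_delay` name and the 30-second `default` are assumptions chosen for illustration. It accepts both forms HTTP allows for the header: a number of seconds or an HTTP-date.

```ruby
require 'time'

# Hypothetical helper: seconds to wait before retrying, given a header hash
# with lowercase keys (as in the debug output earlier in this thread).
def retry_delay(headers, default: 30, now: Time.now)
  value = headers['retry-after'].to_s.strip
  return default if value.empty?            # header absent or blank
  return value.to_i if value.match?(/\A\d+\z/) # delay in seconds, e.g. "20"

  # Otherwise treat it as an HTTP-date; never return a negative delay.
  [(Time.httpdate(value) - now).ceil, 0].max
rescue ArgumentError
  default # unparseable value: fall back to the assumed default
end
```

With the headers from the 503 response shown earlier, `retry_delay('retry-after' => '20')` returns 20, while an empty hash falls back to the default.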

barmintor commented 2 years ago

The challenge here is that this gem is not very opinionated about Faraday versions, which means it can't reliably spec a dependency on faraday_middleware or faraday-retry. Hmm.

barmintor commented 2 years ago

Thank you for documenting an approach from a client application's perspective, by the way! This might be the best resolution.

barmintor commented 2 years ago

@dmolesUC I've added your example to the README (thanks!); I don't think we can do more than that until/unless we specify faraday versions more narrowly.