adrianshort / uk_planning_scraper

A Ruby gem to get planning applications data from UK council websites.
GNU Lesser General Public License v3.0
27 stars, 19 forks

Catch SSL_Connect error #7

Status: open. adrianshort opened this issue 5 years ago.

adrianshort commented 5 years ago
Getting: https://www.planningpa.bolton.gov.uk/online-applications-17/search.do?action=advanced
 /app/vendor/ruby-2.3.1/lib/ruby/2.3.0/net/http.rb:933:in `connect_nonblock': SSL_connect returned=1 errno=0 state=unknown state: unknown protocol (OpenSSL::SSL::SSLError)
    from /app/vendor/ruby-2.3.1/lib/ruby/2.3.0/net/http.rb:933:in `connect'
    from /app/vendor/ruby-2.3.1/lib/ruby/2.3.0/net/http.rb:863:in `do_start'
    from /app/vendor/ruby-2.3.1/lib/ruby/2.3.0/net/http.rb:858:in `start'
    from /app/vendor/bundle/ruby/2.3.0/gems/net-http-persistent-3.0.0/lib/net/http/persistent.rb:692:in `start'
    from /app/vendor/bundle/ruby/2.3.0/gems/net-http-persistent-3.0.0/lib/net/http/persistent.rb:622:in `connection_for'
    from /app/vendor/bundle/ruby/2.3.0/gems/net-http-persistent-3.0.0/lib/net/http/persistent.rb:927:in `request'
    from /app/vendor/bundle/ruby/2.3.0/gems/mechanize-2.7.6/lib/mechanize/http/agent.rb:280:in `fetch'
    from /app/vendor/bundle/ruby/2.3.0/gems/mechanize-2.7.6/lib/mechanize.rb:464:in `get'
    from /app/vendor/bundle/ruby/2.3.0/bundler/gems/uk_planning_scraper-8d15678700bb/lib/uk_planning_scraper/idox.rb:13:in `scrape_idox'
    from /app/vendor/bundle/ruby/2.3.0/bundler/gems/uk_planning_scraper-8d15678700bb/lib/uk_planning_scraper/authority.rb:43:in `scrape'
    from scraper.rb:9:in `block in <main>'
    from scraper.rb:6:in `each'
    from scraper.rb:6:in `each_with_index'
    from scraper.rb:6:in `<main>'
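The issue title asks for this error to be caught. Below is a minimal sketch. It assumes the desired behaviour is to log the failure and move on to the next authority instead of aborting the whole run, because the trace shows scraper.rb iterating with each_with_index. `fetch_or_skip` and the `authorities` hash are hypothetical stand-ins and are not part of uk_planning_scraper. In the real scraper, the block would wrap the call that raised, which the trace shows is Mechanize's `agent.get(url)`.

```ruby
require 'openssl'

# Hypothetical helper (not part of uk_planning_scraper): run the block and,
# if the TLS handshake fails, log the error and return nil instead of raising.
def fetch_or_skip(label, log: $stderr)
  yield
rescue OpenSSL::SSL::SSLError => e
  log.puts "#{label}: SSL error, skipping (#{e.message})"
  nil
end

# Usage sketch: the lambdas stand in for per-authority scrape calls.
authorities = {
  'Bolton' => -> { raise OpenSSL::SSL::SSLError, 'unknown protocol' },
  'Bury'   => -> { [:application] }
}

results = {}
authorities.each do |name, scrape|
  results[name] = fetch_or_skip(name) { scrape.call }
end
```

Rescuing only OpenSSL::SSL::SSLError keeps unrelated bugs visible. Widen the rescue list deliberately, not by default.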
KeithP commented 5 years ago

Maybe try this when initialising Mechanize. Instead of just:

    agent = Mechanize.new

try:

    agent = Mechanize.new { |a| a.ssl_version, a.verify_mode = 'TLSv1', OpenSSL::SSL::VERIFY_NONE }
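The two settings in that snippet do different things. `ssl_version = 'TLSv1'` restricts the handshake to TLS 1.0 only. `OpenSSL::SSL::VERIFY_NONE` turns off certificate verification entirely.

Here is a sketch of a less drastic alternative using Ruby's standard Net::HTTP rather than Mechanize. It assumes Ruby 2.5 or later, where Net::HTTP gained `min_version`. Instead of pinning one protocol version, it sets a minimum version, so the client can still negotiate a newer one, and it leaves verification on. The thread does not establish that this fixes Bolton's "unknown protocol" error.

```ruby
require 'net/http'
require 'openssl'

# Sketch (assumes Ruby >= 2.5 for Net::HTTP#min_version=).
# A floor lets the client negotiate TLS 1.2+ with servers that require it,
# whereas pinning to 'TLSv1' allows only TLS 1.0.
def build_https(host)
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER  # keep certificate checks on
  http.min_version = OpenSSL::SSL::TLS1_VERSION # lowest version we will accept
  http
end

http = build_https('www.planningpa.bolton.gov.uk')
# No connection is made until a request is sent, e.g.:
#   http.get('/online-applications-17/search.do?action=advanced')
```

Whether TLS 1.0 is actually permitted also depends on the local OpenSSL configuration, not only on this setting.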

adrianshort commented 5 years ago

Locally, this works for Bolton but then breaks for Sutton:

OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=unknown state: tlsv1 alert protocol version
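The TLSv1 setting helps Bolton but breaks Sutton, so one option is to apply non-default TLS settings per host instead of globally. The sketch below uses Net::HTTP again and needs Ruby 2.5+ for `max_version`. `SSL_OVERRIDES` and `http_for` are hypothetical names. Capping Bolton at TLS 1.0 mirrors only the protocol part of the snippet that worked locally; it is not a verified requirement of that server.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Hypothetical per-host TLS overrides. Hosts not listed use Net::HTTP's
# defaults, so a workaround for one council cannot break another.
SSL_OVERRIDES = {
  'www.planningpa.bolton.gov.uk' => { max_version: OpenSSL::SSL::TLS1_VERSION }
}.freeze

def http_for(url)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  # Apply each override through its setter, e.g. max_version=.
  SSL_OVERRIDES.fetch(uri.host, {}).each do |setting, value|
    http.public_send("#{setting}=", value)
  end
  http
end
```

Keeping the overrides in one table makes each workaround visible and easy to remove once the council fixes its server.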
adrianshort commented 5 years ago

https://help.morph.io/t/cannot-scrape-https-site-ssl-error/505/2

adrianshort commented 5 years ago

I've got the same problem (transiently) with Bury, running locally:

Getting: https://planning.bury.gov.uk/online-applications/search.do?action=advanced
../.rvm/rubies/ruby-2.3.0/lib/ruby/2.3.0/net/http.rb:933:in `connect_nonblock': SSL_connect returned=1 errno=0 state=error: certificate verify failed (OpenSSL::SSL::SSLError)
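Because the Bury failure was transient, a bounded retry may be enough before giving up. Below is a minimal sketch. `with_ssl_retries` is a hypothetical helper, and the attempt count and delays are arbitrary. Retrying will not help with a persistent misconfiguration on the server side.

```ruby
require 'openssl'

# Hypothetical helper: retry a block when the TLS handshake fails, on the
# assumption that the failure is transient. Re-raises after `attempts` tries.
def with_ssl_retries(attempts: 3, base_delay: 1.0)
  tries = 0
  begin
    tries += 1
    yield
  rescue OpenSSL::SSL::SSLError
    raise if tries >= attempts
    sleep(base_delay * 2**(tries - 1)) # exponential backoff: 1s, 2s, ... by default
    retry
  end
end

# Usage, with agent being a Mechanize instance:
#   page = with_ssl_retries { agent.get(url) }
```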
KeithP commented 3 years ago

In the last couple of days, Sutton has started returning an SSL error: OpenSSL::SSL::SSLError (SSL_connect returned=1 errno=0 state=error: certificate verify failed (unable to get local issuer certificate))

It appears the target server is misconfigured: it does not send its intermediate (chain) certificates. The SSL Labs report for the host shows this: https://www.ssllabs.com/ssltest/analyze.html?d=planningregister.sutton.gov.uk

This server's certificate chain is incomplete. Grade capped to B.

Not sure how to resolve this. Came to a dead end trying to add the missing cert to the local CA store, and anyway doing so feels dirty.
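An alternative to editing the system CA store is to give only the scraper's HTTP client an extra trusted certificate. The sketch below uses Net::HTTP and assumes you have obtained the missing intermediate certificate as a PEM file. The file path is hypothetical; the SSL Labs report shows the full certification path. Mechanize also exposes a `cert_store` setting, but check that against the version in use.

```ruby
require 'net/http'
require 'openssl'

# Sketch: trust the default CA bundle plus one extra intermediate certificate,
# scoped to this client only; the system-wide CA store is not modified.
# extra_pem is a hypothetical path to the missing intermediate (PEM format);
# pass nil to use the default CAs alone.
def https_with_extra_ca(host, extra_pem = nil)
  store = OpenSSL::X509::Store.new
  store.set_default_paths                 # OpenSSL's default CA locations
  store.add_file(extra_pem) if extra_pem  # lets OpenSSL build the full chain
  http = Net::HTTP.new(host, 443)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  http.cert_store = store
  http
end

# Usage (the PEM path is hypothetical):
#   http = https_with_extra_ca('planningregister.sutton.gov.uk', 'certs/sutton-intermediate.pem')
#   http.get('/')
```

This keeps the workaround local to the scraper. The bundled certificate will need updating if the council's chain changes, and the proper fix remains for the server to send its full chain.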