Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

One more SSL related issue #196

Closed by AntonioAmore 8 years ago

AntonioAmore commented 8 years ago

I ran into an issue using the 2.4.0 snapshot:

Caused by: javax.net.ssl.SSLPeerUnverifiedException: Host name 'www.aflac.com' does not match the certificate subject provided by the peer (CN=incapsula.com, O=Incapsula Inc, L=Dover, ST=Delaware, C=US)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:465)

I believe it is caused by an invalid certificate, but there are probably a lot of sites like this, so a solution may be helpful to the community.

essiembre commented 8 years ago

Does setting trustAllSSLCertificates to true not do it for you? Like this:

<httpClientFactory>
    ...
    <trustAllSSLCertificates>true</trustAllSSLCertificates>
</httpClientFactory>

I thought we recently resolved the last issues with SSL certificates in #181?

AntonioAmore commented 8 years ago

It seems that solution doesn't work for this other site, and the exception is a bit different, in my opinion.

essiembre commented 8 years ago

Try the latest snapshot release. When trusting all certificates, host name verification is now disabled. This eliminates the error in my testing. Please confirm.
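
For anyone curious what "trust all certificates and skip host name verification" amounts to, here is a minimal sketch using only JDK classes. It is not the crawler's actual code (the crawler builds on Apache HttpClient, as the stack trace shows); the class and method names below are hypothetical, chosen for illustration. It disables two separate checks: certificate chain validation and the host name match that raised SSLPeerUnverifiedException above.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper, not part of Norconex or Apache HttpClient.
public class TrustAllSketch {

    // Builds an SSLContext whose trust manager accepts any certificate chain.
    static SSLContext trustAllContext() throws GeneralSecurityException {
        TrustManager[] trustAll = { new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: no validation is performed.
            }
            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // Intentionally empty: no validation is performed.
            }
            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        }};
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, trustAll, new SecureRandom());
        return context;
    }

    // Returns a verifier that accepts every host name, so a certificate issued
    // for another name (e.g. CN=incapsula.com) no longer causes a failure.
    static HostnameVerifier acceptAnyHost() {
        return (hostname, session) -> true;
    }
}
```

With the JDK's HttpsURLConnection, these would be applied via setSSLSocketFactory(TrustAllSketch.trustAllContext().getSocketFactory()) and setHostnameVerifier(TrustAllSketch.acceptAnyHost()). Both checks exist to prevent man-in-the-middle attacks, so turning them off is only reasonable for crawling public content you do not need to authenticate.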

AntonioAmore commented 8 years ago

The newest snapshot works perfectly! (Tested with the Tika extractor.) Thanks a lot.