wowasa closed this issue 1 year ago
If I'm not mistaken, SSLHandshakeExceptions are usually caused by a missing certificate. As far as I know, Java uses its own keystore for the certificates, so it all depends on how up to date that is. I'm not aware of any method to keep it in sync with 'the browser', keeping in mind that not all browsers will necessarily behave the same way either. I think that there are two things we can do:
Note: in general it's worth doing a manual check via https://www.ssllabs.com/ssltest if such an issue is encountered. In this case (report) there are chain issues with the certificate, so it would be worth highlighting that somehow.
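To see which CAs the JVM itself trusts (as opposed to a browser), one option is to ask the default trust manager for its accepted issuers. This is a minimal sketch using only JDK classes; the class and method names (`DefaultTrustStoreInfo`, `defaultTrustedIssuers`) are made up for illustration and are not part of the link checker or Storm Crawler.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper, for illustration only.
class DefaultTrustStoreInfo {

    // Returns the CA certificates that the JVM's default trust manager accepts as issuers.
    static X509Certificate[] defaultTrustedIssuers() {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init((KeyStore) null); // null selects the JVM's default trust store
            for (TrustManager tm : tmf.getTrustManagers()) {
                if (tm instanceof X509TrustManager) {
                    return ((X509TrustManager) tm).getAcceptedIssuers();
                }
            }
            return new X509Certificate[0];
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not load the default trust store", e);
        }
    }

    public static void main(String[] args) {
        X509Certificate[] issuers = defaultTrustedIssuers();
        System.out.println("Trusted CA certificates: " + issuers.length);
        for (X509Certificate cert : issuers) {
            System.out.println(cert.getSubjectX500Principal().getName());
        }
    }
}
```

Note that this only lists root and intermediate CAs known to the JVM. A server that does not send its full chain can still fail PKIX validation even when its root CA appears in this list, which is the kind of chain issue an SSL Labs check would reveal.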
It also comes down to the fundamental question of what our 'benchmark' is. Is a link 'not broken' IFF it works in a browser? Would be good to make this explicit to ourselves and the users.
One more comment: for documentation purposes maybe you could attach a few examples of exceptions with some level of detail. Usually a cause is described in the exception, which would help understand the underlying issue.
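A small helper along these lines could be used to capture the cause chain of an exception for documentation. It is only a sketch: `CauseChain` and `describe` are hypothetical names, and the exception in `main` is constructed by hand purely to show the output shape.

```java
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SSLHandshakeException;

// Hypothetical helper, for illustration only.
class CauseChain {

    // Collects "ClassName: message" for the exception and each nested cause.
    static List<String> describe(Throwable t) {
        List<String> lines = new ArrayList<>();
        // The depth limit guards against pathological cause loops.
        for (int depth = 0; t != null && depth < 20; depth++) {
            lines.add(t.getClass().getName() + ": " + t.getMessage());
            t = t.getCause();
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hand-built exception, not a real handshake failure.
        SSLHandshakeException e = new SSLHandshakeException("PKIX path building failed");
        e.initCause(new IllegalStateException("unable to find valid certification path"));
        describe(e).forEach(System.out::println);
    }
}
```

Storing the output of such a helper next to the failing URL would make it easier to tell certificate path problems apart from other handshake failures.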
Storm Crawler has a property http.trust.everything=true by default. As I understand it, we shouldn't see this kind of exception.
interesting. Can you post a full stack trace?
PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
that confirms the certification path issue. A full stack trace would be helpful to understand where the problem originates
javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
    at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:371)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:314)
    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:309)
    at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.checkServerCerts(CertificateMessage.java:654)
    at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.onCertificate(CertificateMessage.java:473)
    at java.base/sun.security.ssl.CertificateMessage$T12CertificateConsumer.consume(CertificateMessage.java:369)
    at java.base/sun.security.ssl.SSLHandshake.consume(SSLHandshake.java:396)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:480)
    at java.base/sun.security.ssl.HandshakeContext.dispatch(HandshakeContext.java:458)
    at java.base/sun.security.ssl.TransportContext.dispatch(TransportContext.java:201)
    at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:172)
    at java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1505)
    at java.base/sun.security.ssl.SSLSocketImpl.readHandshakeRecord(SSLSocketImpl.java:1420)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:455)
    at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:426)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:436)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:384)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
    at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
    at com.digitalpebble.stormcrawler.protocol.httpclient.HttpProtocol.getProtocolOutput(HttpProtocol.java:245)
    at eu.clarin.linkchecker.bolt.MetricsFetcherBolt$FetcherThread.run(MetricsFetcherBolt.java:579)
Looks like support for the http.trust.everything property is implemented for OkHttp, but not for the Apache HTTP client. See search results.
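For reference, "trusting everything" at the JDK level usually amounts to an SSLContext whose trust manager accepts any certificate chain. The sketch below uses only JDK classes and is an illustration of the general technique, not Storm Crawler's actual implementation; `TrustEverything` and `trustAllContext` are hypothetical names. It disables certificate validation entirely, so it is only appropriate where the goal is to check reachability rather than to protect data.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper, for illustration only. Insecure by design.
class TrustEverything {

    // A trust manager that accepts every client and server certificate chain.
    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    // Builds a TLS context that skips certificate path validation.
    static SSLContext trustAllContext() {
        try {
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, new TrustManager[] { TRUST_ALL }, null);
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create SSLContext", e);
        }
    }

    public static void main(String[] args) {
        SSLContext ctx = trustAllContext();
        System.out.println("Created context for protocol: " + ctx.getProtocol());
    }
}
```

With Apache HttpClient 4.x, a context like this could presumably be passed to an SSLConnectionSocketFactory together with a permissive hostname verifier such as NoopHostnameVerifier, since hostname verification is a separate check from certificate path validation. Whether and how the Storm Crawler httpclient protocol could be wired up this way is not confirmed in this thread.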
Decision (from minutes doc):
one addition: we decided to test com.digitalpebble.stormcrawler.protocol.okhttp.HttpProtocol, which ignores SSL issues. If it doesn't have any side effects on link checking (e.g. on returned metadata), it is only a configuration issue
implemented in v. 3.0.4
The status table shows SSLHandshakeExceptions which don't occur in browser requests (e.g. for
https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/bitstream/handle/20.500.11752/OPEN-531/derivational_db.zip?sequence=1
or https://dspace-clarin-it.ilc.cnr.it/repository/xmlui/bitstream/handle/20.500.11752/OPEN-530/IT-TB_PML_analytical-tectogrammatical.zip?sequence=1
). Hence the question is how we can avoid these exceptions to get behavior similar to the browser.