LAW-Unimi / BUbiNG

The LAW next generation crawler.
http://law.di.unimi.it/software.php#bubing
Apache License 2.0

SSL certificates are wrongly rejected #5

Closed guillaumepitel closed 7 years ago

guillaumepitel commented 7 years ago

I have this error regarding SSL certificates, which occurs very frequently:

javax.net.ssl.SSLPeerUnverifiedException: Certificate for <www.genopole.fr> doesn't match any of the subject alternative names: [join-the-biocluster.genopole.fr, jointhebiocluster.genopole.fr, join.genopole.fr]
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:467)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:397)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:355)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
        at org.apache.http.impl.conn.BasicHttpClientConnectionManager.connect(BasicHttpClientConnectionManager.java:323)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:191)
        at it.unimi.di.law.bubing.util.FetchData.fetch(FetchData.java:323)
        at it.unimi.di.law.bubing.frontier.FetchingThread.run(FetchingThread.java:253)

The VisitState is subsequently killed.

When I look at the site using my browser, it doesn't seem to complain, though. A lot of sites are affected by this problem.
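To make the exception above concrete: the check fails because the requested host name is not among the certificate's subject alternative names. Below is a simplified, JDK-only sketch of that comparison. `SanMatchSketch` and `matchesAny` are hypothetical names, not part of BUbiNG or HttpClient, and real verifiers handle more cases (IP addresses, for instance).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: checks a host name against a certificate's DNS subject alternative names. */
final class SanMatchSketch {

    /** Returns true if host matches a SAN entry exactly, or via a single left-most "*." wildcard. */
    static boolean matchesAny(String host, List<String> subjectAltNames) {
        String h = host.toLowerCase(Locale.ROOT);
        for (String san : subjectAltNames) {
            String s = san.toLowerCase(Locale.ROOT);
            if (s.equals(h)) return true;
            if (s.startsWith("*.")) {
                // "*.example.org" covers exactly one extra label:
                // "a.example.org" matches, "a.b.example.org" does not.
                String suffix = s.substring(1); // ".example.org"
                int firstDot = h.indexOf('.');
                if (firstDot > 0 && h.substring(firstDot).equals(suffix)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sans = Arrays.asList(
                "join-the-biocluster.genopole.fr",
                "jointhebiocluster.genopole.fr",
                "join.genopole.fr");
        System.out.println(matchesAny("www.genopole.fr", sans));  // prints false
        System.out.println(matchesAny("join.genopole.fr", sans)); // prints true
    }
}
```

With the names from the stack trace, `www.genopole.fr` is rejected because none of the three listed entries equals it and none is a wildcard.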

vigna commented 7 years ago

Mmmh. HTTPS continuously has this kind of problem: the library is set to very tight security, while browsers are, as always, forgiving. I guess there's an HTTPClient parameter to disable this check...

guillaumepitel commented 7 years ago

So after digging around, it may actually be related to the Java version and Let's Encrypt certs: https://stackoverflow.com/questions/34110426/does-java-support-lets-encrypt-certificates

Except that the affected sites do not seem to be using these certificates. Anyway, I think it would be a good idea to find a setting that disables SSL verification.

guillaumepitel commented 7 years ago

More tips here:

        protected static final class BasicHttpClientConnectionManagerWithAlternateDNS
                        extends BasicHttpClientConnectionManager {

                static Registry<ConnectionSocketFactory> getDefaultRegistry() {
                        return RegistryBuilder.<ConnectionSocketFactory> create()
                                        .register("http", PlainConnectionSocketFactory.getSocketFactory())
                                        .register("https",
                                                        new SSLConnectionSocketFactory(SSLContexts.createSystemDefault(),
                                                                        new String[] {
                                                                                        "TLSv1.2",
                                                                                        "TLSv1.1",
                                                                                        "TLSv1",
                                                                                        "SSLv3",
                                                                                        "SSLv2Hello",
                                                                        }, null, SSLConnectionSocketFactory.getDefaultHostnameVerifier()))
                                        .build();
                }

                public BasicHttpClientConnectionManagerWithAlternateDNS(final DnsResolver dnsResolver) {
                        super(getDefaultRegistry(), null, null, dnsResolver);
                }
        }

I think the problem is with getDefaultHostnameVerifier: it is probably the strict version: https://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/conn/ssl/

We should probably use the NoopHostnameVerifier or at least the BrowserCompatHostnameVerifier.
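For illustration, a verifier that accepts every host name can be written against the JDK's `javax.net.ssl.HostnameVerifier` interface, which HttpClient's verifiers also implement. This is only a sketch: `HostnameVerifierChoiceSketch`, `ACCEPT_ALL` and `choose` are hypothetical names, and the strict verifier is passed in rather than assumed.

```java
import javax.net.ssl.HostnameVerifier;

/** Hypothetical sketch: pick between a strict verifier and one that accepts any host name. */
final class HostnameVerifierChoiceSketch {

    /** Accepts every host name without looking at the certificate; disables the check entirely. */
    static final HostnameVerifier ACCEPT_ALL = (hostname, session) -> true;

    /** Returns the supplied strict verifier when strict is true, the accept-all one otherwise. */
    static HostnameVerifier choose(boolean strict, HostnameVerifier strictVerifier) {
        return strict ? strictVerifier : ACCEPT_ALL;
    }
}
```

The chosen verifier would take the place of the last constructor argument of `SSLConnectionSocketFactory` in the snippet above.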

guillaumepitel commented 7 years ago

Here is some code that seems to work (in FetchingThread):

/** An SSL context that accepts all certificates */
        private static final SSLContext TRUST_ALL_CERTIFICATES_SSL_CONTEXT;
        static {
                try {
                        TRUST_ALL_CERTIFICATES_SSL_CONTEXT = SSLContexts.custom().loadTrustMaterial(null, new TrustStrategy() {
                                        public boolean isTrusted(X509Certificate[] arg0, String arg1) throws CertificateException {
                                                return true;
                                        }}).build();
                }
                catch (Exception cantHappen) {
                        throw new RuntimeException(cantHappen.getMessage(), cantHappen);
                }
        }

        /** A support class that makes it possible to plug in a custom DNS resolver. */
        protected static final class BasicHttpClientConnectionManagerWithAlternateDNS
                        extends BasicHttpClientConnectionManager {

                static Registry<ConnectionSocketFactory> getDefaultRegistry() {
                        // setup a Trust Strategy that allows all certificates.
                        //
                        SSLContext sslContext = TRUST_ALL_CERTIFICATES_SSL_CONTEXT;
                        return RegistryBuilder.<ConnectionSocketFactory> create()
                                        .register("http", PlainConnectionSocketFactory.getSocketFactory())
                                        .register("https",
                                                        new SSLConnectionSocketFactory(sslContext,
                                                                        new String[] {
                                                                                        "TLSv1.2",
                                                                                        "TLSv1.1",
                                                                                        "TLSv1",
                                                                                        "SSLv3",
                                                                                        "SSLv2Hello",
                                                                        }, null, new NoopHostnameVerifier()))
                                        .build();
                }

                public BasicHttpClientConnectionManagerWithAlternateDNS(final DnsResolver dnsResolver) {
                        super(getDefaultRegistry(), null, null, dnsResolver);
                }
        }
vigna commented 7 years ago

OK. I think we should add a parameter here, since there are possible security risks involved. But I agree, the more compatible we are, the better.
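A minimal sketch of what such a parameter could control, using only JDK classes instead of HttpClient's `SSLContexts` builder. The flag name `acceptAllCertificates` and the class names are assumptions for illustration, not BUbiNG's actual option or code.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/** Hypothetical sketch: choose between normal and accept-all certificate validation. */
final class SslContextChoiceSketch {

    /** Trust manager that accepts every certificate chain. Unsafe wherever authenticity matters. */
    static final class AcceptAllTrustManager implements X509TrustManager {
        @Override public void checkClientTrusted(X509Certificate[] chain, String authType) { }
        @Override public void checkServerTrusted(X509Certificate[] chain, String authType) { }
        @Override public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    }

    /** acceptAllCertificates is an assumed option name, not an existing setting. */
    static SSLContext create(boolean acceptAllCertificates) {
        try {
            if (!acceptAllCertificates) {
                return SSLContext.getDefault(); // JVM trust store, normal validation
            }
            SSLContext context = SSLContext.getInstance("TLS");
            // No key managers, one permissive trust manager, default randomness.
            context.init(null, new TrustManager[] { new AcceptAllTrustManager() }, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Cannot create SSL context", e);
        }
    }
}
```

Keeping the strict path on `SSLContext.getDefault()` means that turning the flag off gives exactly the JVM's standard behaviour.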

guillaumepitel commented 7 years ago

Agreed. However, allowing self-signed certificates is in itself sufficient to totally blow up SSL security, unless you have pinned the certificates beforehand.

guillaumepitel commented 7 years ago

Another problem occurs, but it's probably related to Java bugs:

2017-10-02 15:30:27,602 19221 WARN [ParsingThread-15] i.u.d.l.b.f.ParsingThread - Exception while fetching https://www.teamschramm.com/robots.txt
javax.net.ssl.SSLException: Received fatal alert: internal_error
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
        at sun.security.ssl.Alerts.getSSLException(Alerts.java:154)
        at sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2033)
        at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1135)
        at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
        at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:396)
        at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:355)
        at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
        at org.apache.http.impl.conn.BasicHttpClientConnectionManager.connect(BasicHttpClientConnectionManager.java:323)
        at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:191)
        at it.unimi.di.law.bubing.util.FetchData.fetch(FetchData.java:325)
        at it.unimi.di.law.bubing.frontier.FetchingThread.run(FetchingThread.java:272)
vigna commented 7 years ago

I see. Did you check whether

https://www.superprof.fr/cours/toute-matiere/marseille/

does work when addressed directly (i.e., not through redirects)?

vigna commented 7 years ago

The problem is that inherited methods are logged under the base class. Try this:

public class Test {
    public static class A {
        void a() {
            throw new RuntimeException();
        }
    }

    public static class B extends A {}
    public static void main(String a[]) {
        new B().a();
    }
}
guillaumepitel commented 7 years ago

Hi, yes, I checked, and that's why I removed my comment. The problem is not that the ConnectionManager is ignored, just that this particular SSL connection fails, whether it comes from a redirect or not. It's probably a Java bug; I'll have to dig deeper to fix it.

guillaumepitel commented 7 years ago

So, after a bit more digging, it seems that the problem is that, for some reason, the SSL layer tries to initiate an SSLv2 handshake with some sites, and they reply harshly. Removing SSLv2 from the list of supported protocols in the ConnectionManager constructor alleviates the problem (there are still some sites in error, though). However, I guess that this protocol was added to support some sites?
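As a sketch of the removal described above, the protocol array passed to the socket factory could be filtered before use. `ProtocolFilterSketch` and `without` are hypothetical names; the list is the one from the earlier snippet, where the entry is spelled `SSLv2Hello`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: drops unwanted entries from a list of protocol names. */
final class ProtocolFilterSketch {

    /** Returns the given protocols minus the excluded ones, preserving the original order. */
    static String[] without(String[] protocols, String... excluded) {
        List<String> kept = new ArrayList<>(Arrays.asList(protocols));
        kept.removeAll(Arrays.asList(excluded));
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] original = { "TLSv1.2", "TLSv1.1", "TLSv1", "SSLv3", "SSLv2Hello" };
        System.out.println(Arrays.toString(without(original, "SSLv2Hello")));
        // prints [TLSv1.2, TLSv1.1, TLSv1, SSLv3]
    }
}
```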

As a consequence, the only way, in my opinion, to deal with the problem would be to have several connection managers with different supported protocols, catching SSL exceptions and retrying with another connection manager when they happen.
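The retry idea can be sketched independently of HttpClient. In the following, `Fetcher` stands in for "fetch with a connection manager configured for these protocols"; it and `fetchWithFallback` are hypothetical names, and the failing fetcher in `main` is simulated rather than a real connection.

```java
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLException;

/** Hypothetical sketch: retry a fetch with successive protocol sets after SSL failures. */
final class ProtocolFallbackSketch {

    /** Stand-in for a fetch performed with a given list of enabled protocols. */
    interface Fetcher<T> {
        T fetch(String[] protocols) throws SSLException;
    }

    /** Tries each protocol set in order; rethrows the last SSLException if all of them fail. */
    static <T> T fetchWithFallback(List<String[]> protocolSets, Fetcher<T> fetcher) throws SSLException {
        if (protocolSets.isEmpty()) throw new IllegalArgumentException("no protocol sets given");
        SSLException last = null;
        for (String[] protocols : protocolSets) {
            try {
                return fetcher.fetch(protocols);
            } catch (SSLException e) {
                last = e; // remember the failure and move on to the next set
            }
        }
        throw last;
    }

    public static void main(String[] args) throws SSLException {
        List<String[]> sets = Arrays.asList(
                new String[] { "TLSv1.2", "TLSv1.1", "TLSv1", "SSLv3", "SSLv2Hello" },
                new String[] { "TLSv1.2", "TLSv1.1", "TLSv1" });
        // Simulated server that rejects any handshake offering SSLv2Hello.
        String used = fetchWithFallback(sets, protocols -> {
            if (Arrays.asList(protocols).contains("SSLv2Hello")) {
                throw new SSLException("Received fatal alert: internal_error");
            }
            return String.join(",", protocols);
        });
        System.out.println(used); // prints TLSv1.2,TLSv1.1,TLSv1
    }
}
```

Only `SSLException` triggers a retry here, so other I/O failures would still surface immediately; whether that is the right boundary is a design choice.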

vigna commented 7 years ago

Well, I think I copied that list of protocols from somewhere, to include all protocols. We might make that list an optional parameter, too. But how many sites would be affected by this?

guillaumepitel commented 7 years ago

There may be a hint in this blog post: https://jve.linuxwall.info/blog/index.php?post/TLS_Survey

It's not recent, but it's probably still interesting.

vigna commented 7 years ago

Default is now to accept all certificates; an option brings back the previous behaviour.