appmattus / certificatetransparency

Certificate transparency for Android and JVM
Apache License 2.0
139 stars 29 forks source link

Unable to load log servers with log-list.json #48

Open HappyDr0id opened 1 year ago

HappyDr0id commented 1 year ago

Hey 👋

We did the migration from 0.3.0 to 1.1.1 recently and we observed some non-fatal reports on Crashlytics since then. By digging into the issue, it looks like there is an internal library error happening quite randomly, only from time to time (like once every few minutes), without preventing the usage of our app overall (at least visibly for the user). Here is the stack trace:

2022-10-04 17:33:08.402 30118-30253/<package_name> I/CertificateTransparency: <domain_url> Failure: Unable to load log servers with log-list.json failed to load with java.lang.InterruptedException
        at kotlinx.coroutines.BlockingCoroutine.joinBlocking(Builders.kt:84)
        at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking(Builders.kt:59)
        at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source:1)
        at kotlinx.coroutines.BuildersKt__BuildersKt.runBlocking$default(Builders.kt:38)
        at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source:1)
        at com.appmattus.certificatetransparency.internal.verifier.CertificateTransparencyBase.hasValidSignedCertificateTimestamp(CertificateTransparencyBase.kt:112)
        at com.appmattus.certificatetransparency.internal.verifier.CertificateTransparencyBase.verifyCertificateTransparency(CertificateTransparencyBase.kt:96)
        at com.appmattus.certificatetransparency.internal.verifier.CertificateTransparencyInterceptor.intercept(CertificateTransparencyInterceptor.kt:69)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:34)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:95)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:83)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:76)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:109)
        <...>
2022-10-04 17:33:08.402 30118-30253/<package_name> I/CertificateTransparency:     at java.lang.Thread.run(Thread.java:1012)
2022-10-04 17:33:08.404 30118-30253/<package_name> I/okhttp.OkHttpClient: <-- HTTP FAILED: javax.net.ssl.SSLPeerUnverifiedException: Certificate transparency failed

We didn't have this report at all with the previous 0.3.0 version, and this failure is happening really randomly, among successful "SCT trusted logs". Our domains have needed certificates and Certificate Transparency is working as expected, so, especially seeing the json file reading failure, it looks to be more internal to the library than due to a certificate failure itself.

Does it ring a bell to you? Thanks in advance for your help, and overall for your work on this CT library 🙏

mattmook commented 1 year ago

Nothing in particular comes to mind - InterruptedException kind of implies the the coroutine block has been cancelled in some way - could be timeout? I might need to add some more tests around timeouts etc to see if I can replicate more reliably. The fix may be as simple as the code detecting this exception and suppressing (if appropriate)

mattmook commented 1 year ago

One thing to look at is what you do in terms of caching the log list data (to disk for example) that could reduce the number of network hits

mattmook commented 1 year ago

On reflection I think this is likely solved by #39 and given no other reports of a similar issue I'm going to close this one.

HappyDr0id commented 1 year ago

Thanks for your answers and your work on this library 🙏 I'll see with the latest release if this is something still being reported in production environment or not 🙂

ikeed commented 1 year ago

The fix may be as simple as the code detecting this exception and suppressing (if appropriate)

Unfortunately in the case of the InterruptedException, the lib is throwing an SSLPeerUnverifiedException which is indistinguishable from a validation failure. In our app, I implemented a kind of kludgy workaround to look at the VerificationResult from the last logger callback to determine whether we should suppress the error. It would be great if there were an easier way to distinguish these cases.

I can still reproduce this issue on 2.1.2 by spamming a bunch of requests at once.

mattmook commented 1 year ago

Have reopened for further investigation. @ikeed, this definitely sounds testable/reproducible by what you are saying then.

ikeed commented 1 year ago

Thanks very much Matt! If you need any more info from me, just ask.

ikeed commented 1 year ago

Incidentally, if the InterruptedException is too complex to fix, just embedding some kind of a "reason" field into the SSLPeerUnverifiedException would suit my needs. I'm not sure if that's the direction you want to go. We get a VerificationResult in the log callback and it would be useful to have something similar in the SSLPeerUnverifiedException itself so clients can make smarter decisions when an error is thrown. Just to help distinguish between concurrency issues and an actual dodgy cert.

dsatija commented 1 month ago

Hi @mattmook , we are facing the same issue. Getting this error : Unable to load log servers with log-list.zip failed to load with kotlinx.coroutines.JobCancellationException Here's the error : E/CertificateTransparencyManager: Certificate transparency failed for host : y.x.com with result: Failure: Unable to load log servers with log-list.zip failed to load with kotlinx.coroutines.JobCancellationException: Parent job is Cancelling; job=BlockingCoroutine{Cancelled}@5bb23e7 Caused by: java.lang.InterruptedException at kotlinx.coroutines.BuildersKt.runBlocking(Unknown Source:159) at kotlinx.coroutines.BuildersKt.runBlocking$default(Unknown Source:3) at com.appmattus.certificatetransparency.internal.verifier.CertificateTransparencyBase.hasValidSignedCertificateTimestamp(CertificateTransparencyBase.kt:7) at com.appmattus.certificatetransparency.internal.verifier.CertificateTransparencyBase.verifyCertificateTransparency(CertificateTransparencyBase.kt:77) at

Could you suggest what can be done here ?