Azure / azure-storage-android

Microsoft Azure Storage Library for Android
Apache License 2.0
81 stars 47 forks source link

Threads Locked in Socket IO Library Version 0.5.1 #14

Closed dhakimTRX closed 9 years ago

dhakimTRX commented 9 years ago

I've noticed an issue in shutting down my app where threads running various queries through this library get stuck for upwards of 10 minutes. This appears consistent with some path through the library in which socket timeouts are not properly set.

The issue can be duplicated by running a query on a phone with no SIM card, with very poor, but still connected, WiFi. My test involves achieving this poor connectivity by either walking to the end of the hallway, or hanging out in the elevator 10 floors above my office, starting a ton of queries, then shutting down my application. This behavior cannot be duplicated by either disabling WiFi on the device or completely walking out of WiFi range.

The threads in question will not exit until A: user toggles WiFi on the device, B: user walks back into WiFi range, or C: some extremely long default timeout triggers (>10 minutes)

The locked up query threads in question are running CloudTable.execute on various simple queries against my table storage account. TableRequestOptions are all default, except for setting a maximum execution time in milliseconds of 10000. On all CloudTable instances, I have run cloudTable.getServiceClient().getDefaultRequestOptions().setRetryPolicyFactory(new RetryNoRetry()).

It is rather difficult to pull full stack traces, as the bug has generally resolved itself by the time I can return to my desk, as I am thus back within WiFi range. I have managed to pull one such stack trace however:

at com.android.org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method) at com.android.org.conscrypt.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:320) at com.android.okhttp.Connection.upgradeToTls(Connection.java:1285)
at com.android.okhttp.Connection.connect(Connection.java:1197)
at com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:392)
at com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:295)
at com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:373) at com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:323) at com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:491) at com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105)
at com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25)
at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:123)
at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109)
The above stack trace may not be exactly with the 0.5.1 release, however I can state that the issue in my app remains. I have built some tools to show the states of each thread in this frozen state. Locked up threads are generally in one of:

A. libcore.io.Posix.connect(Native Method), B. com.android.org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method), or C. com.android.org.conscrypt.NativeCrypto.SSL_read(Native Method)

All threads in question have been interrupted, and their interrupted flag is set at the time of determining their stack traces.

I am currently hoping that I have missed some simple setting in the client application, (There's no handy dandy kill all open connections method is there?) or that the library has missed some socket timeout, as I truly do not want to track this to issues in Posix or NativeCrypto.

Please let me know if there is any more information you would need to track this down, and thank you for providing the library that makes my application possible,

-Dan

emgerner-msft commented 9 years ago

Thanks for the very detailed description - it's helpful! Specifying the max execution time is the correct thing to be doing.

As of version 5.1 we made a bunch of fixes to how we handle socket timeouts. So, for pre-5.1 I'd blame these bugs. That being said, you mentioned that you're still seeing issues at this point using 5.1 so I'm a little less sure what could be wrong.

I know you said it's hard to get logs given how you reproduce the issue, but would it be possible to get a stack trace that's definitely from the 5.1 version? I'll try to reproduce on my end as well but I have unfortunately good cell signal. :) This step will eliminate issues caused by the previous bug so I can track down whatever is happening now.

Could you also tell me which version of Android you see this issue on and/or if there a versions you don't see the issue on? This will help me track down if there are any changes or bugs in the Android framework which could affect this issue.

Thanks, Emily

dhakimTRX commented 9 years ago

Android phone is a GT-I9295 (Samsung Galaxy S4 Active) running Android 5.0.1.
Azure-Storage-Android library is 0.5.1

It took quite a while to build proper tools to debug this. I am now timing each query, catching and printing the stack traces of each failed query, and pulling the stack trace of all threads every X seconds. As before, my test is to walk to the end of the hallway, drop down to a single bar of WiFi, trigger a large number of queries, and then time the queries.

Please excuse the wall of text below.

The first tool times each query, and prints the retry policy and max time allowed to ensure I'm passing correct data. Here are my timings from one run.
+++++++++++++++START ENTRY+++++++++++++ Thread ID: 3062 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@1a6bf6a6 Total Time Elapsed : 10684 +++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 3071 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@1a6bf6a6 Total Time Elapsed : 13203 +++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 3073 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@1a6bf6a6 Total Time Elapsed : 15985 +++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 3071 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@1a6bf6a6 Total Time Elapsed : 30040 +++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 3075 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@1a6bf6a6 Total Time Elapsed : 30028 +++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 3062 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@1a6bf6a6 Total Time Elapsed : 40163 +++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 3072 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@1a6bf6a6 Total Time Elapsed : 27219 +++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 3079 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@1a6bf6a6 Total Time Elapsed : 23808 +++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 3073 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@1a6bf6a6 Total Time Elapsed : 28826 +++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 3063 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@1a6bf6a6 Total Time Elapsed : 63181 +++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 3072 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@1a6bf6a6 Total Time Elapsed : 14353 +++++++++++++++ END ENTRY+++++++++++++

As of 0.5.1, and a number of fixes in my side, I am no longer able to duplicate the 10+ minute timings. I am however able to regularly see times of 20 to 75 seconds. I have noticed some patterns in the timings, with most threads dropping out right around the timeout (due to proper timeout exceptions), but large numbers also dropping out around 30 seconds and around 60 seconds. That makes me suspect at least two failure cases at different points in the connection.

I built a tool to regularly log stack traces of each thread.

Here are traces from another run in which thread 3353 took 73 seconds to terminate.

Overview: ++++++++++++++++++++++++++++START ENTRY+++++++++++++ Thread ID: 3353 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@3f6680a3 Total Time Elapsed : 73175 +++++++++++++++ END ENTRY+++++++++++++

At 30 seconds: Maps, (Maps) pool-3-thread-2, 3353, Querying ResultsRefreshBuildings, Tile: 0, RUNNABLE, true, true, libcore.io.Posix.connect(Native Method) libcore.io.BlockGuardOs.connect(BlockGuardOs.java:111) libcore.io.IoBridge.connectErrno(IoBridge.java:137) libcore.io.IoBridge.connect(IoBridge.java:122) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:183) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:456) java.net.Socket.connect(Socket.java:882) com.android.okhttp.internal.Platform.connectSocket(Platform.java:139) com.android.okhttp.Connection.connect(Connection.java:1194) com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:392) com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:295) com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:373) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:323) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:491) com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105) com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25) com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:125) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:161) com.trx.azure.AzureSession.QueryNew(AzureSession.java:314) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38) java.lang.Thread.run(Thread.java:818)

Again at 65 seconds: Maps, (Maps) pool-3-thread-2, 3353, Querying ResultsRefreshBuildings, Tile: 0, RUNNABLE, true, true, libcore.io.Posix.connect(Native Method) libcore.io.BlockGuardOs.connect(BlockGuardOs.java:111) libcore.io.IoBridge.connectErrno(IoBridge.java:137) libcore.io.IoBridge.connect(IoBridge.java:122) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:183) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:456) java.net.Socket.connect(Socket.java:882) com.android.okhttp.internal.Platform.connectSocket(Platform.java:139) com.android.okhttp.Connection.connect(Connection.java:1194) com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:392) com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:295) com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:373) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:323) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:491) com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105) com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25) com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:125) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:161) com.trx.azure.AzureSession.QueryNew(AzureSession.java:314) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38) java.lang.Thread.run(Thread.java:818)

If the thread above is ever leaving the Posix.connect call, it is not staying out long enough for my refresh rate to detect it. There may be retries there, or there may be a single long timeout. The fact that each thread takes a different amount of time to timeout argues against a single long default timeout.

Finally, I did a run where I simply waited for everything to recover, figuring that any timeouts being thrown would show the full stacktraces of where the threads were locked up. I have saved the best for last.

This thread took 23 seconds to exit, but threw no exceptions. +++++++++++++++START ENTRY+++++++++++++ Thread ID: 4501 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@21689597 Total Time Elapsed : 23312 Terminated Without Exception +++++++++++++++ END ENTRY++++++++++++++

This thread took 20 seconds to exit, and threw in SSL_do_handshake.
++++++++++++++START ENTRY+++++++++++++Thread ID: 4502 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@21689597 Total Time Elapsed : 20403 Terminated With Exception: An error occurred while enumerating the result, check the original exception for details.com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:161) com.trx.azure.AzureSession.QueryNew(AzureSession.java:314) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: The operation did not complete in the specified time.com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:201) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:161) com.trx.azure.AzureSession.QueryNew(AzureSession.java:314) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: SSL handshake timed outcom.android.org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method) com.android.org.conscrypt.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:320) com.android.okhttp.Connection.upgradeToTls(Connection.java:1285) com.android.okhttp.Connection.connect(Connection.java:1197) com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:392) com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:295) com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:373) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:323) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:491) com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105) com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25) com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:125) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:161) com.trx.azure.AzureSession.QueryNew(AzureSession.java:314) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38) java.lang.Thread.run(Thread.java:818) -----------------------------------------+++++++++++++++ END ENTRY++++++++++++++

This thread took 73 and threw in Posix.connect +++++++++++++START ENTRY+++++++++++++Thread ID: 4513 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@21689597 Total Time Elapsed : 73198 Terminated With Exception: An error occurred while enumerating the result, check the original exception for details.com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:161) com.trx.azure.AzureSession.QueryNew(AzureSession.java:314) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: An unknown failure occurred : failed to connect to trxdev.table.core.windows.net/191.237.32.216 (port 443): connect failed: ETIMEDOUT (Connection timed out)com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:218) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:161) com.trx.azure.AzureSession.QueryNew(AzureSession.java:314) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: failed to connect to trxdev.table.core.windows.net/191.237.32.216 (port 443): connect failed: ETIMEDOUT (Connection timed out)libcore.io.IoBridge.connect(IoBridge.java:124) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:183) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:456) java.net.Socket.connect(Socket.java:882) com.android.okhttp.internal.Platform.connectSocket(Platform.java:139) com.android.okhttp.Connection.connect(Connection.java:1194) com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:392) com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:295) com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:373) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:323) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:491) com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105) com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25) com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:125) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:161) com.trx.azure.AzureSession.QueryNew(AzureSession.java:314) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: connect failed: ETIMEDOUT (Connection timed out)libcore.io.Posix.connect(Native Method) libcore.io.BlockGuardOs.connect(BlockGuardOs.java:111) libcore.io.IoBridge.connectErrno(IoBridge.java:137) libcore.io.IoBridge.connect(IoBridge.java:122) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:183) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:456) java.net.Socket.connect(Socket.java:882) com.android.okhttp.internal.Platform.connectSocket(Platform.java:139) com.android.okhttp.Connection.connect(Connection.java:1194) com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:392) com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:295) com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:373) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:323) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:491) com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105) com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25) com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:125) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:161) com.trx.azure.AzureSession.QueryNew(AzureSession.java:314) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

+++++++++++++++ END ENTRY+++++++++++++

dhakimTRX commented 9 years ago

Just duplicated the results on a SM-G900H (Galaxy S5) running Android 5.0

+++++++++++++START ENTRY+++++++++++++Thread ID: 22267 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : com.microsoft.azure.storage.RetryNoRetry@3ea737b4 Total Time Elapsed : 54297 Terminated With Exception: An error occurred while enumerating the result, check the original exception for details.com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:161) com.trx.azure.AzureSession.QueryNew(AzureSession.java:314) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: The operation did not complete in the specified time.com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:201) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:161) com.trx.azure.AzureSession.QueryNew(AzureSession.java:314) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: SSL handshake timed outcom.android.org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method) com.android.org.conscrypt.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:320) com.android.okhttp.Connection.upgradeToTls(Connection.java:1252) com.android.okhttp.Connection.connect(Connection.java:1183) com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:392) com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:295) com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:373) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:323) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:491) com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105) com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25) com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:125) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:161) com.trx.azure.AzureSession.QueryNew(AzureSession.java:314) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38) java.lang.Thread.run(Thread.java:818) -----------------------------------------+++++++++++++++ END ENTRY++++++++++++++

emgerner-msft commented 9 years ago

Looked at a bunch of these stack traces and results. From some reading, the ETIMEDOUT errors I think might be something we can mitigate in the library, and that looks like it would solve a lot of these. The libcore.io.Posix.connect(Native Method) issue seems less common and I haven't been able to reliably reproduce. Still working on that.

For now I'm going to push the source for a potential mitigation to my personal fork in the master branch that you can grab if you want to try it out. In theory this might help with the ETIMEDOUT issues. In practice I don't have the level of tooling set up that you do yet, but from some basic testing in my parking garage it looks like it possibly helps. Note that this is strictly just for testing! If you have a chance to try this out on your side and report any changes back, that would be great.

dhakimTRX commented 9 years ago

Running your personal fork, I'm seeing a lot more threads exit at exactly 10 seconds. Here is a run from this morning. The 29 second and 16 second threads seem to both have gotten stuck in com.android.org.conscrypt.NativeCrypto.SSL_do_handshake. Maybe the handshake has enough steps that individually took a long time (but less than the read timeout) but still succeeded?

+++++++++++++++START ENTRY+++++++++++++ Thread ID: 1438 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : null Total Time Elapsed : 10023 Terminated With Exception: An error occurred while enumerating the result, check the original exception for details. com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: The operation did not complete in the specified time. com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:201) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: failed to connect to trxdev.table.core.windows.net/191.237.32.216 (port 443) after 9996ms libcore.io.IoBridge.connectErrno(IoBridge.java:169) libcore.io.IoBridge.connect(IoBridge.java:122) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:183) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:456) java.net.Socket.connect(Socket.java:882) com.android.okhttp.internal.Platform.connectSocket(Platform.java:139) com.android.okhttp.Connection.connect(Connection.java:1194) com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:392) com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:295) com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:373) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:323) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:491) com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105) com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25) com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:125) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

+++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 1436 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : null Total Time Elapsed : 10013 Terminated With Exception: An error occurred while enumerating the result, check the original exception for details. com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: The operation did not complete in the specified time. com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:201) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: failed to connect to trxdev.table.core.windows.net/191.237.32.216 (port 443) after 9986ms libcore.io.IoBridge.connectErrno(IoBridge.java:169) libcore.io.IoBridge.connect(IoBridge.java:122) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:183) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:456) java.net.Socket.connect(Socket.java:882) com.android.okhttp.internal.Platform.connectSocket(Platform.java:139) com.android.okhttp.Connection.connect(Connection.java:1194) com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:392) com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:295) com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:373) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:323) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:491) com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105) com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25) com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:125) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

+++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 1439 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : null Total Time Elapsed : 10016 Terminated With Exception: An error occurred while enumerating the result, check the original exception for details. com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: The operation did not complete in the specified time. com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:201) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: failed to connect to trxdev.table.core.windows.net/191.237.32.216 (port 443) after 9997ms libcore.io.IoBridge.connectErrno(IoBridge.java:169) libcore.io.IoBridge.connect(IoBridge.java:122) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:183) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:456) java.net.Socket.connect(Socket.java:882) com.android.okhttp.internal.Platform.connectSocket(Platform.java:139) com.android.okhttp.Connection.connect(Connection.java:1194) com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:392) com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:295) com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:373) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:323) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:491) com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105) com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25) com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:125) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

+++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 1433 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : null Total Time Elapsed : 10032 Terminated With Exception: An error occurred while enumerating the result, check the original exception for details. com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: The operation did not complete in the specified time. com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:201) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: failed to connect to trxdev.table.core.windows.net/191.237.32.216 (port 443) after 9998ms libcore.io.IoBridge.connectErrno(IoBridge.java:169) libcore.io.IoBridge.connect(IoBridge.java:122) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:183) java.net.PlainSocketImpl.connect(PlainSocketImpl.java:456) java.net.Socket.connect(Socket.java:882) com.android.okhttp.internal.Platform.connectSocket(Platform.java:139) com.android.okhttp.Connection.connect(Connection.java:1194) com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:392) com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:295) com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:373) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:323) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:491) com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105) com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25) com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:125) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.FeatureManager.RefreshFeaturesInAreaSync(FeatureManager.java:79) com.trx.azure.sync.manager.FeatureManager.access$0(FeatureManager.java:62) com.trx.azure.sync.manager.FeatureManager$1.run(FeatureManager.java:57) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

+++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 1431 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : null Total Time Elapsed : 29214 Terminated With Exception: An error occurred while enumerating the result, check the original exception for details. com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: The operation did not complete in the specified time. com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:201) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: SSL handshake timed out com.android.org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method) com.android.org.conscrypt.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:320) com.android.okhttp.Connection.upgradeToTls(Connection.java:1285) com.android.okhttp.Connection.connect(Connection.java:1197) com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:392) com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:295) com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:373) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:323) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:491) com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105) com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25) com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:125) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

+++++++++++++++ END ENTRY+++++++++++++ +++++++++++++++START ENTRY+++++++++++++ Thread ID: 1436 Max Time Allowed: 10000 Retry Policy (Default): com.microsoft.azure.storage.RetryNoRetry Retry Policy (Passed) : null Total Time Elapsed : 16314 Terminated With Exception: An error occurred while enumerating the result, check the original exception for details. com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: The operation did not complete in the specified time. com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:201) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

Caused By: SSL handshake timed out com.android.org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method) com.android.org.conscrypt.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:320) com.android.okhttp.Connection.upgradeToTls(Connection.java:1285) com.android.okhttp.Connection.connect(Connection.java:1197) com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:392) com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:295) com.android.okhttp.internal.http.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:373) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:323) com.android.okhttp.internal.http.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:491) com.android.okhttp.internal.http.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105) com.android.okhttp.internal.http.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25) com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:125) com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) com.trx.azure.AzureSession.enumerateIterable(AzureSession.java:158) com.trx.azure.AzureSession.QueryNew(AzureSession.java:311) com.trx.azure.sync.pull.PullBase.pull(PullBase.java:39) com.trx.azure.sync.manager.MapManager.RefreshBuildingsInAreaSync(MapManager.java:128) com.trx.azure.sync.manager.MapManager.access$0(MapManager.java:112) com.trx.azure.sync.manager.MapManager$1.run(MapManager.java:62) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1112) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:587) com.trx.common.concurrent.NamedThreadFactory$1.run(NamedThreadFactory.java:38)

java.lang.Thread.run(Thread.java:818)

+++++++++++++++ END ENTRY+++++++++++++

emgerner-msft commented 9 years ago

Alright, between the library mitigation and fixes on your end in your app, we're down from ~10 min to ~30 sec. Not too bad of progress in about a day or two. :)

The remaining threads that are taking a bit long to timeout are probably failing for a couple reasons, though these are somewhat hard to debug. Your theory above seems reasonable for the SSL handshake. Another possibility is that we're losing connectivity later but the thread is still hanging as it continues to wait for data. Unfortunately for both theories the issue and fix are not easy but more fundamental difficulties internal to Android.

We've got a couple ideas on how to help with these issues client side, but those fixes are going to take a while to implement and test. To give a brief overview of an approach we took in another library, one option is to basically set a timer on another thread and abort the connection when that timer hits the max execution time. We are currently investigating whether this approach is feasible in Android and how it might be applied.

In summary, the action plan for now will be that we will release the patch to mitigate this issue in our upcoming release. If you need to override for now, you can use the OperationContext route with the 0.5.1 library. A more general fix to cut off long running network ops will be in the works (and we'll update the thread at that point), but we can't give a timeline for that at the moment.

Best, Emily

dhakimTRX commented 9 years ago

That sounds great, thanks for looking into this with me. I've used that kind of socket watchdog thread before, its a great solution, albeit a little heavyweight on the number of threads required for every connection.

The fixes you've already put in should do a lot to mitigate the time the app goes without an azure connection when walking between networks due to bottlenecking my thread pools. If I'm allowed to make a feature request, I would ask for some kind of reset or shutdown function that quickly kills active connections and returns the library to its initial state, that would allow for a simpler implementation on my end when exiting my user interface.

In the meantime, I don't think I'll have a problem working around the times I'm seeing now, a 10 second splash screen is much better than a 10 minute one.

Thanks a lot for your time, Dan

emgerner-msft commented 9 years ago

You're welcome to make feature requests. I'll log what you've written above internally. If you want to open a new issue for that with the Feature tag we'd also be happy to track this externally and notify you of any progress. This would also give others who have that request the ability to comment with any additional requirements they might have.