Closed jasaw closed 4 months ago
needs more comments
we already have this
private final Map<String, BluetoothGatt> mConnectedDevices = new ConcurrentHashMap<>();
private final Map<String, BluetoothGatt> mCurrentlyConnectingDevices = new ConcurrentHashMap<>();
I don't think we need another map. We can just check !mCurrentlyConnectingDevices && !mAutoConnect
@chipweinberger Thank you for taking time to review this. Your comments are very helpful.
why not just have the 2 second delay?
Just for clarification (for other developers as well), a 2 second delay won't solve this race condition. This is the sequence of events of the race condition with 2s delay or any delay:
connectGatt
disconnect, then close. While calling disconnect, the Android Bluetooth stack is in the middle of establishing the Bluetooth connection. FBP thinks that the connect call has been cancelled, so it notifies the app that it's disconnected.
I don't think we need another map. We can just check !mCurrentlyConnectingDevices && !mAutoConnect
Thank you for your input. That's very helpful. I've updated the code as suggested. This should also address the autoconnect scenario right?
your update looks better.
i'm still trying to think if this is the best solution
in the dart code, there are already mutexes to help connect and disconnect interact properly
i assume this issue only happens with device.disconnect(queue: false), right?
also trying to think if this issue would happen on other platforms, and would then make more sense in Dart
and lastly thinking about if there are still race conditions if connect and disconnect are called a lot
i.e. does your fix introduce other race conditions
also, this is going to result in device.connectionState to be called many times (1. disconnect, 2. connect, 3. disconnect)
please think more about these issues
i assume this issue only happens with device.disconnect(queue: false), right?
Yes, you are absolutely right.
also trying to think if this issue would happen on other platforms
That is a good question. I have been focusing on Android so far because it is way more flaky than iOS. It would be good to do some extensive testing on iOS platform as well.
also, this is going to result in device.connectionState to be called many times (1. disconnect, 2. connect, 3. disconnect)
Yes, I have thought about this issue and thought it's safer to notify the app about the connection state rather than hiding it. If the app is listening to the connection state change event, the app can call disconnect if it wants to (it's safe to call disconnect multiple times).
if connect and disconnect are called a lot
This is just a brain dump of what I think will happen when connect and disconnect are called continuously quickly.
Scenario 1:
Scenario 2:
remoteId back onto connecting map. Immediately after that, Android successfully established the previous connection. All good.
Scenario 3:
remoteId back onto connecting map.
Scenario 4:
remoteId back onto connecting map.
Scenario 5:
remoteId onto connecting map and calls gatt.connect. Android either schedules the connect or ignores the connect. If Android ignores the connect, FBP connect timeout should catch it. If Android schedules the connect, then we go to step 5.
Those are the scenarios I can think of.
appreciate your thoughts
on your analysis, i think there is an assumption that the "handle method call" thread and the gatt connection callback thread are mutually exclusive.
i'm not sure if that is true
we might want to add a mutex so that connect/disconnect is not called while the gatt connection callback is still processing
we can call it the "mConnectionCallbackMutex" or something
also I think we should hide the extra connection/disconnection events
connection event: only invoked if mCurrentlyConnectingDevices or mAutoConnect
disconnect event: only invoked if mConnectedDevices
ignore / log other connection & disconnection events
i think this is what we would want
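The filtering rule proposed above could be sketched in plain Java roughly as follows. This is only an illustration, not FBP's actual code: the class name `ConnectionEventFilter`, both method names, and the modelling of `mAutoConnect` as a set of remote IDs are assumptions, and `Object` stands in for `BluetoothGatt` so the snippet compiles without the Android SDK.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper illustrating the proposed rule; not FBP's real code.
class ConnectionEventFilter {
    // Object stands in for BluetoothGatt so this compiles off-device.
    final Map<String, Object> mConnectedDevices = new ConcurrentHashMap<>();
    final Map<String, Object> mCurrentlyConnectingDevices = new ConcurrentHashMap<>();
    // Assumption: auto-connect devices are tracked by remote ID.
    final Set<String> mAutoConnect = ConcurrentHashMap.newKeySet();

    // A "connected" callback is forwarded only if a connect was requested.
    boolean shouldForwardConnected(String remoteId) {
        return mCurrentlyConnectingDevices.containsKey(remoteId)
                || mAutoConnect.contains(remoteId);
    }

    // A "disconnected" callback is forwarded only for a device
    // that was previously reported as connected.
    boolean shouldForwardDisconnected(String remoteId) {
        return mConnectedDevices.containsKey(remoteId);
    }
}
```

Any event for which these checks return false would be logged and ignored rather than reported to the app.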
@chipweinberger
assumption that the "handle method call" thread and the gatt connection callback thread are mutually exclusive
I can't find any information on the handle-method-call thread and the gatt connection callback thread in the Android documentation. Maybe I didn't look hard enough. I agree it's safer to add a mutex to protect the critical sections. I've added a mutex.
also I think we should hide the extra connection/disconnection events
Done.
thanks
Is there a mutex that can't fail? Acquiring a mutex should not fail.
if we are interrupted we should just try again.
perhaps define a new mutex object that does not throw.
i really hate how complicated handling this issue is. such ugly complicated code (not your fault).
also, i prefer not to use try/finally
just release the mutex before every "return"
thanks btw
also i'd make the comment this
"Android has an annoying edge case. If disconnect is called right when the connection is being established, android sometimes ignores the request to disconnect and completes the connection anyway. To handle this case, we make sure the device is still in our currently connecting devices, otherwise kill the connection, since the user was not expecting it to connect"
something like this
also, i prefer not to use try/finally just release the mutex before every "return"
The code does look ugly with the try blocks around but I'm not convinced that it's easier to read or more robust removing the try/finally. I have a few concerns about removing try/finally:
connectGatt and getRemoteDevice. If they can throw exceptions, then FBP will end up in a deadlock because the mutex is not released. What is the current behaviour if those API calls throw an exception?
Is there a mutex that can't fail? Acquiring a mutex should not fail. if we are interrupted we should just try again.
I have to admit that I do not know all the conditions that could interrupt the acquiring of mutex. Would suspending the app interrupt it? Would putting the app into the background interrupt it? If we are going to try again, how many times do we try before we give up? If we keep trying, never give up, what is the likelihood of FBP locking up?
Would the code be easier to read if we combine the try/finally and try/catch into try/catch/finally and add a boolean to track whether we have acquired the mutex so we know to release it in the finally?
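The combined try/catch/finally with a tracking boolean could look something like the sketch below. It assumes the mutex is a `java.util.concurrent.Semaphore` with one permit; the class `GuardedSection` and its `run` method are hypothetical names used only for illustration.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: a single try/catch/finally with an "acquired" flag.
class GuardedSection {
    // Assumption: the mutex is a one-permit Semaphore.
    private final Semaphore mutex = new Semaphore(1);

    // Returns false if the thread was interrupted before the mutex was acquired.
    boolean run(Runnable body) {
        boolean acquired = false;
        try {
            mutex.acquire();
            acquired = true;
            body.run();
            return true;
        } catch (InterruptedException e) {
            // Preserve the interrupt for callers further up the stack.
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Release only if we actually hold the permit, even when body throws.
            if (acquired) {
                mutex.release();
            }
        }
    }

    int availablePermits() {
        return mutex.availablePermits();
    }
}
```

With this shape, an exception thrown inside the body still releases the mutex, but the caller has to decide what to do when `run` returns false.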
I'm not sure what the solution is, but right now the diff is too ugly / hard to follow to be merged.
I am not sure whether any of the API calls can throw exceptions or not, l
good point.
If we are going to try again, how many times do we try before we give up? I
never give up. just log and retry. If we can't ever acquire the mutex that is a very serious problem and the program should not continue.
never give up. just log and retry. If we can't ever acquire the mutex that is a very serious problem and the program should not continue.
Yes, I can see the logic here. Happy to retry forever, so we remove 1 try/catch indentation.
Regarding the try/finally, I think it's safer to leave it in there. Can we hide the mutex acquire/release try/catch/finally in a wrapper function? What's the best Java implementation for a wrapper function?
never give up. just log and retry. If we can't ever acquire the mutex
Done. It keeps retrying until mutex is acquired.
I've also hidden the try/catch/finally and acquire/release mutex in a runCriticalSection wrapper that calls the closure. The code looks more readable to me. Thoughts?
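For readers following along, a wrapper of this kind might be shaped roughly like the sketch below. Only the name `runCriticalSection` comes from this thread; the signature, the class `RetryingCriticalSection`, the one-permit `Semaphore`, and the use of `System.err` in place of Android logging are all assumptions. The acquire loop logs and retries on interruption, as discussed above.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch of a critical-section wrapper; not the PR's actual code.
class RetryingCriticalSection {
    // Assumption: the mutex is a one-permit Semaphore.
    private final Semaphore mutex = new Semaphore(1);

    // Never gives up: logs and retries until the mutex is acquired.
    private void acquireRetrying() {
        boolean interrupted = false;
        while (true) {
            try {
                mutex.acquire();
                break;
            } catch (InterruptedException e) {
                interrupted = true;
                System.err.println("mutex acquire interrupted, retrying");
            }
        }
        if (interrupted) {
            // Restore the interrupt status once we hold the mutex.
            Thread.currentThread().interrupt();
        }
    }

    // Runs body while holding the mutex; releases it even if body throws.
    <T> T runCriticalSection(Supplier<T> body) {
        acquireRetrying();
        try {
            return body.get();
        } finally {
            mutex.release();
        }
    }

    int availablePermits() {
        return mutex.availablePermits();
    }
}
```

`Semaphore.acquireUninterruptibly()` is a standard alternative to the hand-written retry loop, although it does not log each interruption.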
ya its an improvement. I still hate all of it though (not meant to be an insult)
it seems so overly complicated.
having these critical sections is not ideal.
are they necessary? I proposed them, but they're a hammer that I wasn't sure was needed.
I do know we have a lot of state that both the callback & method call handler are changing at the same time.
and now we are also disconnecting in the callback handler, adding complexity.
But I don't like this. overall this code just seems bad.
Maybe we need a better way to update the state in a more atomic fashion. I'm not sure.
I know I dont want to break a bunch of stuff and cause more work for myself.
I have a lot of things to focus on right now.
So unless we can come up with a solution that is cleaner, I think I may not merge this.
as you know, adding complexity to code has a permanent maintenance cost.
a cost that I am the one paying, as the maintainer.
Yes the code does look more complex now. The problem here is we are trying to fix 2 problems with this PR.
I'm going to peel this PR back to only address issue 1. Issue 2 can be addressed by a separate PR if there is evidence that the callback and method call are concurrent.
I agree. But I don't want to change #1 without addressing #2.
they can be separate PRs. but #2 is more important.
@chipweinberger Would the diff look better if I open a new PR just for the addition of critical section wrapper? Would you accept that PR?
If yes, then the diff for this PR will look cleaner.
Just FYI, I found the answer to this question.
I am not sure whether any of the API calls can throw exceptions or not
I found this example code in the Android documentation.
try {
    final BluetoothDevice device = bluetoothAdapter.getRemoteDevice(address);
    // connect to the GATT server on the device
    bluetoothGatt = device.connectGatt(this, false, bluetoothGattCallback);
    return true;
} catch (IllegalArgumentException exception) {
    Log.w(TAG, "Device not found with provided address. Unable to connect.");
    return false;
}
if there is evidence that the callback and method call are concurrent.
I ran a quick test and can confirm that the gatt callback and method-call are concurrent.
please rebase on master :)
Done. Rebased.
thanks it's looking good
i think we should put this new code in its own function
bool handleAccidentalConnectionEvents(event)
returns true if it was an accidental connection / disconnection
I've moved the code to a handleUnexpectedConnectionEvents function. Let me know if you prefer handleAccidentalConnectionEvents and I'll change it.
good name!
if unexpected event, we can just return early
thanks for you being receptive to me micromanaging your code change :) haha
i'll merge it soon and make any other small formatting changes myself
i will not test it myself, so please be confident it solves your issue :)
thanks!!
if unexpected event, we can just return early
Done.
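The early-return flow discussed here could be sketched as below. This is a simplified illustration rather than the merged code: `GattLike`, `ConnectionCallbackSketch`, the boolean `connected` parameter, and the `notifications` list (standing in for events sent to the app) are hypothetical, and auto-connect and failed connection attempts are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the parts of BluetoothGatt used here.
interface GattLike {
    void disconnect();
    void close();
}

// Simplified sketch; auto-connect and failed connects are not handled.
class ConnectionCallbackSketch {
    final Map<String, GattLike> mConnectedDevices = new ConcurrentHashMap<>();
    final Map<String, GattLike> mCurrentlyConnectingDevices = new ConcurrentHashMap<>();
    // Stands in for connection state events reported to the app.
    final List<String> notifications = new ArrayList<>();

    // Returns true if the event was unexpected and has been dealt with here.
    boolean handleUnexpectedConnectionEvents(String remoteId, GattLike gatt, boolean connected) {
        if (connected && !mCurrentlyConnectingDevices.containsKey(remoteId)) {
            // Android completed a connection the app had already cancelled: kill it.
            gatt.disconnect();
            gatt.close();
            return true;
        }
        if (!connected && !mConnectedDevices.containsKey(remoteId)) {
            // Disconnect for a device that was never reported as connected.
            return true;
        }
        return false;
    }

    void onConnectionStateChange(String remoteId, GattLike gatt, boolean connected) {
        if (handleUnexpectedConnectionEvents(remoteId, gatt, connected)) {
            return; // early return: nothing is reported to the app
        }
        if (connected) {
            mCurrentlyConnectingDevices.remove(remoteId);
            mConnectedDevices.put(remoteId, gatt);
        } else {
            mConnectedDevices.remove(remoteId);
        }
        notifications.add(remoteId + (connected ? ":connected" : ":disconnected"));
    }
}
```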
thanks for you being receptive to me micromanaging your code change
No worries. Your review and comments are highly appreciated because you are most familiar with the code base. Thank you for reviewing my PRs.
merged
Addresses issue #934