go-ble / ble

Bluetooth Low Energy for Linux / macOS
BSD 3-Clause "New" or "Revised" License
305 stars 109 forks source link

Scan returns before AdvHandler calls are completed #73

Closed dnesting closed 1 year ago

dnesting commented 4 years ago
github.com/go-ble/ble v0.0.0-20200407180624-067514cd6e24 
go version go1.14.2 linux/amd64
(running on different arch) Linux kw3 4.19.97-v7l+ #1294 SMP Thu Jan 30 13:21:14 GMT 2020 armv7l GNU/Linux

Reproduce (code sample below):

  1. Call Scan with a context that expires or is cancelled
  2. Provide Scan an AdvHandler implementation that is slow
  3. Observe that AdvHandler will continue to be called after Scan returns

It is usually expected that functions that take callbacks will not return until the callbacks have returned and no further callbacks made. It looks like there's some asynchronous "start scanning"/"stop scanning" stuff happening behind the scenes and there should probably be an intentional wait in there (or abandon calls that haven't been made yet).

package main                                                                                                                                           

import (                                                                                                                                               
    "context"                                                                                                                                          
    "fmt"                                                                                                                                              
    "log"                                                                                                                                              
    "sync/atomic"                                                                                                                                      
    "time"                                                                                                                                             

    "github.com/go-ble/ble"                                                                                                                            
    "github.com/go-ble/ble/linux"                                                                                                                      
)                                                                                                                                                      

func main() {                                                                                                                                          
    dev, err := linux.NewDevice()                                                                                                                      
    if err != nil {                                                                                                                                    
        log.Fatal(err)                                                                                                                                 
    }                                                                                                                                                  

    ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)                                                                            
    defer cancel()                                                                                                                                     

    var counter int32                                                                                                                                  
    dev.Scan(ctx, true, func(adv ble.Advertisement) {                                                                                                  
        time.Sleep(100 * time.Millisecond)  // enough to trigger                                                                                               
        atomic.AddInt32(&counter, 1)                                                                                                                   
    })                                                                                                                                                 
    a := atomic.LoadInt32(&counter)                                                                                                                    
    time.Sleep(1 * time.Second)                                                                                                                        
    b := atomic.LoadInt32(&counter)                                                                                                                    
    if a != b {                                                                                                                                        
        fmt.Printf("saw an extra %d calls to AdvHandler\n", b-a)                                                                                       
    } else {                                                                                                                                           
        fmt.Println("no problem")                                                                                                                      
    }                                                                                                                                                  
}

Output:

saw an extra 7 calls to AdvHandler

This creates problems if, for instance, the AdvHandler is trying to send messages on a channel, and you close the channel after Scan returns.

dnesting commented 4 years ago

I was able to eliminate my symptoms with the following change:

This change improves things, but I still get issues even with it. I'll continue playing with it as I have time:

diff --git a/linux/device.go b/linux/device.go
index 1a1ec02..d5dc319 100644
--- a/linux/device.go
+++ b/linux/device.go
@@ -176,6 +176,9 @@ func (d *Device) Scan(ctx context.Context, allowDup bool, h ble.AdvHandler) erro
                return err
        }
        <-ctx.Done()
+       if err := d.HCI.SetAdvHandler(nil); err != nil {
+               return err
+       }
        d.HCI.StopScanning()
        return ctx.Err()
 }
diff --git a/linux/hci/hci.go b/linux/hci/hci.go
index 1c82300..cc3a26c 100644
--- a/linux/hci/hci.go
+++ b/linux/hci/hci.go
@@ -432,7 +432,7 @@ func (h *HCI) handleLEAdvertisingReport(b []byte) error {
                default:
                        a = newAdvertisement(e, i)
                }
-               go h.advHandler(a)
+               h.advHandler(a)
        }

        return nil

I'm not sure what the implications are of running h.advHandler(a) synchronously. The SetAdvHandler(nil) seems to effectively discard any remaining in-flight responses, on the belief that if the context expires, it's because we want to return now, not process everything sent up until now.

estutzenberger commented 2 years ago

I didn't originally write the advertisement handling code, but I suspect the reasoning for it being asynchronous is because there isn't a way for the library to request the radio to hold on to HCI payloads (advertisements, etc) at the hardware level. This means that a synchronous process would potentially cause loss of data if handling of those packets is very slow.

When Scan returns, this means that a request was sent to the radio to stop scanning. However, the radio may still be sending in Scan results in the meantime. The question is if you want those or not.

Either way, fixing this likely involves setting the handler to nil prior to sending the stop scanning command, probably within the code which triggers the stop scan. For example, inside of linux/device.go Scan() after <-ctx.Done() and before d.HCI.StopScanning().