dryark / ios_remote_provider

Other
49 stars 33 forks source link

Take care about network delay #57

Open s4ib0t87 opened 3 years ago

s4ib0t87 commented 3 years ago

added var delayed to the ControlFloor class in order to notify the DeviceTracker when CF is ready.

If the ControlFloor server is a remote server it sometime occur that the connection to the server is established after the DeviceTracker tries to register the devices against CF.

In this case DeviceTracker stores the devices in pendingDevs and waits for the notifcation of controlfoor.go but this notification only happens if the server can't be reached at all.

nanoscopic commented 3 years ago

"Registration" of devices against CF is done by way of notifyDeviceExists function. That in turn calls baseNotify. That function calls checkLogin. CheckLogin checks if login to CF has occurred, and if not yet done, does so.

baseNotify is not using the persistent connection between CF and provider. That connection is only used for requests from CF to the provider, not vice versa. Calls to CF are normal GET/POST over HTTP using a session token that is acquired via login.

Perhaps your PR code is changing something in your scenario, but I am not seeing what that is. Were you seeing an error that you don't see any more with this patch? If so, could you please provide the error message so I can understand better what was going wrong and what the appropriate fix would be?

s4ib0t87 commented 3 years ago

Hi, first the logs of an stuck connection attempt:

ios_remote_provider

level=debug msg="Process starting fields" fields="map[binary:bin/go-ios type:proc_start]" proc=device_trigger type=proc_fields
level=info msg="Process start - device_trigger" binary=bin/go-ios proc=device_trigger type=proc_start
level=debug msg="HTTP server started" type=http_start
level=debug msg="Process pid" pid=3180 proc=device_trigger type=proc_pid
level=debug msg="Creating GIDev" type=gidev_create udid="***402E"
Device attached, but ControlFloor not ready.
  udid=xxxx
level=info msg="Logged in to control floor" type=cf_login_success
level=info msg="Connecting ControlFloor WebSocket" link="ws://domain.test:8080/provider/ws" type=cf_ws_connect

cf

Provider login user=testPi pass=xxxx
provider login ok 
provider user=testPi 
got ws connection

I also took a network trace, in which you can see that remote provider logs connects to ControlFloor with HTTP-POST to endpoint /provider/login. After the login the endpoint /provider/device/status/exists is not called.

While debbuging I saw that device_tracker/onDeviceConnect1() is called before the login to ControlFloor was successful. For this reason the device is queued up in the pendingDevs array of device_tracker. The queued devices keep in the queue forever, because device_tracker/cfReady() is only called if the connection to ControlFloor failed and is retried after 10 seconds. Which is in my example not the case. For this reason I implemented the changes.

Hope this explains the issue a bit better. If you need further information's or maybe the pcap, just give me a sign.

s4ib0t87 commented 3 years ago

Just a small point to add, the issue is only occurring in distributed system. In my traces the device registration worked, when the login to ControlFloor took about 120ms. When the network interface was in idle and the login took about 200ms the remote_provider got stuck in the described state.