dryark / stf_ios_support

Central repo to connect and document the components/repos needed for iOS STF support

Device interaction timeout #83

Closed. rounakcodes closed this issue 3 years ago.

rounakcodes commented 3 years ago

If I interact with the device in the STF dashboard after it has been idle for approximately 5 minutes, this is what happens:

  1. The coordinates of the click can be seen in the browser's Network tab.

    Example: 42["input.touchDown","gBSCWYeFO/HKQtEzaTC/c3TPi3M=",{"seq":1,"contact":0,"x":0.7557603686635944,"y":0.14893705221122158,"pressure":0.5}]

  2. There are no new entries in the stf_device_ios log.

  3. In the wdaproxy log, new "200 GET /status (127.0.0.1)" entries continue to appear.

  4. There are no new entries in the wda log.

The interactions have no effect on the device, and not just on the video stream: nothing changes on the physical device either. Is there a way to increase the idle timeout?

I could not find an issue template, so here is some additional info:

  1. Everything works fine before the idle period. The expectation is for it to keep working without any timeout.
  2. The provider runs on a Mac mini and the STF server is an Ubuntu machine on EC2.
  3. I am using the latest versions of WDA and RethinkDB.
  4. I am using devicefarmer/stf instead of livxtrm/stf. Other than the above, the rest of the setup follows the README instructions of this repo.

Edit:

On localhost:8100 I can interact with the device and see the changes on the device correctly, so the issue seems to be in passing the user interaction on to WDA.

nanoscopic commented 3 years ago

The "no interaction timeout" option is called "group-timeout" and is an option passed to the device unit. The option in the stf-ios-provider code can be seen here: https://github.com/DeviceFarmer/stf-ios-provider/blob/master/lib/cli/device-ios/index.js#L94

There is currently no option in the stf_ios_support config.json to set this value, so it defaults to 900 seconds. The option could be added to the call to the device unit here: https://github.com/DeviceFarmer/stf_ios_support/blob/master/coordinator/proc_device_unit.go#L53 I believe that setting it to 0 makes it never time out, but I could be wrong.
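A minimal Go sketch of how such an option could be threaded through from config.json to the device unit. The groupTimeout JSON key, the Config type and the deviceUnitArgs helper are all hypothetical (no such option exists in config.json today), and EXISTING_ARGS stands in for whatever proc_device_unit.go already passes; only --group-timeout corresponds to the device-unit option linked above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// Config is a hypothetical subset of config.json. GroupTimeout is NOT an
// existing stf_ios_support option; it only illustrates how one could be added.
type Config struct {
	// GroupTimeout in seconds. A nil pointer means "not set", so the device
	// unit keeps its own default (900 seconds).
	GroupTimeout *int `json:"groupTimeout,omitempty"`
}

// deviceUnitArgs is a hypothetical helper: it copies the existing argument
// list and appends --group-timeout only when the config sets a value.
func deviceUnitArgs(base []string, cfg Config) []string {
	args := append([]string{}, base...)
	if cfg.GroupTimeout != nil {
		args = append(args, "--group-timeout", strconv.Itoa(*cfg.GroupTimeout))
	}
	return args
}

func main() {
	var cfg Config
	if err := json.Unmarshal([]byte(`{"groupTimeout": 3600}`), &cfg); err != nil {
		panic(err)
	}
	// EXISTING_ARGS is a placeholder for the arguments the coordinator already builds.
	fmt.Println(deviceUnitArgs([]string{"EXISTING_ARGS"}, cfg))
}
```

With the sample config this prints [EXISTING_ARGS --group-timeout 3600]. Using a pointer keeps "unset" distinct from an explicit 0, which matters because the meaning of 0 (never time out) is unconfirmed.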

The timeout should cause a UI popup saying that you have been kicked from the device, though. It should not simply "stop working" as you are describing. Also, 900 seconds is 15 minutes, not 5. Perhaps you are encountering a different issue?

rounakcodes commented 3 years ago

I have tested several times and found it to be exactly 5 minutes (to the second). After the interaction stops, if I refresh the page, I cannot see the video either. Until I click refresh, if I interact with the physical device directly, I can see the video changing in the portal.

Secondly, if I start the coordinator and wait 5 minutes before opening the STF portal in the browser, I am not able to see the device in the browser in that case either.

I have never seen the popup. Also, ./view_log -proc stf_provider is always empty.

nanoscopic commented 3 years ago

The correct call is ./view_log -proc stf_ios_provider (see https://github.com/DeviceFarmer/stf_ios_support/blob/master/coordinator/proc_stf_provider.go#L44). It is rather inconvenient that you have to know the process names to filter them easily. I've been meaning to update the view_log script to show all available options and tell you if the one you specify is invalid; I just haven't gotten to it yet.

TODO.

The stf_ios_provider, though, isn't generally what would cause interactions to break. Interactions are mainly handled by the device unit, stf_device_ios (see https://github.com/DeviceFarmer/stf_ios_support/blob/master/coordinator/proc_device_unit.go#L113), so you may find relevant info with ./view_log -proc stf_device_ios.

rounakcodes commented 3 years ago

Thanks for your guidance @nanoscopic. All the best for your new endeavour!

Although interaction stops after 5 minutes, if I do nothing, the popup that is supposed to appear after 15 minutes does appear. Clicking "re-connect" in the popup does nothing, probably because of the issue that occurs after 5 minutes.

Given that the timeout is exactly 5 minutes, I grepped for timeout values such as 3000, 5 and the like (including in the repos folder) and replaced them; nothing helped. The "2" and "3" messages that are sent and received every 25 seconds on Request URL: wss://xxxx/socket.io/?uip=xxxx&EIO=3&transport=websocket also stop after 5 minutes. After I click somewhere, they are sent and received again, but, as I said in my first post, the interactions are no longer translated into WDA commands, even though the click coordinates and other actions are recorded.

I have also increased the TCP timeout on macOS and the SSH keepalive on both the client and the server (my Ubuntu machine). The stf_device_ios log does not show any error either; it just stops after 5 minutes. If someone could confirm that they are not facing this issue, I would focus on doing a clean setup on a fresh machine.

I have been reading several issues here, including: https://github.com/DeviceFarmer/stf_ios_support/issues/85

Clicking "Restart" for the STF Device-IOS Unit in the web interface makes the device usable again. (FYI, all the values in the web interface were "on" / "up".) I am wondering whether doing this programmatically would help, but for that I would need to know where in the code to trigger the restart.

nanoscopic commented 3 years ago

I agree that it sounds like a timeout issue of some sort. I have never seen the issue you are describing myself, though I only ever test with Firefox. What browser are you using to connect to the front end?

Since the websocket shows no connection after 5 minutes, that sounds plausible, but it doesn't match your description of being unable to connect if you select the device 5 minutes after it was started.

There are two ways a device in STF can "break". One is WDA breaking, which causes clicks and other actions to stop working while the video continues to show. That seems to match your description, since you said the video continues if you don't click in the interface.

The other is the video stream breaking. The video stream uses a different websocket and doesn't connect to the Node STF code.

This leads me to believe that the problem lies somewhere in the Node code. The 5-minute timeout probably has something to do with the default websocket timeout of the Node module being used.

Edit: Additional info:

The websocket connection is to the STF server. The code I have changed (the STF provider unit and the STF device unit) communicates with the server via ZeroMQ. This would seem to indicate that the issue lies in the upstream STF server code being used, which uses socket.io for the websocket connection.

socket.io will time out after 5 minutes if the ping/pong is not working. See https://stackoverflow.com/questions/33430075/socket-io-disconnects-every-5-minutes

There should be some constant data on the socket.io connection to keep it alive, though.

rounakcodes commented 3 years ago

By changing the nginx websocket configuration, I have made sure that the ping/pong (i.e. the "2" and "3" socket messages) keeps going for a full day; earlier it stopped after 5 minutes. However, the device interaction issue remains.

PS: Regarding the video stream, it works until I reload the page. If I interact with the physical device during that time, those changes are also visible in the browser (both Firefox and Chrome). Once the page is reloaded, the video goes away.

nanoscopic commented 3 years ago

I'll test this and see if I can replicate the issue on my own systems. So far, the closest I've come to testing what you describe is letting the device sit inactive for 15 minutes until the UI message appears. It is entirely possible that it breaks after 5 minutes for me as well. I think I would have noticed it by accident, but it doesn't hurt to try again.

If I can replicate the issue I can dig into it myself and fix whatever the problem is.

rounakcodes commented 3 years ago

I was able to reproduce it on one more Mac mini but not on another, so I am closing the issue.