Closed pliablepixels closed 9 years ago
@alager, some followup questions:
a) How often does this occur? b) Can you check if this mostly occurs when you are moving from montage view to System status view or does it happen when transitioning from other views to System status as well? c) When this occurs, can you check via your network tools how many connections are active from ZMNinja to ZM?
thx
a) It's only happened once that I've noticed b) will keep an eye on the behavior c) I have wireshark, I'll see if I can catch it.
Here are the syslogs
10.5.16.160 - - [25/Sep/2015:15:15:25 -0700] "GET /zm/index.php HTTP/1.1" 200 2995 "http://10.5.27.58/zm/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
10.5.20.79 - - [25/Sep/2015:15:15:27 -0700] "GET /zm/api/host/daemonCheck.json HTTP/1.1" 200 267 "-" "Mozilla/5.0 (Linux; Android 4.4.4; HTC One_M8 Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Crosswalk/14.43.343.24 Mobile Safari/537.36"
10.5.20.79 - - [25/Sep/2015:15:15:27 -0700] "GET /zm/api/host/getLoad.json HTTP/1.1" 200 311 "-" "Mozilla/5.0 (Linux; Android 4.4.4; HTC One_M8 Build/KTU84P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Crosswalk/14.43.343.24 Mobile Safari/537.36"
10.5.16.160 - - [25/Sep/2015:15:16:26 -0700] "GET /zm/index.php HTTP/1.1" 200 2995 "http://10.5.27.58/zm/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
10.5.16.160 - - [25/Sep/2015:15:17:26 -0700] "GET /zm/index.php HTTP/1.1" 200 3002 "http://10.5.27.58/zm/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"
and here are the zmNinja logs
Sep 25, 2015 3:17:37 PM INFO Authentication is disabled. Skipping login
Sep 25, 2015 3:17:37 PM INFO Authentication not enabled. Skipping Timer
Sep 25, 2015 3:17:37 PM INFO No need to cancel zmAutologin timer. Auth is off
Sep 25, 2015 3:17:37 PM INFO Not iOS, not checking for touchID
Sep 25, 2015 3:17:37 PM INFO User credentials are provided
Sep 25, 2015 3:17:36 PM INFO Restarting montage timer on resume
Sep 25, 2015 3:17:36 PM INFO Restarting montage timer on resume
Sep 25, 2015 3:17:36 PM INFO App is resuming from background
Sep 25, 2015 3:17:33 PM INFO No need to cancel zmAutologin timer. Auth is off
Sep 25, 2015 3:17:33 PM INFO App is going into background
Sep 25, 2015 3:16:20 PM ERROR Error retrieving DiskStatus: {"data":null,"status":0,"config":{"method":"GET","transformRequest":[null],"transformResponse":[null],"url":"http://<server>/api/host/getDiskPercent.json","headers":{"Accept":"application/json, text/plain, */*"},"timeout":60000},"statusText":""}
Sep 25, 2015 3:13:59 PM INFO Stream authentication construction:
Sep 25, 2015 3:13:59 PM INFO Data Model: Did not find any stream mode of auth
Sep 25, 2015 3:13:59 PM INFO DataModel: Did not find a stream auth key, looking for user=
Sep 25, 2015 3:13:59 PM INFO Inside Montage Ctrl:We found 4 monitors
Sep 25, 2015 3:13:59 PM INFO Returning pre-loaded list of 4 monitors
Just to complete the picture I have apache configured for 4 processes.
So the previous comment should have said I had 4 apache processes running. Configured was 2, with 1 min spare and 5 max spare. I've changed it to: 4, with 2 min spare and 4 max spare now I'm not seeing the timeouts....I'll keep an eye on this.
Interesting. So Apache dropped that API HTTP request then? I see in your logs that it got a "daemonCheck" and then it got a "getLoad" but never received a "diskStatus" and my app returned an error, so does that mean the apache process just dropped that connection? Is that how Apache's limit works? I thought it would queue it up. The fact that you increased these limits and the problem is not recurring implies its not the app browser that is causing this issue, then.
running 6 apache processes, I see an odd behavior here.
10.5.20.87 - - [28/Sep/2015:16:02:01 -0700] "GET /zm/api/host/daemonCheck.json
10.5.20.87 - - [28/Sep/2015:16:02:02 -0700] "GET /zm/api/host/getLoad.json
Even though the time stamp is within 1 second of the above two rows, these two rows below don't show up in the access log until almost 40 seconds later, and that is the moment that the page changes from "loading..." to 588.6GB for disk usage.
10.5.20.87 - - [28/Sep/2015:16:02:02 -0700] "GET /zm/api/host/getDiskPercent.json
10.5.20.87 - - [28/Sep/2015:16:02:03 -0700] "GET /zm/api/States.json
Now running 11 processes, the delay between those two sets is now about 3 seconds wall clock time, as is the update for the disk usage, but the logs still indicate about 1 second.
Also of note, I have to exit zmNinja each time to test this, as the swipe down to refresh does not appear to refresh when the number of processes was 6 or less.
Now that it I have 11 processes, swipe to refresh is working.
@alager a large delay for disk status is normal - mine takes 15s - zm executes a Unix dF that takes a long time spending on number of events files.
is this still a problem @alager ?
no, once I increased the number of apache processes, these issues went away.
@alager reports that when viewing the "System Status" screen, the HTTP APIs return an "error" like ZM returned nothing. However, he looked at logs of ZM and the request does not get there.
My hypothesis is this: Chrome has a limit of 6 concurrent connections to one domain - when you view System Status, it shoots off 4 APIs in parallel (4 HTTP connections). In the event there are other HTTP connections that have not yet terminated in zmNinja, this may be resulting in a hang state.
In the past I've observed when you hit this limit, Chrome (the web view the client is using) does not recover well from this. In addition, I make HTTP timeouts for the system status requests long. I don't know if this adds to the issue.
This issue will track this problem as I take @alager's help to characterize this better