neilh10 closed this issue 4 months ago.
@neilh10 Thanks for providing this characterization data! As requested, we have finally pulled together a dashboard from our uptime monitoring service. It is available at bit.ly/mmw-uptime
Our monitoring graph seems to track this characterization pretty well. We see the same bump in performance during March, followed by some rough spikes in April and May. I annotated a couple of notable maintenance activities on our graph below. We are looking forward to getting the rest of the performance improvements implemented to further improve stability.
Great to see. Thank you very much. It looks like a good tool for regression testing, and it enables some quantification of the effect of changes. :)
Interesting. For the database, it shows a challenging side of managing "big data": it needs to be monitored.
Is "DataStream" the response that a virtual end-point/device sees, that is, from the internet? With the selector at the top right set to raw, there are a lot of 15 s response times.
Correct, "DataStream" is a virtual device that makes a POST request to the same /api/data-stream endpoint used by the physical devices.
I'm doing integration testing from my desk and seeing a high rate of POST timeout failures.
I'm wondering whether the MMW characterization data is being released, as previously discussed. :) #673
This follows on from #667, #673 and #661.
I pulled the .LOG I keep from my LCC45/WiFi node, and it is showing a large number of POST failures.
As an overview, there are three main areas of failure with ModularSensors/ODM2:
a) ModularSensors running on the Mayfly
b) the wireless links
c) the host, MonitorMyWatershed.org, running ODM2
For a), I have an enhanced reliableDelivery ModularSensors fork that very solidly repeats messages that aren't acknowledged. This is very standard Communications 101 theory.
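The retry idea in a) can be illustrated with a minimal Python sketch. It is not the code from the reliableDelivery fork: `ReliableQueue`, `flush` and the `send` callback are hypothetical names. It assumes the host answers an accepted reading with "201" and that anything else (a "504", or a client-side timeout modelled as `None`) means the reading must be kept and retried on a later interval.

```python
from collections import deque
from typing import Callable, Deque, Optional

HTTP_CREATED = 201  # assumed "accepted" status from the host


class ReliableQueue:
    """Hypothetical store-and-forward queue: readings are dropped only once acknowledged."""

    def __init__(self, max_len: int = 1000) -> None:
        # If the backlog ever exceeds max_len, deque silently drops the oldest reading.
        self.pending: Deque[str] = deque(maxlen=max_len)

    def add(self, reading: str) -> None:
        self.pending.append(reading)

    def flush(self, send: Callable[[str], Optional[int]], max_per_flush: int = 5) -> int:
        """Send up to max_per_flush readings, oldest first; return how many were acknowledged."""
        delivered = 0
        while self.pending and delivered < max_per_flush:
            status = send(self.pending[0])  # None models a client-side timeout
            if status != HTTP_CREATED:
                break  # 504, timeout, etc.: keep the reading for the next interval
            self.pending.popleft()
            delivered += 1
        return delivered


# Usage: the host accepts the first reading, then returns a 504.
responses = iter([201, 504])
queue = ReliableQueue()
for reading in ("r1", "r2", "r3"):
    queue.add(reading)
sent = queue.flush(lambda r: next(responses))
print(sent, list(queue.pending))  # 1 ['r2', 'r3']
```

Stopping at the first failure, rather than pushing the whole backlog, is a design choice that avoids piling more requests onto a host that is already timing out.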
b) Wireless is inherently unreliable and depends on a host of unpredictable factors, including geography, fog and wind direction. ModularSensors uses large packets with redundant UUIDs, which makes it even more unreliable; whoever designed it didn't understand the extra challenges of riparian zones with vegetation. A better architecture for wireless is a slimmed-down MQTT.
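To put rough numbers on the "large packets with redundant UUIDs" point in b), this sketch compares the body size of an ODM2-style JSON POST with a hypothetical slimmed message. The JSON shape (a `sampling_feature` UUID, a `timestamp`, then one variable UUID per value) is an assumption about what the publisher sends, and `compact_payload` is an invented format that presumes the server already knows the site and the variable order, as a slimmed MQTT topic scheme might arrange. Only body bytes are counted; HTTP and MQTT headers are ignored.

```python
import json
import uuid


def odm2_style_payload(n_vars: int) -> bytes:
    # Assumed shape: sampling-feature UUID, ISO timestamp, then one
    # 36-character variable UUID as the key for every value.
    body = {
        "sampling_feature": str(uuid.uuid4()),
        "timestamp": "2024-05-28T10:02:00-07:00",
    }
    for _ in range(n_vars):
        body[str(uuid.uuid4())] = 12.34
    return json.dumps(body, separators=(",", ":")).encode()


def compact_payload(n_vars: int) -> bytes:
    # Hypothetical slimmed message: epoch seconds for the same instant,
    # then the values in an order both ends agree on in advance.
    values = ",".join("12.34" for _ in range(n_vars))
    return f"1716915720,{values}".encode()


for n in (3, 6, 10):
    print(f"{n} variables: JSON {len(odm2_style_payload(n))} bytes, "
          f"compact {len(compact_payload(n))} bytes")
```

Each extra variable adds a 36-character UUID key to the JSON body, so the gap widens with every sensor added to the node.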
c) The host system. This would be expected to be "reliable", and this report focuses on its detected inability to process messages, which shows up as a "504" timeout.
The LCC45 system delivers over WiFi, and here is the graph of HTTP responses (number of POSTs on the left). Since it is WiFi and the communication medium is good, I would ideally expect the number of "504" failures to be 0. The node settings are:
LOGGING_INTERVAL_MINUTES=15
SEND_OFFSET_MIN=2 (POSTs at 2 minutes past each 15 min interval)
TIMER_POST_TOUT_MS=13000 (13 s timeout)
When there is a "201", it is usually returned in under 3 seconds.
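The timing settings above can be checked with a small sketch that lists when the node POSTs and tallies logged outcomes against the 13 s client timeout. It assumes the .LOG can be reduced to `(status, elapsed_ms)` pairs, with `status` set to `None` when the node gave up waiting; that record format and the helper names are hypothetical. One consequence follows straight from the numbers: a 15 s response, like those in the uptime dashboard's raw view, would already be past TIMER_POST_TOUT_MS, so the node would count it as a failure even if the server eventually answered "201".

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Settings quoted from the node configuration above.
LOGGING_INTERVAL_MINUTES = 15
SEND_OFFSET_MIN = 2
TIMER_POST_TOUT_MS = 13000


def post_times(day: datetime) -> List[datetime]:
    """POST times on `day`: SEND_OFFSET_MIN minutes past each logging interval."""
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    step = timedelta(minutes=LOGGING_INTERVAL_MINUTES)
    offset = timedelta(minutes=SEND_OFFSET_MIN)
    per_day = 24 * 60 // LOGGING_INTERVAL_MINUTES  # 96 POSTs per day at 15 min
    return [start + i * step + offset for i in range(per_day)]


def classify(status: Optional[int], elapsed_ms: int) -> str:
    """Bucket one hypothetical (status, elapsed_ms) log record."""
    if status is None or elapsed_ms >= TIMER_POST_TOUT_MS:
        return "client timeout"  # no status, or at/after the 13 s limit
    if status == 201:
        return "created"
    if status == 504:
        return "gateway timeout"
    return f"other {status}"


def summarize(records: Iterable[Tuple[Optional[int], int]]) -> Counter:
    return Counter(classify(status, ms) for status, ms in records)


times = post_times(datetime(2024, 5, 28))
print(len(times), times[0].time(), times[-1].time())  # 96 00:02:00 23:47:00
print(summarize([(201, 2500), (504, 12000), (None, 13000), (500, 800)]))
```

`classify` checks the elapsed time before the status, so a late "201" is still treated as a failure from the node's point of view.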
Full data attached: 240528_Lcc45_responseAnalysis.xlsx