ODM2 / ODM2DataSharingPortal

A Python-Django web application enabling users to upload, share, and display data from their environmental monitoring sites via the app's ODM2 database. Data can either be automatically streamed from Internet of Things (IoT) devices, manually uploaded via CSV files, or manually entered into forms.
BSD 3-Clause "New" or "Revised" License

Characterization POSTs against MMW server 2023 May #661

Closed neilh10 closed 11 months ago

neilh10 commented 1 year ago

This is a characterization of the upgraded servers' response time (endpoint through MonitorMyWatershed.org or data.enviroDIY.org) over different wireless links, for 4 Mayfly 1.1 systems.
Wireless links can be variable in strength and are inherently unreliable. Signal strength can be reduced by atmospherics such as rain, fog and wind direction. A wireless link has a footprint radiating out from the antenna that becomes increasingly noisy with distance, with loss of transmission capability. Inherently, more of the footprint can lie in the noisy fringe than in the clean core; however, cellular modems can connect with multiple towers and route to the tower with the strongest signal.
For noisy links, the larger the transmitted packet, the less chance it will be received successfully.
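
To illustrate the packet-size effect, here is a minimal sketch under a simplifying assumption that is not from the measurements: bit errors are independent with a fixed bit error rate. Real links have bursty errors and link-layer retries, so treat the numbers as indicative only. The function name and the example error rate are hypothetical.

```python
def packet_success_probability(packet_bytes: int, bit_error_rate: float) -> float:
    """Chance that every bit of a packet arrives intact.

    Assumes independent bit errors (a simplification, not a measured model).
    """
    return (1.0 - bit_error_rate) ** (8 * packet_bytes)


# Compare a few packet sizes at an illustrative (assumed) bit error rate.
for size in (100, 500, 1500):
    print(size, "bytes:", round(packet_success_probability(size, 1e-4), 3))
```

Under this model the success probability falls off exponentially with packet length, which is why smaller POSTs stand a better chance on a marginal link.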

My fork of ModularSensors has built-in reliability features (from long experience) to cope with failures of the wireless link and, as best as possible, with the server's slow response time:
https://github.com/neilh10/ModularSensors/wiki/1a-Feature-notes
However, it still needs some reasonable threshold of response from the server.

Conclusion: Current response time characteristics are terrible and appear to have degraded with the upgrade to 0.15.0.
It would be nice to have reliable responses, with an objective in the 400 ms response range. Some measurements are already in that range, showing there is a path through the software for that response, so all attempts should be capable of achieving it.

This first characterization is under near perfect conditions, POSTing to server monitorMyWatershed.org.
The POSTs are from a Mayfly 1.1 using a Digi WiFi module, across a short 10’ wireless gap to a fast internet connection (Comcast: 11 Mbps upload to the internet, 90 Mbps download from the internet, ping 14 ms, jitter 1 ms).

For a POST to MonitorMyWatershed.org/api/data-stream/ HTTP/1.1:
the typical HTTP response takes about 3.2 seconds,
an occasional fast response takes ~400-600 ms,
and 14% of POSTs fail with a timeout of 30 seconds.

The test method is one POST every two minutes (120 seconds), left to run overnight, for a sample of 1311 responses.

For this characterization the timeout was set to 30,000 ms.
The longest successful response (201) took 11,000 ms.
The shortest was 397 ms.
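
For anyone wanting to reproduce these summary figures from their own log, here is a minimal sketch. The `(http_status, elapsed_ms)` record format, the `summarize` helper, and the sample values are all hypothetical and for illustration only. They are not the measured data set and not the format of the attached spreadsheets.

```python
from statistics import median


def summarize(samples, timeout_ms=30_000):
    """Summarize POST results.

    samples: list of (http_status, elapsed_ms); http_status is None on timeout.
    Anything that is not a 201 within the timeout is counted as failed.
    """
    ok = [ms for status, ms in samples if status == 201 and ms < timeout_ms]
    failed = len(samples) - len(ok)
    return {
        "count": len(samples),
        "failed_pct": 100.0 * failed / len(samples),
        "min_ms": min(ok) if ok else None,
        "median_ms": median(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
    }


# Illustrative records only (assumed values, not the real log).
example = [(201, 397), (201, 3200), (201, 3150), (201, 11000), (None, 30000)]
print(summarize(example))
```

The median is used rather than the mean so that a few very slow responses do not hide what a typical POST looks like.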

tty230526_ResponseMonitorMyWatershed orgResponseTime

This appears far worse than last year, when no timeouts were recorded. However, the internal architecture is believed to be more stable. For clarity, a test with 450 POSTs is shown in the graph above.

After https://github.com/ODM2/ODM2DataSharingPortal/issues/658 was fixed, 3 systems were analyzed. These were redirected from Apr 12 to May 23 (41 days), so the pent-up data to be delivered is 4 x 24 x 41 ~ 4000 samples.

One system, TUCA_Sa01 over WiFi (with a 150’ gap), uploaded the results in a few days, which was the expected response.

Another system, LCC45 over WiFi (with a 250’ gap), uploaded slowly:

230523 early am monitorMyWatershed.org processing restored
230523_1032 uploaded all to 2023-04-14 12:00, outstanding 3760
230524_1139 uploaded all to 2023-04-14 4:45, outstanding 3357 [403 uploaded prev 24hrs]
230525_1138 uploaded all to 2023-04-23 2:30, outstanding 2902 [455 uploaded prev 24hrs]
230526_1631 uploaded all to 2023-04-27 20:00, outstanding 2448 [454 uploaded prev 24hrs]
(adjusted timeouts to 15 secs and TIMER_POST_TOUT_MS=4)
230527_1920 uploaded all to 2023-04-29 5:30, outstanding 2314 [134 uploaded prev 24hrs]
230528_1444 uploaded all to 2023-05-03 16:30, outstanding 1886 [428 uploaded prev 24hrs]
230529_1400 uploaded all to 2023-05-07 8:00, outstanding 1536 [350 uploaded prev 24hrs]
230529_1401 uploaded all to 2023-05-09 8:45, outstanding 1341 [195 uploaded prev 24hrs]
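
As a rough cross-check of the LCC45 log above, the outstanding counts can be turned into a drain rate. This is a sketch that assumes each log entry is about one day apart, which the timestamps only approximately support, and that the rate stays similar. The variable names are illustrative.

```python
# Outstanding sample counts copied from the log entries above.
outstanding = [3760, 3357, 2902, 2448, 2314, 1886, 1536, 1341]

# Samples uploaded between consecutive log entries.
uploaded = [a - b for a, b in zip(outstanding, outstanding[1:])]

# Average uploads per entry (treated as per day, an assumption).
mean_rate = sum(uploaded) / len(uploaded)

# Entries (roughly days) still needed to clear the remaining backlog.
entries_to_clear = outstanding[-1] / mean_rate

print("uploaded per entry:", uploaded)
print("mean per entry:", round(mean_rate))
print("entries to clear remaining backlog:", round(entries_to_clear, 1))
```

The differences match the bracketed "uploaded prev 24hrs" figures in the log, and at this pace the remaining backlog would take several more days to clear.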

The other system, TUCA_PO03, which was keeping up OK before v0.15.0, is now severely degraded.
It's not able to complete a basic set of uploads (4 POSTs), which would then allow it to start processing its queue. Since https://github.com/ODM2/ODM2DataSharingPortal/issues/658 was fixed, it has not processed any of its backlogged data.

This observation is derived from looking at the incomplete data from "DOWNLOAD SENSOR DATA" at https://monitormywatershed.org/sites/TUCA_PO03/
Its settings are aimed at being forgiving :
COLLECT_READINGS=4 ; Only POST every 1hour with 4 readings
SEND_OFFSET_MIN=2 ; wait to POST to go off peak - 2minutes
TIMER_POST_TOUT_MS=10000 ; POST timeout ms
TIMER_POST_PACE_MS=1000 ; wait 1sec between POSTS
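
To put those settings in perspective, here is a minimal sketch of the worst-case time for one set of POSTs. It assumes the POSTs are sent one after another, each waits at most the timeout, and the pacing delay applies between consecutive POSTs. Those are assumptions about the firmware behavior for illustration, and the function name is hypothetical.

```python
def worst_case_batch_ms(n_posts: int, timeout_ms: int, pace_ms: int) -> int:
    """Upper bound on the time for n sequential POSTs that all hit the timeout.

    Assumes one pacing delay between each pair of consecutive POSTs.
    """
    if n_posts <= 0:
        return 0
    return n_posts * timeout_ms + (n_posts - 1) * pace_ms


# Values from the settings above: 4 readings, 10000 ms timeout, 1000 ms pacing.
print(worst_case_batch_ms(4, 10_000, 1_000), "ms")
```

Under these assumptions even the worst case fits well inside the hourly send window, so the node failing to complete its 4 POSTs points at the responses themselves and not at the time budget.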

Doing a deeper dive into its internal logs is going to take getting access to private land and a 4hour round trip.

neilh10 commented 1 year ago

Another longer-term log of the responses from TUCA_PO03, from 2022 January to 2023 March.
In April this node had a virtual outage, described in https://github.com/ODM2/ODM2DataSharingPortal/issues/658. The log shows that the MMW response times started getting worse in Nov 2022 (data point 11), then jumped to really bad in Jan 2023 (data point 13) and remained bad.

image

Current downloaded readings from the PO03 indicate the server response is extremely poor, and it's still not able to take the extra readings after the virtual outage.

neilh10 commented 1 year ago

Here is the data file for the above PO03dbgAnalysis230605.xlsx

neilh10 commented 1 year ago

For integration testing (develop), two Mayflys were run at the same time, connecting to local WiFi and then over Comcast 300 Mbps to the internet. The Mayflys have a Digi WiFi S6B and an EnviroDIY XBee WiFi ESP32-WROOM-32.

Testing params against MMW

```cpp
    EnviroDIYPOST.setQuedState(true);
    EnviroDIYPOST.setTimerPostTimeout_mS(9876); // 9.876 s POST timeout
    EnviroDIYPOST.setTimerPostPacing_mS(500);   // 500 ms pacing

    dataLogger.setLoggingInterval(2); // log every 2 minutes, default 5 min
    dataLogger.setSendOffset(0);
    dataLogger._sendEveryX_cnt=1;
    dataLogger.setPostMax_num(5);
```

I saw some amazingly improved response times. Yeah!!!

S6B/9600 baud: average response 1098 ms, 53 timeouts, 743 responses
ESP32/56K baud: average response 1183 ms, 51 timeouts, 729 responses

They both ran for similar times; however, in turning the SSID off/on, the ESP32 took longer to recover and missed a 2-minute sample.

image

image

Response with timeouts set at 9.876 seconds:

image

image

tt230709_1057_dvlp_esp32_rspCode.xlsx
tty230709_1105_wifi_s6b_rspCode.xlsx

neilh10 commented 11 months ago

I'm closing this as a visible issue - though the data continues to be available for anybody that looks through closed items