ODM2 / ODM2DataSharingPortal

A Python-Django web application enabling users to upload, share, and display data from their environmental monitoring sites via the app's ODM2 database. Data can either be automatically streamed from Internet of Things (IoT) devices, manually uploaded via CSV files, or manually entered into forms.
BSD 3-Clause "New" or "Revised" License
31 stars 8 forks source link

Not getting a 201 on POST and loosing data #641

Closed neilh10 closed 1 year ago

neilh10 commented 1 year ago

For a number of systems that I am monitoring closely, on a POST, I am not getting an ACK 201 though the data is mostly showing in the downloaded data.

Before [2023-02-02 10:45:39] I was getting reliable ACK 201

For the following posts, all timing out after 15seconds, the first two - the sequence or "Sampling number" makes it into the database "6e5516fc-cdcf-46a6-9c30-d96f3a016b0d":122 and 123.

and the second two 124 and 125 are missing from the download data.

https://monitormywatershed.org/sites/tu_rcru_test06/

[2023-02-11 22:20:15] Waiting up to 15000 ms for Digi XBee3 Cellular LTE-M to respond to AT commands...
[2023-02-11 22:20:16] Connecting to the Internet with Digi XBee3 Cellular LTE-M
[2023-02-11 22:20:24]  WATCHDOG ISR barksUntilReset 74 <--WatchDogAVR
[2023-02-11 22:20:32]  WATCHDOG ISR barksUntilReset 73 <--WatchDogAVR
[2023-02-11 22:20:40]  WATCHDOG ISR barksUntilReset 72 <--WatchDogAVR
[2023-02-11 22:20:44] Connected Internet
[2023-02-11 22:20:44] ... Watchdog low barksUntilReset 72  expected 75
[2023-02-11 22:20:44] 
[2023-02-11 22:20:44] pubDQTR Sending data to [ 0 ] monitormywatershed.org:80
[2023-02-11 22:20:44] POST /api/data-stream/ HTTP/1.1
[2023-02-11 22:20:44] Host: monitormywatershed.org
[2023-02-11 22:20:44] TOKEN: 5083cf3d-69cd-46fa-8d3a-0e3ba0b9e8fd
[2023-02-11 22:20:44] Content-Length: 409
[2023-02-11 22:20:44] Content-Type: application/json
[2023-02-11 22:20:44] 
[2023-02-11 22:20:44] {"sampling_feature":"939d54e3-b92f-4e8d-bae5-1a5b4ae29193","timestamp":"2023-02-11T22:05:00-08:00","6e5516fc-cdcf-46a6-9c30-d96f3a016b0d":122,"fe4b74dd-598a-440d-b7d2-db0119584d5d":3.927,"5cbc0f3a-1b41-43cc-822f-374513de1899":6.30,"41330794-27cf-4130-b2b0-a1cffefe8ba2":0.0716,"941ba4ed-f7d5-4c71-8356-591899ec6e1a":66.41,"d5f4c4dc-421d-436b-9989-99cb94ebc50a":6.26,"ffdc5b73-b800-4957-ab19-b25feba69dde":-57}
[2023-02-11 22:20:44] 
[2023-02-11 22:20:52]  WATCHDOG ISR barksUntilReset 74 <--WatchDogAVR
[2023-02-11 22:21:00]  WATCHDOG ISR barksUntilReset 73 <--WatchDogAVR
[2023-02-11 22:21:01] -- Response Code -- 504 waited  15010 mS Timeout 15000
[2023-02-11 22:21:01] ... Watchdog low barksUntilReset 73  expected 75
[2023-02-11 22:21:01] POST /api/data-stream/ HTTP/1.1
[2023-02-11 22:21:01] Host: monitormywatershed.org
[2023-02-11 22:21:01] TOKEN: 5083cf3d-69cd-46fa-8d3a-0e3ba0b9e8fd
[2023-02-11 22:21:01] Content-Length: 409
[2023-02-11 22:21:01] Content-Type: application/json
[2023-02-11 22:21:01] 
[2023-02-11 22:21:01] {"sampling_feature":"939d54e3-b92f-4e8d-bae5-1a5b4ae29193","timestamp":"2023-02-11T22:10:00-08:00","6e5516fc-cdcf-46a6-9c30-d96f3a016b0d":123,"fe4b74dd-598a-440d-b7d2-db0119584d5d":3.927,"5cbc0f3a-1b41-43cc-822f-374513de1899":6.24,"41330794-27cf-4130-b2b0-a1cffefe8ba2":0.0713,"941ba4ed-f7d5-4c71-8356-591899ec6e1a":67.13,"d5f4c4dc-421d-436b-9989-99cb94ebc50a":6.00,"ffdc5b73-b800-4957-ab19-b25feba69dde":-57}
[2023-02-11 22:21:01] 
[2023-02-11 22:21:09]  WATCHDOG ISR barksUntilReset 74 <--WatchDogAVR
[2023-02-11 22:21:17]  WATCHDOG ISR barksUntilReset 73 <--WatchDogAVR
[2023-02-11 22:21:17] -- Response Code -- 504 waited  15009 mS Timeout 15000
[2023-02-11 22:21:17] ... Watchdog low barksUntilReset 73  expected 75
[2023-02-11 22:21:18] POST /api/data-stream/ HTTP/1.1
[2023-02-11 22:21:18] Host: monitormywatershed.org
[2023-02-11 22:21:18] TOKEN: 5083cf3d-69cd-46fa-8d3a-0e3ba0b9e8fd
[2023-02-11 22:21:18] Content-Length: 409
[2023-02-11 22:21:18] Content-Type: application/json
[2023-02-11 22:21:18] 
[2023-02-11 22:21:18] {"sampling_feature":"939d54e3-b92f-4e8d-bae5-1a5b4ae29193","timestamp":"2023-02-11T22:15:00-08:00","6e5516fc-cdcf-46a6-9c30-d96f3a016b0d":124,"fe4b74dd-598a-440d-b7d2-db0119584d5d":3.927,"5cbc0f3a-1b41-43cc-822f-374513de1899":6.17,"41330794-27cf-4130-b2b0-a1cffefe8ba2":0.0713,"941ba4ed-f7d5-4c71-8356-591899ec6e1a":67.01,"d5f4c4dc-421d-436b-9989-99cb94ebc50a":6.02,"ffdc5b73-b800-4957-ab19-b25feba69dde":-57}
[2023-02-11 22:21:18] 
[2023-02-11 22:21:25]  WATCHDOG ISR barksUntilReset 74 <--WatchDogAVR
[2023-02-11 22:21:34]  WATCHDOG ISR barksUntilReset 73 <--WatchDogAVR
[2023-02-11 22:21:34] -- Response Code -- 504 waited  15000 mS Timeout 15000
[2023-02-11 22:21:34] ... Watchdog low barksUntilReset 73  expected 75
[2023-02-11 22:21:34] POST /api/data-stream/ HTTP/1.1
[2023-02-11 22:21:34] Host: monitormywatershed.org
[2023-02-11 22:21:34] TOKEN: 5083cf3d-69cd-46fa-8d3a-0e3ba0b9e8fd
[2023-02-11 22:21:34] Content-Length: 409
[2023-02-11 22:21:34] Content-Type: application/json
[2023-02-11 22:21:34] 
[2023-02-11 22:21:34] {"sampling_feature":"939d54e3-b92f-4e8d-bae5-1a5b4ae29193","timestamp":"2023-02-11T22:20:00-08:00","6e5516fc-cdcf-46a6-9c30-d96f3a016b0d":125,"fe4b74dd-598a-440d-b7d2-db0119584d5d":3.927,"5cbc0f3a-1b41-43cc-822f-374513de1899":6.11,"41330794-27cf-4130-b2b0-a1cffefe8ba2":0.0709,"941ba4ed-f7d5-4c71-8356-591899ec6e1a":67.62,"d5f4c4dc-421d-436b-9989-99cb94ebc50a":5.81,"ffdc5b73-b800-4957-ab19-b25feba69dde":-57}
[2023-02-11 22:21:34] 
[2023-02-11 22:21:42]  WATCHDOG ISR barksUntilReset 74 <--WatchDogAVR
[2023-02-11 22:21:50]  WATCHDOG ISR barksUntilReset 73 <--WatchDogAVR
[2023-02-11 22:21:51] -- Response Code -- 504 waited  15010 mS Timeout 15000
[2023-02-11 22:21:51] ... Watchdog low barksUntilReset 73  expected 75
[2023-02-11 22:21:51] Sent 4 readings in 66.53 sec. Queued readings= 6619

For a 2nd system that was being monitored till [2023-02-08 18:14:01],

it was reliable until the following POST, and then seemed to get a lot less reliable. https://monitormywatershed.org/sites/tu_rcru_test07/

[2023-02-02 10:35:40] pubDQTR Sending data to [ 0 ] monitormywatershed.org:80
[2023-02-02 10:35:40] POST /api/data-stream/ HTTP/1.1
[2023-02-02 10:35:40] Host: monitormywatershed.org
[2023-02-02 10:35:40] TOKEN: cc8c415a-b35a-44c8-85c2-cf2311eca30e
[2023-02-02 10:35:40] Content-Length: 410
[2023-02-02 10:35:40] Content-Type: application/json
[2023-02-02 10:35:40] 
[2023-02-02 10:35:40] {"sampling_feature":"95072e6b-9796-4fee-b686-80bd3a355826","timestamp":"2023-02-02T10:30:00-08:00","74a1fed5-7829-4e88-9560-fc136411efcd":8496,"d282772a-0ca4-492c-8e8d-ad7137ddecc1":4.154,"61fff54d-ecb2-4c04-8961-14ff14837ae3":5.63,"58d5b96d-8f5d-45ab-a310-a44ffa68ec0e":0.0560,"d3271afd-c1c5-4375-9841-bbc93b5917ce":71.39,"fef0a40d-6954-4a7d-9087-de5c66b57f83":5.19,"e2199c70-ac6b-495d-b514-9850e21b6423":-57}
[2023-02-02 10:35:40] 
[2023-02-02 10:35:48]  WATCHDOG ISR barksUntilReset 74 <--WatchDogAVR
[2023-02-02 10:35:48] -- Response Code -- 504 waited  7000 mS Timeout 7000
[2023-02-02 10:35:48] ... Watchdog low barksUntilReset 74  expected 75
[2023-02-02 10:35:49] POST /api/data-stream/ HTTP/1.1
[2023-02-02 10:35:49] Host: monitormywatershed.org
[2023-02-02 10:35:49] TOKEN: cc8c415a-b35a-44c8-85c2-cf2311eca30e
[2023-02-02 10:35:49] Content-Length: 410
[2023-02-02 10:35:49] Content-Type: application/json
[2023-02-02 10:35:49] 
[2023-02-02 10:35:49] {"sampling_feature":"95072e6b-9796-4fee-b686-80bd3a355826","timestamp":"2023-02-02T10:35:00-08:00","74a1fed5-7829-4e88-9560-fc136411efcd":8497,"d282772a-0ca4-492c-8e8d-ad7137ddecc1":4.154,"61fff54d-ecb2-4c04-8961-14ff14837ae3":5.74,"58d5b96d-8f5d-45ab-a310-a44ffa68ec0e":0.0556,"d3271afd-c1c5-4375-9841-bbc93b5917ce":72.19,"fef0a40d-6954-4a7d-9087-de5c66b57f83":5.04,"e2199c70-ac6b-495d-b514-9850e21b6423":-57}
[2023-02-02 10:35:49] 
[2023-02-02 10:35:56]  WATCHDOG ISR barksUntilReset 74 <--WatchDogAVR
[2023-02-02 10:35:57] -- Response Code -- 201 waited  6914 mS Timeout 7000
[2023-02-02 10:35:57] ... Watchdog low barksUntilReset 74  expected 75
[2023-02-02 10:35:58] Sent 2 readings in 18.59 sec. Queued readings= 1
[2023-02-02 10:36:00] POST /api/data-stream/ HTTP/1.1
[2023-02-02 10:36:00] Host: monitormywatershed.org
[2023-02-02 10:36:00] TOKEN: cc8c415a-b35a-44c8-85c2-cf2311eca30e
[2023-02-02 10:36:00] Content-Length: 410
[2023-02-02 10:36:00] Content-Type: application/json
[2023-02-02 10:36:00] 
[2023-02-02 10:36:00] {"sampling_feature":"95072e6b-9796-4fee-b686-80bd3a355826","timestamp":"2023-02-02T10:30:00-08:00","74a1fed5-7829-4e88-9560-fc136411efcd":8496,"d282772a-0ca4-492c-8e8d-ad7137ddecc1":4.154,"61fff54d-ecb2-4c04-8961-14ff14837ae3":5.63,"58d5b96d-8f5d-45ab-a310-a44ffa68ec0e":0.0560,"d3271afd-c1c5-4375-9841-bbc93b5917ce":71.39,"fef0a40d-6954-4a7d-9087-de5c66b57f83":5.19,"e2199c70-ac6b-495d-b514-9850e21b6423":-57}
[2023-02-02 10:36:00] 
[2023-02-02 10:36:04] -- Response Code -- 201 waited  2687 mS Timeout 7000
[2023-02-02 10:36:04] seQCF Que for next pass unsent records 0

Starting with this Sequence Number=9907 "74a1fed5-7829-4e88-9560-fc136411efcd":9907 it is also not having the data appear in the database.

[2023-02-07 09:51:09] {"sampling_feature":"95072e6b-9796-4fee-b686-80bd3a355826","timestamp":"2023-02-07T09:50:00-08:00","74a1fed5-7829-4e88-9560-fc136411efcd":9907,"d282772a-0ca4-492c-8e8d-ad7137ddecc1":4.154,"61fff54d-ecb2-4c04-8961-14ff14837ae3":6.92,"58d5b96d-8f5d-45ab-a310-a44ffa68ec0e":0.0615,"d3271afd-c1c5-4375-9841-bbc93b5917ce":81.01,"fef0a40d-6954-4a7d-9087-de5c66b57f83":7.16,"e2199c70-ac6b-495d-b514-9850e21b6423":-57}
[2023-02-07 09:51:09] 
[2023-02-07 09:51:17]  WATCHDOG ISR barksUntilReset 74 <--WatchDogAVR
[2023-02-07 09:51:17] -- Response Code -- 504 waited  7000 mS Timeout 7000
[2023-02-07 09:51:17] ... Watchdog low barksUntilReset 74  expected 75
[2023-02-07 09:51:17] Sent 4 readings in 34.51 sec. Queued readings= 407

This seems similar https://github.com/ODM2/ODM2DataSharingPortal/issues/542

https://github.com/ODM2/ODM2DataSharingPortal/pull/547

neilh10 commented 1 year ago

well just to have a conversation with myself, the response times have improved, so data is being accepted. Times are from 0.9 to 3.7Secs [2023-03-02 16:40:44] -- Response Code -- 201 waited 915 mS Timeout 15000 [2023-03-02 16:40:51] -- Response Code -- 201 waited 3699 mS Timeout 15000 [2023-03-02 17:00:54] -- Response Code -- 201 waited 9372 mS Timeout 15000 with an occasional [2023-03-02 17:30:58] -- Response Code -- 504 waited 15009 mS Timeout 15000

For some 6000 readings, they all eventually uploaded over about two weeks. https://monitormywatershed.org/sites/tu_rcru_test06/ running with Mayfly release over an Digi LTE https://github.com/neilh10/ModularSensors/releases/tag/v0.34.1-aba-release1_221227

For the two months starting at January 3 ending Feb 28th, some 16177 readings all uploaded successfully, no data loss

Seems to be fixed after this https://github.com/ODM2/ODM2DataSharingPortal/issues/635#issuecomment-1450467143 or maybe something was rebooted and is still lurking.

ptomasula commented 1 year ago

Hey @neilh10, sorry for the delayed response to this. I'm glad to hear that response times have improved recently.

Our work on #635, is presently only on the staging, so I'm not sure that explains the sudden change in performance. I did reboot the database server in response to #642 on 02/24/2023. The CPU utilization on our production database server seemed to be steadily creeping up over the past few months and based on #642 looks to have crossed a critical threshold about 2-3 weeks ago.

neilh10 commented 1 year ago

Hi @ptomasula thanks for the reboot - or whatever it was speeding it up. I'm just reflecting what I see. It started working good enough. Some systems reboot regularly.

I try and spread the IoT load, https://github.com/ODM2/ODM2DataSharingPortal/issues/485

I get a pretty interesting pattern of a timeout 504, usually on the hour 00 or 30. This setup POST two readings every 10minutes.


$ grep " Response Code " ttyUSB0_2302272110.txt
[2023-03-03 03:00:58] -- Response Code -- 504 waited  15010 mS Timeout 15000
[2023-03-03 03:01:01] -- Response Code -- 201 waited  927 mS Timeout 15000
[2023-03-03 03:01:08] -- Response Code -- 201 waited  3169 mS Timeout 15000
[2023-03-03 03:10:45] -- Response Code -- 201 waited  1375 mS Timeout 15000
[2023-03-03 03:10:52] -- Response Code -- 201 waited  3444 mS Timeout 15000
[2023-03-03 03:20:44] -- Response Code -- 201 waited  964 mS Timeout 15000
[2023-03-03 03:20:50] -- Response Code -- 201 waited  3556 mS Timeout 15000
[2023-03-03 03:30:57] -- Response Code -- 504 waited  15010 mS Timeout 15000
[2023-03-03 03:31:02] -- Response Code -- 201 waited  2916 mS Timeout 15000
[2023-03-03 03:31:09] -- Response Code -- 201 waited  2445 mS Timeout 15000
[2023-03-03 03:40:45] -- Response Code -- 201 waited  1001 mS Timeout 15000
[2023-03-03 03:40:51] -- Response Code -- 201 waited  3444 mS Timeout 15000
[2023-03-03 03:50:44] -- Response Code -- 201 waited  1387 mS Timeout 15000
[2023-03-03 03:50:51] -- Response Code -- 201 waited  3783 mS Timeout 15000
[2023-03-03 04:00.58] -- Response Code -- 504 waited  15010 mS Timeout 15000
[2023-03-03 04:01:02] -- Response Code -- 201 waited  2011 mS Timeout 15000
[2023-03-03 04:01:08] -- Response Code -- 201 waited  2446 mS Timeout 15000
[2023-03-03 04:10:44] -- Response Code -- 201 waited  881 mS Timeout 15000
[2023-03-03 04:10:51] -- Response Code -- 201 waited  3951 mS Timeout 15000
[2023-03-03 04:20:44] -- Response Code -- 201 waited  1011 mS Timeout 15000
[2023-03-03 04:20:50] -- Response Code -- 201 waited  3423 mS Timeout 15000
[2023-03-03 04:30:58] -- Response Code -- 504 waited  15010 mS Timeout 15000
[2023-03-03 04:31:01] -- Response Code -- 201 waited  965 mS Timeout 15000
[2023-03-03 04:31:07] -- Response Code -- 201 waited  2529 mS Timeout 15000
[2023-03-03 04:40:47] -- Response Code -- 201 waited  4155 mS Timeout 15000
[2023-03-03 04:40:53] -- Response Code -- 201 waited  3469 mS Timeout 15000
[2023-03-03 04:50:58] -- Response Code -- 504 waited  15009 mS Timeout 15000
[2023-03-03 04:51:14] -- Response Code -- 504 waited  15000 mS Timeout 15000
[2023-03-03 05:00:58] -- Response Code -- 504 waited  15010 mS Timeout 15000
[2023-03-03 05:01:15] -- Response Code -- 504 waited  15000 mS Timeout 15000
[2023-03-03 05:20:58] -- Response Code -- 504 waited  15000 mS Timeout 15000
[2023-03-03 05:21:01] -- Response Code -- 201 waited  1445 mS Timeout 15000
[2023-03-03 05:21:08] -- Response Code -- 201 waited  4108 mS Timeout 15000
[2023-03-03 05:21:15] -- Response Code -- 201 waited  3795 mS Timeout 15000
[2023-03-03 05:21:21] -- Response Code -- 201 waited  2470 mS Timeout 15000
[2023-03-03 05:21:28] -- Response Code -- 201 waited  3770 mS Timeout 15000
[2023-03-03 05:21:35] -- Response Code -- 201 waited  3506 mS Timeout 15000
[2023-03-03 05:21:41] -- Response Code -- 201 waited  3494 mS Timeout 15000
[2023-03-03 05:21:47] -- Response Code -- 201 waited  3447 mS Timeout 15000
[2023-03-03 05:30:58] -- Response Code -- 504 waited  15000 mS Timeout 15000
[2023-03-03 05:31:15] -- Response Code -- 504 waited  15012 mS Timeout 15000
[2023-03-03 05:40:58] -- Response Code -- 504 waited  15010 mS Timeout 15000
[2023-03-03 05:41:02] -- Response Code -- 201 waited  1229 mS Timeout 15000
[2023-03-03 05:41:08] -- Response Code -- 201 waited  2758 mS Timeout 15000
[2023-03-03 05:41:15] -- Response Code -- 201 waited  3795 mS Timeout 15000
[2023-03-03 05:41:22] -- Response Code -- 201 waited  3795 mS Timeout 15000
[2023-03-03 05:50:52] -- Response Code -- 201 waited  7385 mS Timeout 15000
[2023-03-03 05:50:58] -- Response Code -- 201 waited  3867 mS Timeout 15000
[2023-03-03 06:00:58] -- Response Code -- 504 waited  15012 mS Timeout 15000
[2023-03-03 06:01:08] -- Response Code -- 201 waited  7866 mS Timeout 15000
[2023-03-03 06:01:15] -- Response Code -- 201 waited  3085 mS Timeout 15000
[2023-03-03 06:10:51] -- Response Code -- 201 waited  7529 mS Timeout 15000
[2023-03-03 06:10:58] -- Response Code -- 201 waited  3866 mS Timeout 15000
[2023-03-03 06:20:53] -- Response Code -- 201 waited  9384 mS Timeout 15000
[2023-03-03 06:20:59] -- Response Code -- 201 waited  3709 mS Timeout 15000
[2023-03-03 06:30:58] -- Response Code -- 504 waited  15012 mS Timeout 15000