ODM2 / ODM2DataSharingPortal

A Python-Django web application enabling users to upload, share, and display data from their environmental monitoring sites via the app's ODM2 database. Data can either be automatically streamed from Internet of Things (IoT) devices, manually uploaded via CSV files, or manually entered into forms.
BSD 3-Clause "New" or "Revised" License

POST to monitormywatershed.org broken #658

Closed neilh10 closed 1 year ago

neilh10 commented 1 year ago

When POSTing to monitormywatershed.org I'm getting a 301. This started happening after the upgrade on March 12th (v0.15.0?).

https://github.com/ODM2/ODM2DataSharingPortal/issues/316 https://github.com/ODM2/ODM2DataSharingPortal/issues/542

On a POST to monitormywatershed.org

Connected Internet

pubDQTR Sending data to [ 0 ] monitormywatershed.org:80
POST /api/data-stream/ HTTP/1.1
Host: monitormywatershed.org
TOKEN: 0cf7c40a-232e-457d-87d6-cea5c0757fec
Content-Length: 409
Content-Type: application/json

{"sampling_feature":"236c674b-69b9-43af-b0d6-33d67b870ecc","timestamp":"2023-05-17T10:12:06-08:00","8c57835f-a32f-4d62-82dc-0ba09f04cf52":1,"3bebd4a3-8b54-4f92-ba55-5fd2fd021358":3.927,"03e7b375-97a7-4423-a3f0-1d822d8b19b9":18.00,"43bcda9b-2973-4639-af2c-f0b6bb3fa44b":0.1467,"08646cc3-c5de-414c-af65-c795b2dcac24":59.82,"8849814d-1603-4a2f-861f-f31ae68cccf3":19.35,"7182846e-46e0-4a10-b110-9bc32de4aca9":-18}

-- Response Code -- 301 waited  326 mS Timeout 8000

when I change to data.envirodiy.org

Connected Internet

pubDQTR Sending data to [ 0 ] data.envirodiy.org:80
POST /api/data-stream/ HTTP/1.1
Host: data.envirodiy.org
TOKEN: 0cf7c40a-232e-457d-87d6-cea5c0757fec
Content-Length: 409
Content-Type: application/json

{"sampling_feature":"236c674b-69b9-43af-b0d6-33d67b870ecc","timestamp":"2023-05-17T10:37:43-08:00","8c57835f-a32f-4d62-82dc-0ba09f04cf52":1,"3bebd4a3-8b54-4f92-ba55-5fd2fd021358":3.927,"03e7b375-97a7-4423-a3f0-1d822d8b19b9":18.10,"43bcda9b-2973-4639-af2c-f0b6bb3fa44b":0.1468,"08646cc3-c5de-414c-af65-c795b2dcac24":59.13,"8849814d-1603-4a2f-861f-f31ae68cccf3":19.62,"7182846e-46e0-4a10-b110-9bc32de4aca9":-20}

-- Response Code -- 201 waited  577 mS Timeout 8000
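
For anyone who wants to reproduce the two requests above from a desktop, here is a minimal Python sketch. It assumes only what the logs show: a POST to `/api/data-stream/` on port 80 with a `TOKEN` header and a JSON body keyed by `sampling_feature`, `timestamp`, and one entry per variable UUID. The helper names `build_reading` and `post_reading` are made up for this illustration and are not part of the portal or of the logger firmware.

```python
import http.client
import json


def build_reading(token, sampling_feature, timestamp, values):
    """Return (headers, body) for one reading, shaped like the logged request."""
    payload = {"sampling_feature": sampling_feature, "timestamp": timestamp}
    payload.update(values)  # one entry per variable UUID
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    headers = {
        "TOKEN": token,
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),
    }
    return headers, body


def post_reading(host, headers, body, timeout=8.0):
    """Send one reading over plain HTTP and return the status code.

    http.client does not follow redirects, so a 301 is returned as-is
    instead of being silently retried over HTTPS.
    """
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("POST", "/api/data-stream/", body=body, headers=headers)
        return conn.getresponse().status
    finally:
        conn.close()
```

Calling `post_reading("monitormywatershed.org", headers, body)` and then the same with `"data.envirodiy.org"` should show whether the two hosts still answer differently; use your own site's token and UUIDs rather than the ones in the log.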

ptomasula commented 1 year ago

Thanks @neilh10, this is very helpful information!

I was initially confused by this behavior because our git history shows no changes to the nginx config file. Nginx presently handles HTTP/S routing for us. Upon further inspection of the settings file on the production server, I found a block at the top of the specification which explicitly reroutes only http traffic to monitormywatershed.org to the https equivalent.

Generally speaking, that is the behavior we want for most site traffic; however, I have an explicit exception to not reroute */api/ traffic because the current upload protocol was not designed to handle the 301 redirect. I have removed that block from the nginx config, so I would expect your original POST to monitormywatershed.org to now work correctly and not return a 301.
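
To make the intended routing rule concrete, here is a small Python sketch of the decision described above: plain-http requests are redirected to https unless the path falls under `/api/`. This is not the nginx configuration itself, which is not shown in this thread; the function name `route` and the `exempt_api` flag are assumptions made purely for illustration.

```python
def route(scheme, host, path, exempt_api=True):
    """Return (status, location) for the redirect decision only.

    (None, None) means the request is passed through to the application.
    With exempt_api=False the function models the stray block that
    redirected every http request, including device uploads.
    """
    is_api = path.startswith("/api/")
    if scheme == "http" and not (exempt_api and is_api):
        return 301, f"https://{host}{path}"
    return None, None


# A device upload over http is passed through when the exception is in place.
upload = route("http", "monitormywatershed.org", "/api/data-stream/")
# The same request gets a 301 when the exception is missing.
broken = route("http", "monitormywatershed.org", "/api/data-stream/", exempt_api=False)
```

A logger that cannot follow redirects sees the second case as a failed upload, which matches the 301 reported at the top of this issue.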

I'll look into how that block made its way into our production config. It is possible that our certificate management system automatically populated that block, but I'll look into it further.

Would you be able to try your original post again and verify the correct behavior?

neilh10 commented 1 year ago

Hi @ptomasula, thanks for looking at it. Hurrah, data from the stopped field sites is starting to flow again.

The sites that stopped delivering data Apr 12 (https://monitormywatershed.org/sites/nh_LCC45/ https://monitormywatershed.org/sites/TUCA_PO03/ https://monitormywatershed.org/sites/TUCA_Sa01/), as described here https://www.envirodiy.org/topic/systems-not-recognized-from-12th-v0-15-0/, have now all recorded a series of the latest readings, indicating that POSTs to monitormywatershed.org:80 are getting through!!

The reliable delivery algorithm that I've implemented should now kick in. For https://monitormywatershed.org/sites/nh_LCC45/ it will attempt to upload up to 100 readings every 15 minutes. The other two communicate every hour and then upload up to 100 readings.

https://github.com/ODM2/ODM2DataSharingPortal/issues/485
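
The catch-up behavior described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the logger firmware: readings are kept in a first-in, first-out queue, each reading is sent as its own POST, a 201 removes it from the queue, and anything else stops the attempt so the remaining readings wait for the next connection. The name `drain_queue` and the `send` callback are hypothetical; the default limit of 100 mirrors the figure quoted above.

```python
from collections import deque


def drain_queue(queue, send, max_posts=100):
    """POST queued readings oldest-first, one request per reading.

    `send` takes a reading and returns the HTTP status code.
    Stops at the first non-201 response so unsent readings stay queued.
    Returns the number of readings delivered on this connection.
    """
    sent = 0
    while queue and sent < max_posts:
        status = send(queue[0])
        if status != 201:
            break  # keep this reading and everything behind it
        queue.popleft()
        sent += 1
    return sent


# Example with a fake sender: two successes, then a gateway timeout.
responses = iter([201, 201, 504])
backlog = deque(["r1", "r2", "r3", "r4"])
delivered = drain_queue(backlog, lambda reading: next(responses))
```

Stopping on the first failure keeps the queue in order and avoids spending more radio time on a server that is not responding, at the cost of slower catch-up when timeouts are frequent.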

However, trying from my test station at my desk with

POST /api/data-stream/ HTTP/1.1
Host: monitormywatershed.org

I'm getting timeouts after 8 seconds (I then lengthened it to 10 seconds):

[2023-05-23 10:10:32.215] -- Response Code -- 504 waited 8012 mS Timeout 8000
[2023-05-23 10:16:24.737] -- Response Code -- 504 waited 10011 mS Timeout 10000

Then suddenly it accepts the POST and responds in 700 mS:

[2023-05-23 10:14:22.397] -- Response Code -- 201 waited 770 mS Timeout 10000

so mostly working .. :)

ptomasula commented 1 year ago

Thanks @neilh10, glad to hear it is mostly working now. A 504 response is a gateway timeout. I have received a few notifications about brief performance issues this afternoon, so it is likely related to that.

The performance slowdowns might actually be an artifact of the batch upload. When that algorithm attempts to reupload, I assume it issues each data point as a separate request? I don't think the endpoint supports batch upload, so I think it would have to be multiple requests. We have a request to support that, but have not implemented it yet.

neilh10 commented 1 year ago

@ptomasula it seems to be delivering slowly.

For LCC45 these are defined as

LOGGING_INTERVAL_MINUTES=15
COLLECT_READINGS=0 ; Number of readings to collect before send 0to30
SEND_OFFSET_MIN=0 ;minutes to wait after collection complete 0-30
POST_MAX_NUM =100 ;On POSTing MAX NUM after which defered next connection

A timeout also represents a lot of used power.

For https://monitormywatershed.org/sites/nh_LCC45/, which is uploading its historical data, it seems to be getting a lot of timeouts. In 90 minutes of elapsed time, with 6 upload attempts at 15-minute intervals, it has only uploaded 3 historical data items. It is set to upload a max of 100 items each attempt, so it could have uploaded 6*100 = 600 historical items. For all the failures below, (main) ModularSensors could lose the data.

Here is the data from my test system, set to upload every 2 minutes and starting with a backlog of data; the first few entries are annotated.

[2023-05-23 10:14:14.770] -- Response Code -- 201 waited 938 mS Timeout 10000
[2023-05-23 10:14:19.023] -- Response Code -- 201 waited 627 mS Timeout 10000
[2023-05-23 10:14:22.397] -- Response Code -- 201 waited 770 mS Timeout 10000
[2023-05-23 10:14:25.565] -- Response Code -- 201 waited 566 mS Timeout 10000 - upload 1 + 3 queue data

[2023-05-23 10:16:24.737] -- Response Code -- 504 waited 10011 mS Timeout 10000 - failed, queue reading
[2023-05-23 10:18:23.625] -- Response Code -- 201 waited 3770 mS Timeout 10000
[2023-05-23 10:18:27.956] -- Response Code -- 201 waited 723 mS Timeout 10000 - upload 1 + 1 queue data
[2023-05-23 10:20:23.964] -- Response Code -- 201 waited 8974 mS Timeout 10000
[2023-05-23 10:22:24.720] -- Response Code -- 504 waited 10011 mS Timeout 10000 - failed, queue reading
[2023-05-23 10:24:29.584] -- Response Code -- 504 waited 10010 mS Timeout 10000 - failed, queue reading
[2023-05-23 10:26:29.589] -- Response Code -- 504 waited 10011 mS Timeout 10000 - failed, queue reading
[2023-05-23 10:28:23.621] -- Response Code -- 201 waited 3795 mS Timeout 10000
[2023-05-23 10:28:27.780] -- Response Code -- 201 waited 543 mS Timeout 10000
[2023-05-23 10:28:31.122] -- Response Code -- 201 waited 723 mS Timeout 10000
[2023-05-23 10:28:34.400] -- Response Code -- 201 waited 676 mS Timeout 10000 - upload 1 + 3 queue data
[2023-05-23 10:30:24.727] -- Response Code -- 504 waited 10011 mS Timeout 10000
[2023-05-23 10:32:21.130] -- Response Code -- 201 waited 4180 mS Timeout 10000
[2023-05-23 10:32:25.458] -- Response Code -- 201 waited 661 mS Timeout 10000 - upload 1 + 3 queue data
[2023-05-23 10:34:18.829] -- Response Code -- 201 waited 3854 mS Timeout 10000
[2023-05-23 10:36:18.812] -- Response Code -- 201 waited 3830 mS Timeout 10000
[2023-05-23 10:38:24.722] -- Response Code -- 504 waited 10010 mS Timeout 10000
[2023-05-23 10:40:29.536] -- Response Code -- 504 waited 10001 mS Timeout 10000
[2023-05-23 10:42:23.582] -- Response Code -- 201 waited 3760 mS Timeout 10000
[2023-05-23 10:42:28.102] -- Response Code -- 201 waited 879 mS Timeout 10000
[2023-05-23 10:42:31.454] -- Response Code -- 201 waited 757 mS Timeout 10000 - upload 1 + 3 queue data
[2023-05-23 10:44:18.519] -- Response Code -- 201 waited 3541 mS Timeout 10000
[2023-05-23 10:46:18.659] -- Response Code -- 201 waited 3687 mS Timeout 10000
[2023-05-23 10:48:23.694] -- Response Code -- 201 waited 3854 mS Timeout 10000
[2023-05-23 10:50:29.556] -- Response Code -- 504 waited 10010 mS Timeout 10000
[2023-05-23 10:52:25.270] -- Response Code -- 201 waited 5434 mS Timeout 10000
[2023-05-23 10:52:29.363] -- Response Code -- 201 waited 471 mS Timeout 10000
[2023-05-23 10:54:18.501] -- Response Code -- 201 waited 3528 mS Timeout 10000
[2023-05-23 10:56:18.820] -- Response Code -- 201 waited 3855 mS Timeout 10000
[2023-05-23 10:58:18.567] -- Response Code -- 201 waited 3613 mS Timeout 10000
[2023-05-23 11:00:29.560] -- Response Code -- 504 waited 10010 mS Timeout 10000
[2023-05-23 11:02:23.277] -- Response Code -- 201 waited 3482 mS Timeout 10000
[2023-05-23 11:02:27.341] -- Response Code -- 201 waited 421 mS Timeout 10000
[2023-05-23 11:04:18.381] -- Response Code -- 201 waited 3434 mS Timeout 10000
[2023-05-23 11:06:18.385] -- Response Code -- 201 waited 3444 mS Timeout 10000
[2023-05-23 11:08:18.378] -- Response Code -- 201 waited 3432 mS Timeout 10000
[2023-05-23 11:10:29.547] -- Response Code -- 504 waited 10011 mS Timeout 10000
[2023-05-23 11:12:23.305] -- Response Code -- 201 waited 3494 mS Timeout 10000
[2023-05-23 11:12:27.337] -- Response Code -- 201 waited 410 mS Timeout 10000
[2023-05-23 11:14:18.419] -- Response Code -- 201 waited 3469 mS Timeout 10000
[2023-05-23 11:16:18.379] -- Response Code -- 201 waited 3434 mS Timeout 10000
[2023-05-23 11:18:18.406] -- Response Code -- 201 waited 3457 mS Timeout 10000
[2023-05-23 11:20:29.507] -- Response Code -- 504 waited 10000 mS Timeout 10000
[2023-05-23 11:22:23.281] -- Response Code -- 201 waited 3446 mS Timeout 10000
[2023-05-23 11:22:27.284] -- Response Code -- 201 waited 397 mS Timeout 10000
[2023-05-23 11:24:18.376] -- Response Code -- 201 waited 3434 mS Timeout 10000
[2023-05-23 11:26:18.364] -- Response Code -- 201 waited 3432 mS Timeout 10000
[2023-05-23 11:28:18.434] -- Response Code -- 201 waited 3457 mS Timeout 10000
[2023-05-23 11:30:29.580] -- Response Code -- 504 waited 10011 mS Timeout 10000
[2023-05-23 11:32:23.303] -- Response Code -- 201 waited 3482 mS Timeout 10000
[2023-05-23 11:32:27.321] -- Response Code -- 201 waited 398 mS Timeout 10000
[2023-05-23 11:34:18.386] -- Response Code -- 201 waited 3420 mS Timeout 10000
[2023-05-23 11:36:18.390] -- Response Code -- 201 waited 3432 mS Timeout 10000
[2023-05-23 11:38:18.394] -- Response Code -- 201 waited 3432 mS Timeout 10000
[2023-05-23 11:40:29.533] -- Response Code -- 504 waited 10001 mS Timeout 10000
[2023-05-23 11:42:23.255] -- Response Code -- 201 waited 3469 mS Timeout 10000
[2023-05-23 11:42:27.320] -- Response Code -- 201 waited 419 mS Timeout 10000
[2023-05-23 11:44:18.485] -- Response Code -- 201 waited 3518 mS Timeout 10000
[2023-05-23 11:46:18.388] -- Response Code -- 201 waited 3420 mS Timeout 10000
[2023-05-23 11:48:18.470] -- Response Code -- 201 waited 3529 mS Timeout 10000
[2023-05-23 11:50:24.821] -- Response Code -- 201 waited 7854 mS Timeout 10000
[2023-05-23 11:52:18.411] -- Response Code -- 201 waited 3459 mS Timeout 10000
[2023-05-23 11:54:18.386] -- Response Code -- 201 waited 3421 mS Timeout 10000
[2023-05-23 11:56:18.453] -- Response Code -- 201 waited 3494 mS Timeout 10000
[2023-05-23 11:58:23.260] -- Response Code -- 201 waited 3469 mS Timeout 10000
[2023-05-23 12:00:21.619] -- Response Code -- 201 waited 6674 mS Timeout 10000
[2023-05-23 12:02:18.391] -- Response Code -- 201 waited 3432 mS Timeout 10000
[2023-05-23 12:04:18.362] -- Response Code -- 201 waited 3408 mS Timeout 10000
[2023-05-23 12:06:23.282] -- Response Code -- 201 waited 3482 mS Timeout 10000
[2023-05-23 12:08:18.359] -- Response Code -- 201 waited 3410 mS Timeout 10000
[2023-05-23 12:10:20.324] -- Response Code -- 201 waited 5372 mS Timeout 10000
[2023-05-23 12:12:18.344] -- Response Code -- 201 waited 3420 mS Timeout 10000
[2023-05-23 12:14:20.386] -- Response Code -- 201 waited 3470 mS Timeout 10000
[2023-05-23 12:16:18.356] -- Response Code -- 201 waited 3433 mS Timeout 10000
[2023-05-23 12:18:18.352] -- Response Code -- 201 waited 3420 mS Timeout 10000
[2023-05-23 12:20:20.046] -- Response Code -- 201 waited 5096 mS Timeout 10000
[2023-05-23 12:22:23.257] -- Response Code -- 201 waited 3471 mS Timeout 10000
[2023-05-23 12:24:18.383] -- Response Code -- 201 waited 3445 mS Timeout 10000
[2023-05-23 12:26:18.358] -- Response Code -- 201 waited 3420 mS Timeout 10000
[2023-05-23 12:28:18.338] -- Response Code -- 201 waited 3420 mS Timeout 10000
[2023-05-23 12:30:29.499] -- Response Code -- 504 waited 10010 mS Timeout 10000
[2023-05-23 12:32:23.236] -- Response Code -- 201 waited 3470 mS Timeout 10000
[2023-05-23 12:32:27.254] -- Response Code -- 201 waited 398 mS Timeout 10000
[2023-05-23 12:34:18.410] -- Response Code -- 201 waited 3529 mS Timeout 10000
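
A log like the one above is easier to judge with a few summary numbers. The following Python sketch counts 201 and 504 responses and reports the median wait for successful POSTs. It assumes only the line format visible in the log (`[timestamp] -- Response Code -- NNN waited N mS ...`); the function name `summarize` is made up for this illustration.

```python
import re
from statistics import median

# Matches one log entry; \s+ tolerates entries wrapped across lines.
ENTRY_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+--\s+Response Code\s+--\s+"
    r"(?P<code>\d{3})\s+waited\s+(?P<ms>\d+)\s+mS"
)


def summarize(log_text):
    """Return counts per outcome and the median wait of successful POSTs."""
    entries = [(int(m["code"]), int(m["ms"])) for m in ENTRY_RE.finditer(log_text)]
    ok_waits = [ms for code, ms in entries if code == 201]
    return {
        "total": len(entries),
        "created_201": len(ok_waits),
        "timeouts_504": sum(1 for code, _ in entries if code == 504),
        "median_201_wait_ms": median(ok_waits) if ok_waits else None,
    }


sample = """
[2023-05-23 10:14:14.770] -- Response Code -- 201 waited 938 mS Timeout 10000
[2023-05-23 10:16:24.737] -- Response Code -- 504 waited 10011 mS Timeout 10000
[2023-05-23 10:18:23.625] -- Response Code -- 201 waited 3770 mS Timeout 10000
"""
stats = summarize(sample)
```

Pasting the full log into `summarize` gives the overall failure rate, which is the number that matters for battery-powered stations since every timeout costs a full radio-on period.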

neilh10 commented 1 year ago

@ptomasula thanks, it's still working. It seems I've ended up using a non-standard endpoint, or at least it broke for http://monitormywatershed.org but not for http://data.envirodiy.org

This seems to be a repeat of https://github.com/ODM2/ODM2DataSharingPortal/issues/522#issuecomment-973268492. I'm just wondering: is the list of target endpoints defined anywhere, and which is the preferred one?

The historical data upload is at a snail's pace, and even simple POSTs fail with no response within 10 seconds on my test system. There is a suggestion for a simple database efficiency upgrade, the low-hanging fruit so to speak, from tpwrules: "The primary bottleneck with the server in its official incarnation is actually inserting data records into the database due to inefficient use of the ORM and transactions and subsequent timeouts from the lengthy processing. Improving this is pretty simple and results in several times more speed for a single point." https://github.com/ODM2/ODM2DataSharingPortal/issues/649#issuecomment-1561690674

neilh10 commented 1 year ago

@ptomasula it's still working and I'll close this issue. The server timeouts are pretty bad and my field systems are barely managing to upload the data from the month's outage, but I'll put that in a separate characterization issue.

It would be good to know where the official entry points for the server are documented.