EnviroDIY / ModularSensors

An Arduino library to give environmental sensors a common interface of functions for use with Arduino-framework dataloggers, such as the EnviroDIY Mayfly.
https://envirodiy.github.io/ModularSensors/

EnviroDIYPublisher POST enhancements #454

Open neilh10 opened 11 months ago

neilh10 commented 11 months ago

For EnviroDIYPublisher, tpwrules has proposed an enhanced method for uploading to the server that allows faster server processing. It produces a larger overall packet by packing several readings into one packet with an enhanced JSON format. That means fewer POSTs, less repetition of the UUIDs, and therefore less traffic on the wireless communications channel.

However, larger packets on a marginal wireless channel can mean fewer successful POSTs, so whether this reduces the overall data transferred on a specific wireless channel is going to be site dependent.

I'm opening this issue, as a software best practice, to discuss the options.

There is a server component that explains the enhancement in more detail https://github.com/ODM2/ODM2DataSharingPortal/issues/649 https://github.com/ODM2/ODM2DataSharingPortal/pull/674

Currently the discussion is https://github.com/EnviroDIY/ModularSensors/pull/434 based on https://github.com/tpwrules/ModularSensors/tree/batch-transmission

neilh10 commented 11 months ago

I’m commenting from the perspective of reliable message delivery, modeled on the standard Computer Science communications model https://en.wikipedia.org/wiki/OSI_model .

Typically the Data Link Layer implements retransmission to ensure successful delivery of a packet. I would suggest that for ModularSensors reliable delivery, once a reading is taken and committed to the .csv file on the uSD, it should be reliably delivered to the server. Wireless radio also has its own specific transmission issues: particularly for larger data buffers, there is less chance of the data being received, and more retransmission of data.

A practical problem with the ModularSensors architecture of allowing multiple Publishers is that the servers can all be in different states, and a readings queue needs to be established per server. I’m not sure that multiple publishers is actually that useful, but that is the current architecture.

https://github.com/tpwrules/ModularSensors/commit/74cac0df1d6cc69f3dbc9559c662faffd060216a For the proposed implementation of ::xxPublisher with a logbuffer based in RAM, filled every period by a reading action and spoofing a 201 response: it’s breaking the layered model and the meaning of 201. By spoofing a 201, I would expect the implementation to be guaranteeing the meaning of the 201, that the reading will be delivered to the server. The data sits in the logbuffer for a while: with 15-minute sampling, 4 records is one hour and 8 records is two hours. If during that period someone walks up to the system and plugs in a USB monitor, or there is a watchdog reset or a maintenance action (reset), the last set of readings stored in the RAM buffer is lost. In addition, due to limited RAM, it’s not scalable to the other xxPublishers.

Readings can be stored in a file on the uSD, which I’ve done in a very similar manner to the logBuffer functionality. This effectively gives a large buffer, though characterizing the real-time effects is ongoing and the buffer size needs to be limited for reasonable response times. I recently had field units that stored close to 3 months of readings at 15-minute intervals (9000+ readings), and in a realistic test situation with a good cell connection they were all delivered to the server. https://github.com/ODM2/ODM2DataSharingPortal/issues/673#issuecomment-1714231187. On the original field device, which has a noisy wireless channel, uploading the readings is taking weeks. However it looks good, and if the wireless conditions allow, I expect it will complete the reliable upload https://monitormywatershed.org/sites/TUCA_MW12/

The concept of building the JSON to have only one UUID per transaction is a nice industry standard, and could be implemented reliably by writing the readings to the uSD. The use of a RAM store is a quick fix that is IMHO NOT extensible to the general case of reliable delivery of all readings to the server. The system I implemented, which writes to the uSD, could be adapted by changing the parameter bool useQueDataSource = false; to uint8_t useQueDataSource = 0; // where 0 represents the current transaction type, 1-n would represent using a queued uSD source, and 2-n would use JSON attempting minimal channel overhead.

This would give flexibility to optimize the method based on field conditions, either at compile time or by an adaptive retransmission algorithm based on RSSI and previous failures. https://github.com/ODM2/ODM2DataSharingPortal/issues/485 In the EnviroDIYPublisher implementation, it’s likely that it will still need a RAM buffer, but this is short term (ms) and can be on the stack.

However it does go over wireless radio, and on the edge of the radio signal range there is less chance of the larger buffers being received. A side effect of larger buffers is that they reduce the effective wireless range. With an adaptive transmission scheme, starting with a large JSON buffer (N large) and a high failure rate, N could be reduced until it reaches 1. Though this is likely to be harder to test. Just my two cents 😊
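The mode parameter suggested above could be expressed as a small enumeration. This is only a sketch: it reads "1-n" and "2-n" as the plain values 1 and 2, which is an assumption, and every identifier in it (QueDataSource, usesSdQueue, usesBatchedJson, modeFromByte) is hypothetical and not part of ModularSensors.

```cpp
#include <stdint.h>

// Hypothetical names: none of these identifiers exist in ModularSensors.
enum class QueDataSource : uint8_t {
    Direct        = 0,  // current behaviour: one POST per reading, no queue
    QueuedSd      = 1,  // undelivered readings queued in a file on the uSD
    QueuedSdBatch = 2   // queued on the uSD and sent as batched JSON
};

// True when undelivered readings are kept on the uSD for a later retry.
constexpr bool usesSdQueue(QueDataSource mode) {
    return mode != QueDataSource::Direct;
}

// True when several readings are packed into a single JSON POST.
constexpr bool usesBatchedJson(QueDataSource mode) {
    return mode == QueDataSource::QueuedSdBatch;
}

// Map a raw configuration byte onto a mode; unknown values fall back to
// the current behaviour so that existing sketches keep working.
constexpr QueDataSource modeFromByte(uint8_t raw) {
    return raw == 1 ? QueDataSource::QueuedSd
         : raw == 2 ? QueDataSource::QueuedSdBatch
                    : QueDataSource::Direct;
}
```

Keeping 0 as the existing behaviour means a sketch that passed false before would behave the same after the type change.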
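A minimal sketch of the adaptive idea, under stated assumptions: the comment only says that N could be reduced until it reaches 1, so halving on failure and growing by one on success are choices made here for illustration, and AdaptiveBatch is a hypothetical helper, not ModularSensors code.

```cpp
#include <stdint.h>

// Hypothetical helper, not part of ModularSensors.
struct AdaptiveBatch {
    uint8_t maxN;      // largest batch the buffer and modem can handle
    uint8_t currentN;  // readings to pack into the next POST

    explicit AdaptiveBatch(uint8_t maxReadings)
        : maxN(maxReadings < 1 ? 1 : maxReadings), currentN(maxN) {}

    // Call after each POST attempt; delivered means the server answered 201.
    void onResult(bool delivered) {
        if (!delivered) {
            // Assumption: halve on failure, never below one reading per POST.
            currentN = currentN > 1 ? static_cast<uint8_t>(currentN / 2) : 1;
        } else if (currentN < maxN) {
            // Assumption: grow slowly again once POSTs start succeeding.
            ++currentN;
        }
    }
};
```

On a marginal link this would settle near the largest batch that still gets through, at the cost of a few failed POSTs while it adapts.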

aufdenkampe commented 11 months ago

@neilh10, a quick correction to this comment:

"A practical problem with the ModularSensors architecture of allowing multiple Publishers is that the servers can all be in different states, and a readings queue needs to be established per server. I’m not sure that multiple publishers is actually that useful, but that is the current architecture. I believe it would save flash program space to remove the unused publishers with a conditional compile."

The ModularSensors architecture is very efficient, because the compilers only include the code from files that are specified in the include statements, so if a publisher, or sensor, or modem isn't included in a sketch, then the code for that feature isn't compiled. That's the genius of the ModularSensors architecture, and why it was named "Modular".

neilh10 commented 11 months ago

@aufdenkampe, thanks for the observation, and yes that is the theory ... and C++ is known for code bloat.
I'll edit the wording to remove the sentence (and clean up all that spelling - how did I miss all those foibles!).
I've made a note to myself to experiment a little in https://github.com/neilh10/ModularSensors/issues/138

aufdenkampe commented 11 months ago

@neilh10, it's more than theory, it's how it works. Only files that are included are compiled.

The code bloat that can and often happens with C++ is when a lot of optional functionality is in a single file. @SRGDamia1 has done an excellent job with the Object Oriented Programming (OOP) design of ModularSensors, separating concerns, so that all the optional functionality is separated into different files. She's done continuous refactoring to maintain these strict separations of concerns by abstracting out shared functions into very lean base classes and putting all the specifics into optional source files for every subclass. Her code is exceptionally DRY (Don't Repeat Yourself) and easy-to-read. I don't see any bloat.

tpwrules commented 11 months ago

Point taken on the 201 response, I've fixed that.

As for the reliability, I simply have not seen the behavior you see. The only issue I ran into with the larger requests is that it would crash my modem, but I was able to work around that too. If the timeout is increased, TCP should be able to manage the dropped packets in theory. If a user was concerned about that behavior, they could reduce sendEveryX and MS_LOG_DATA_BUFFER_SIZE to reduce the mean and maximum packet sizes, respectively.

Our operation is severely power constrained (and somewhat cost constrained) and admittedly we optimized for that case. But even then the only time over the past six months we have lost data was during a global outage of our cellular provider for several days. And that data isn't truly lost, it's still stored on the SD card for when maintenance is done. I understand that doesn't quite meet your (or my) definition of reliable, but it's still an aberration.

I would be very happy to see an extension of my work that could buffer data on the SD card too and plan to work a little on that in the future. Yours did not meet our needs at the time which is why we developed our own solution. We also had concerns about the SD card activity and processing further increasing power consumption.

neilh10 commented 11 months ago

Practically speaking, if there is a FIFO and EnviroDIYPublisher::publishData builds the JSON request from that FIFO, then it could be implemented in either RAM or uSD flash, which meets both requirements. If the HTTP response is supplied to the higher layer to indicate the success of that request, then the higher layer can manage the FIFO and however many readings were in that POST.

Practically speaking, until it's implemented (and tested on the production server) it can't be tested from a Mayfly. https://github.com/ODM2/ODM2DataSharingPortal/issues/649#issuecomment-1730246309 (I'm not clear whether you actually have a server instance with your code well tested?)

@tpwrules I would be interested to hear about your power model. There is a lot of value in characterizing the real-world conditions, and then sharing that data for a better understanding of optimizations.

My reference is cell phone, with some of the Mayflys at the limit of the cell phone range. Actually making a connection can be weather dependent and, since it's about surface streams in the riparian area of a stream, season dependent with the growth of vegetation. I'm looking to make the delivery of time series measurements as reliable as "boot net": walking up to the system and offloading the data. My power model is solar collection, with the possibility of a storm reducing the solar collection for two weeks. For the people I'm working with, periodically collecting a uSD to make up for technical shortcomings of the telemetry isn't something they are likely to do.

A reference model of using battery power and delivering status (rather than time series readings) would be a severely constrained power model that would need specific optimizations to extend the battery power as long as possible.
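The split described here, a FIFO that can live in RAM or on the uSD and a higher layer that only removes readings after a successful POST, might look roughly like the following. It is a sketch under assumptions: Reading, ReadingFifo, RamFifo, PostFn and publishFromFifo are all hypothetical names, the post callback stands in for EnviroDIYPublisher::publishData, and a uSD-backed class would implement the same interface.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical types for illustration; not the ModularSensors API.
struct Reading {
    uint32_t epochSeconds;
    float    value;
};

// Storage-agnostic queue: a RAM or uSD backend implements the same calls.
class ReadingFifo {
public:
    virtual ~ReadingFifo() {}
    virtual bool    push(const Reading& r) = 0;  // false when full
    virtual size_t  size() const = 0;
    virtual Reading peek(size_t i) const = 0;    // i = 0 is the oldest
    virtual void    drop(size_t n) = 0;          // remove the n oldest
};

// RAM backend: fixed-size ring buffer, no dynamic allocation.
template <size_t CAPACITY>
class RamFifo : public ReadingFifo {
public:
    bool push(const Reading& r) override {
        if (count_ == CAPACITY) return false;
        buf_[(head_ + count_) % CAPACITY] = r;
        ++count_;
        return true;
    }
    size_t  size() const override { return count_; }
    Reading peek(size_t i) const override {
        return buf_[(head_ + i) % CAPACITY];
    }
    void drop(size_t n) override {
        if (n > count_) n = count_;
        head_ = (head_ + n) % CAPACITY;
        count_ -= n;
    }
private:
    Reading buf_[CAPACITY] = {};
    size_t  head_  = 0;
    size_t  count_ = 0;
};

// Stand-in for the publisher: builds one request from the `batch` oldest
// readings and returns the HTTP status code from the server.
typedef int (*PostFn)(const ReadingFifo& fifo, size_t batch);

// Higher layer: readings leave the FIFO only after the server confirms
// them with a 201. Returns how many readings were delivered.
size_t publishFromFifo(ReadingFifo& fifo, size_t maxBatch, PostFn post) {
    size_t batch = fifo.size() < maxBatch ? fifo.size() : maxBatch;
    if (batch == 0) return 0;
    if (post(fifo, batch) != 201) return 0;  // keep queued for a retry
    fifo.drop(batch);
    return batch;
}
```

Because the readings are only peeked before the POST, a reset between the request and the response would at worst cause a duplicate upload and not a lost reading, provided the backend is persistent.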

Practically speaking, software can be adapted through compile options for different models, so it seems to me both models could be made to work. FYI I documented my approach as working Aug 16, 2020 - https://github.com/EnviroDIY/ModularSensors/issues/194#issuecomment-674598035 so I've been testing it for over three years, and have it in multiple field systems. I restated it July 10 - https://github.com/EnviroDIY/ModularSensors/issues/194#issuecomment-1629331122

The core of what I do is to have the upper layer setup for https://github.com/neilh10/ModularSensors/blob/release1/src/publishers/EnviroDIYPublisher.cpp#L162 then I've adapted it in a distributed FIFO to read from the buffered FIFO https://github.com/neilh10/ModularSensors/blob/release1/src/publishers/EnviroDIYPublisher.cpp#L334 https://github.com/neilh10/ModularSensors/blob/release1/src/publishers/EnviroDIYPublisher.cpp#L352 and it works.

In addition, I log each cell phone call and the time taken for the response (DBGxxx.log on the uSD), which has been a valuable view of the server's responses.

My measurements in Aug indicate

For discussion/comparison, I would assume a JSON extension with 4 readings takes the same amount of time as a single POST, with negligible extra modem communication time.

So for 4 POSTs the current system takes 4 * (26 + 2.5 secs first POST) = 114 seconds of cell phone on-time and power.

With my uSD-based queue, making 4 POSTs per cell phone call, it takes 26 + 4*4 = 38 seconds: a big improvement on 114 seconds, plus reliable queuing of all undelivered data (readings not receiving a 201).

With the JSON extension, for 4 readings in a POST, it takes 26 + 2.5 = 28.5 seconds: also an improvement on 38 seconds, plus a 4x throughput improvement for the server.

My suggestion would be to complete the current integration, which only impacts ModularSensors and is an optional call for all users to try out. It is now on https://github.com/EnviroDIY/ModularSensors/tree/reliable_delivery Then refactor for a better FIFO API. When/if the server integrates the JSON extension, that would be the time to make ModularSensors upgradeable to your instance of a JSON extension.
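The three estimates can be reproduced with a few lines of arithmetic. The constants are the figures quoted in the comment; interpreting 26 s as the per-call connection overhead, 2.5 s as the first POST and 4 s as each queued POST is an assumption, since the underlying measurements are not shown in the thread. Evaluated literally, 26 + 4*4 comes to 42 s and not the 38 s quoted; 38 s would match 26 + 3*4, and the thread does not say which was intended.

```cpp
// Figures quoted in the comment; their interpretation is an assumption.
const double kConnectSecs    = 26.0;  // per cell phone call overhead
const double kFirstPostSecs  = 2.5;   // first POST after connecting
const double kQueuedPostSecs = 4.0;   // each POST sent from the uSD queue
const int    kReadings       = 4;     // readings to deliver

// Current scheme: one cell phone call and one POST per reading.
double currentSchemeSecs() {
    return kReadings * (kConnectSecs + kFirstPostSecs);
}

// uSD queue, exactly as the expression is written: 26 + 4*4.
double sdQueueSecsAsWritten() {
    return kConnectSecs + kReadings * kQueuedPostSecs;
}

// uSD queue if only three queued POSTs are counted: 26 + 3*4.
double sdQueueSecsThreePosts() {
    return kConnectSecs + (kReadings - 1) * kQueuedPostSecs;
}

// Batched JSON: one call and one POST carrying all four readings.
double batchedJsonSecs() {
    return kConnectSecs + kFirstPostSecs;
}
```

Whichever uSD figure is right, the ordering is the same: both queued approaches cut radio-on time well below the 114 s of the current scheme, and batching is the shortest.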