EnviroDIY / ModularSensors

An Arduino library to give environmental sensors a common interface of functions for use with Arduino-framework dataloggers, such as the EnviroDIY Mayfly.
https://envirodiy.github.io/ModularSensors/
Other
81 stars 48 forks source link

General cleanup and refactoring in preparation for batch transmission #434

Closed tpwrules closed 1 year ago

tpwrules commented 1 year ago

My group has been working on a project to enhance ModularSensors with batch data transmission. We have successfully implemented it, it's been running for over a month, and we wish to contribute it back.

This PR contains some preliminary refactorings which make our project possible. Let's get these in first to minimize the set of changes to be reviewed. Please see individual commit messages for details.

aufdenkampe commented 1 year ago

@tpwrules, thanks for your Pull Request with your contributions toward batch data transmission! I've done a quick look at your commits and really appreciate their level of granularity and documentation describing why you made the changes.

The only concern I had was with commit 1498d3b2dee4fe33b9157ec0838b56f8bfccad72, which removed some print functions that we use for some debugging.

@SRGDamia1, will you review this?

tpwrules commented 1 year ago

Thank you for the vote of confidence.

The only concern I had was with commit https://github.com/EnviroDIY/ModularSensors/commit/1498d3b2dee4fe33b9157ec0838b56f8bfccad72, which removed some print functions that we use for some debugging.

I did some digging through the commit history and it looked to me like these were remnants of an earlier design of the data publisher stuff. I had all my debugging needs served by the printout of the request to the debug serial port, which I preserved.

Do you actively use them for debugging today? If so, then I think they would be better served with a different structure anyway. Having multiple functions to do the same thing is hard to maintain and unreliable to debug with.

aufdenkampe commented 1 year ago

@tpwrules, that great to hear that commit 1498d3b2dee4fe33b9157ec0838b56f8bfccad72 only removed redundant functions that we aren't using for the serial monitor outputs. I hadn't looked closely enough in my quick review!

I'll try to test it out in the next week or so. In the meanwhile, we'll want @SRGDamia1 to look it over, as she's the main developer behind ModularSensors.

neilh10 commented 1 year ago

Seems some reasonable refactoring. :) Some comments

aufdenkampe commented 1 year ago

@tpwrules, as @neilh10 alluded, we're still using a GitFlow workflow, where the main branch reflects the latest release, and we're mostly working in the develop branch. Yesterday I reoriented your PR toward the develop branch, then initiated the automated testing that @SRGDamia1 set up with GitHub Actions. Most of those are failing. Can you look through those errors to see if you can fix them?

@neilh10, indeed your similar PR has languished for a while, mostly because of lack of funding for @SRGDamia1's time on this work. The good news is that Stroud & LimnoTech have a promising proposal pending right now that is likely to reinvigorate our development efforts again!

tpwrules commented 1 year ago

@neilh10 Batch uploading is simply storing data in RAM and transmitting multiple data points per request. The overhead in power consumption, HTTP protocol, UUID transmission, etc. is extreme. I had seen your work and while it improves reliability, it does not improve on the data cost or power consumption aspects.

I don't have the code public yet mostly for flexibility in my working style. It takes additional time to extract changes and prepare to submit them to you. I will look into publishing the other parts soon and writing up some more detailed motivation.

@aufdenkampe I looked at the tests and it looks like they are broken for PRs from a fork. They always assume the branch is in EnviroDIY's repo. I don't know that there's a way to fix this, I don't know much about GitHub Actions.

neilh10 commented 1 year ago

@aufdenkampe thanks. Fantastic to hear porposal in works for Stroud/LimnoTeach. Wow. I'm bit surprised the automated build in GitHub Actions can be called "testing" - surely "system testing" is end to end, from sensor reading to reliably delivery and viewing on MMW. Of course there are many versions of software verification testing. Maybe I'm being too pedantic as I'm mentoring engineering students :) or maybe I'm not understanding what is happening in the github actions. My version of automated testing (I used to do a lot of it for paid work) - https://github.com/EnviroDIY/ModularSensors/issues/435

@tpwrules thanks for the feature description. I do think its good open workflow to add an issue to track a new feature/goal. Just my 2cents. Thanks for the good individual description on each commit.

Great to see a discussion on power usage. My analysis of power overhead, which is greatly improved with the LTE CAT-M1 modems over previous generation, is the slow and often failed/timeouts of response from MMW.
Be happy to model this as coloumbs/mAsecs in a spreadsheet if its of interest. Of course failures in one area like MMW, is a great opportunity for testing - and my fork/systems just been through a rigorous 4 weeks of testing of reliable delivery from the IoT node - couldn't want for a better test bed of a soggy MMW for testing the end node :) Described here https://github.com/ODM2/ODM2DataSharingPortal/issues/641

tpwrules commented 1 year ago

This is the issue describing the new protocol: https://github.com/ODM2/ODM2DataSharingPortal/issues/649

SRGDamia1 commented 1 year ago

I'm sorry I haven't had a chance to dig into this. But I think I've fixed the GitHub action to compile pull requests, so if you push an empty commit, it will re-run checks for you based on the latest action.

tpwrules commented 1 year ago

I rebased on top of the current develop branch. Please approve running of the workflows.

tpwrules commented 1 year ago

Reminder to approve running of the workflows.

tpwrules commented 1 year ago

I think the workflows might still be broken, it says that 0 ran instead of saying that they failed.

SRGDamia1 commented 1 year ago

Do you have the code for the next step of creating the batch upload formatted data available?

SRGDamia1 commented 1 year ago

I've been slowly working through @neilh10's methods of queuing data onto an SD card to re-attempt failed posts, but I suspect his method wouldn't be compatible with creating your suggested batch post format here. I would like to see how you're queueing and batching the points, if you have it figured out somewhere.

tpwrules commented 1 year ago

The changes in this PR should not conflict with his methods at all. I also don't see how @neilh10's work would be incompatible with my suggested batch post format. It would make his method better if he could modify his code to take advantage of it but it will work without changes as the format is designed to be backwards compatible. Though of course there would be a compatibility question with the batch transmission on the ModularSensors side.

I've pushed our current production code to this branch for your review. It has been running for months now like a dream. Please see the commit messages for details. Some are contained in this PR but some are not. The big ones are here and here.

There are a number of other not-batch-related changes too you might like. But my plan was to get this PR in, then consider the best method for batch transmission with your team and if so send a follow up PR, possibly with separate ones for the other changes. These other changes should not conflict in spirit with @neilh10's work though there may be some code conflicts to resolve. There are advantages and disadvantages to each approach to batch transmission and maybe you, I, and him can discuss those somewhere before subsequent PRs.

But none of that should prevent this PR from being merged.

tpwrules commented 1 year ago

Thanks for merging this and looking through the other commits but they are not ready for a proper review. I should have said they are ready for reference. Please wait until I submit a PR to comment.

Maybe there is a forum for real-time discussion we could talk in?

SRGDamia1 commented 1 year ago

I don't have Slack or anything similar. Would you like to set up a zoom call?

tpwrules commented 1 year ago

Yes, that would be great! If you could send an invite to my GitHub contributor e-mail that would be good. I am available for the next few hours today and any time after 2pm central time tomorrow.

neilh10 commented 1 year ago

There is "Discussions" that can be enabled per repo. IHMO much better than slack. :) As I understand this "batch transmission" is bufferin the messages in ram memory. My onus is on reliability of all transmissions - as reliable as the boot net, (that is walking up to it and reading off values ) A recent example case for me, was where the transmission went done for some months and then I replaced the parts last week and its now working with transmitting all outstanding readings.
To me the only reliable method for delayed transmission is buffering of the readings to the uSD. Seems to me the @tpwrules method would be layered between getting multiple readings from the uSD for the transaction of generating a JSON batch method. Practically speaking, this creates a larger packet - and for the edge of cell radio zones with marginal reception (like https://monitormywatershed.org/sites/TUCA_GV01/), a larger messages is more unlikely to succeed. In my recent measurements the biggest power (watts) user is the time take to access the cellphone and setup tcp/ip link ~ 30seconds.
I've also documented a recent test time for uploading 9000 messages - https://github.com/ODM2/ODM2DataSharingPortal/issues/673#issuecomment-1714231187 with the server taking a lot of batches of "100 messages" in one cellphone call with no failures. Very nice to see