bakerstu / openmrn

OpenMRN (Open Model Railroad Network)
BSD 2-Clause "Simplified" License

Create a RELEASE branch #457

Closed TrainzLuvr closed 3 years ago

TrainzLuvr commented 4 years ago

Why?

Because openmrn/master is broken right now. Anyone coming here will clone it and unknowingly use it, thinking it is fully working.

Master branch should always be considered "bleeding edge" and potentially broken. Those who wish to use OpenMRN code for their applications and nodes should therefore rely on a stable branch that has been thoroughly tested and is known to work.

What is broken in openmrn/master?

I don't know.

Just because software compiles without errors does not mean the hardware is working using that code. All I can say is that I've expended countless hours, heartaches and tears this year trying to get Balazs's custom Railcom board to work, only to find out that the problem was openmrn/master all along.

Basically, I went back in time to March 2017 for openmrn/master and further back, to December 2016, for openmrn_cue/master. Lo and behold, the Railcom board (magically) started to work (also used: Tivaware.2.1.2.111, FreeRTOSv10.1.1, gcc-arm-none-eabi-6-2017-q2-update).

What should be done?

I believe this project is mature, serious, and has been around long enough to have a Release branch. This Release branch does not have to be full of the latest features. On the contrary, we need the most stable option, where all the hardware works, 100%.

Someone from your dev team, or outside it, should be assigned the role of Release Manager, purely in charge of merging (cherry-picking) commits from Master into Release. This process should involve educated decisions on the impact of each merged commit on the overall code tree and the hardware supported within it.

In addition, full build instructions should be written for this branch, including the exact versions of the APIs and cross-compilers used to build 100% working code.

Community volunteers could be brought in to help with testing the various hardware options, as obviously there should be no expectation that the devs own each and every chip/board used here.

Thank you!

atanisoft commented 4 years ago

@TrainzLuvr What exact commits are you using, so we can do a git bisect and narrow down which commit(s) broke compatibility?

balazsracz commented 4 years ago

I spent two days trying to do that bisection and it's practically impossible. The problem is that when you check out an older master, it will almost certainly be incompatible with your tools (e.g. new GCC warnings) and the other repositories. These incompatibilities are fixed on master, but if you check out something from before the fix, you end up having to re-fix it just to evaluate one given commit point, and then you have to keep doing it over and over again as you search through the commit history.
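For reference, the search that git bisect automates is a binary search over the history. Below is a minimal Python sketch of that search; the commit names, the history length, and the `is_bad` check are made up for illustration and are not OpenMRN code.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    also bad (the same assumption git bisect makes).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history of 1000 commits in which commit "c700" introduced the break.
history = [f"c{i}" for i in range(1000)]
evaluated = []


def is_bad(commit: str) -> bool:
    # Each call stands in for one manual cycle: check out, patch until it
    # builds with the current tools, flash, and test on the hardware.
    evaluated.append(commit)
    return int(commit[1:]) >= 700


culprit = first_bad_commit(history, is_bad)
print(culprit, len(evaluated))
```

Even for a thousand commits the search needs only about ten evaluations. The difficulty described above is that each evaluation is not a cheap automated check but a manual repair of an old checkout.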

The problem is that nothing is guaranteed to work on master unless it gets daily use. So every single application is known to work only at the point that it was last tested in the past. This point however is different for every single application. Therefore it is neither true that openmrn master is broken (the TCS throttle is compiled from very recent master for example), nor is it true that there is one specific "RELEASE" point on master that is the canonical correct point for every application.

There are a bunch of applications that are in the openmrn tree, and I'm sure that many of those are not operational now. But in this specific example the application is actually in a different tree, which makes the problem harder as you need to correctly guess two master history points.

What really happened in this case is that I was the only user of the railcom board until TrainzLuvr wanted to build some for themselves. I stopped using these boards because I started working on a commercial version of the product. At some point I forked away from my repository for the commercial firmware, and my repository was left in an in-between state. The procedural error on my side was that I never recorded which version was the last known good one.

Btw the same thing happened to the command station software that is in my custom repo: it compiles but it does not work, and I don't know what is the last known working state.

Normally in the open-source world, dependencies are referred to by a specific version number. However, we don't have a versioning or release process for OpenMRN. There is a version number for OpenMRNLite, but the release process there is really simple, as only manual testing of the three given examples needs to be done. We cannot create a release process for OpenMRN because we do not have release acceptance tests. The unit tests do not cover the behavior of applications, and especially not the hardware and board files.

There is a professional development method which can work around this, including the dependencies between different repositories. It's called git submodule. However, this git-based dependency management feature is very difficult to use, especially for newcomers to git. With submodules, you check into the application repository the commit point in openmrn that you want checked out. This way you can move back through the history of both repositories together, and you need to make explicit check-ins when you want to roll forward the openmrn master from the viewpoint of the application repository.

Of course this only works if the application repository contains exactly one application, which is really not true for my custom app repository (it has at least five very distinct things). Also not true for openmrn, which has a dozen or more distinct applications.
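For a repository that holds several applications, one lightweight alternative to submodules is to record a last-known-good pair of commits per application. This is only a sketch of the idea, not something the project does today: the `KnownGood` record, the field names, and the placeholder hashes and date are all assumptions. The toolchain string is the one reported earlier in this thread.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class KnownGood:
    """Hypothetical record of the last hardware-tested state of one application."""
    application: str   # name of the application, e.g. the railcom board firmware
    openmrn: str       # commit hash in the openmrn repository
    openmrn_cue: str   # commit hash in the openmrn_cue repository
    toolchain: str     # cross-compiler used for the tested build
    tested_on: str     # date of the hardware test


def to_manifest(records: Iterable[KnownGood]) -> str:
    """Serialize the records as JSON, keyed by application name."""
    return json.dumps({r.application: asdict(r) for r in records},
                      indent=2, sort_keys=True)


def checkout_commands(record: KnownGood) -> List[str]:
    """Git commands that restore both working trees to the recorded state."""
    return [
        f"git -C openmrn checkout {record.openmrn}",
        f"git -C openmrn_cue checkout {record.openmrn_cue}",
    ]


# Placeholder values: the real hashes and date are not known from this thread.
railcom = KnownGood(
    application="railcom",
    openmrn="<openmrn-commit>",
    openmrn_cue="<openmrn_cue-commit>",
    toolchain="gcc-arm-none-eabi-6-2017-q2-update",
    tested_on="<YYYY-MM-DD>",
)

print(to_manifest([railcom]))
print("\n".join(checkout_commands(railcom)))
```

Unlike a submodule pointer, each application gets its own pair of commits, so one application can roll forward without claiming anything about the others.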

atanisoft commented 4 years ago

Perhaps it would make sense to implement GitHub Actions which can do the following:

  1. build all code/applications.
  2. test a set of applications using parallel threads (ie: hub and io board running on linux using TCP/IP connectivity and external tool to fire events at the hub)

We would then need a log analyzer for the outputs of the apps to ensure they are operating as expected/designed.
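A rough sketch of what such a log analyzer could look like in Python: it checks that a list of expected patterns shows up in the output in the given order. The log lines and patterns below are invented for illustration; the real applications may log something quite different.

```python
import re
from typing import Iterable, List


def missing_expectations(log_lines: Iterable[str], expected: List[str]) -> List[str]:
    """Return the expected regex patterns that were not matched, in order.

    Patterns must appear in the given order, and each log line can satisfy
    at most one pattern. An empty result means every expectation was met.
    """
    remaining = list(expected)
    for line in log_lines:
        if remaining and re.search(remaining[0], line):
            remaining.pop(0)
    return remaining


# Hypothetical combined output of a hub and an io board run.
sample_log = [
    "hub: listening on port 12021",
    "io_board: connected to hub",
    "io_board: event produced 05.01.01.01.14.FF.00.01",
]

expected = [
    r"listening on port \d+",
    r"connected to hub",
    r"event produced",
]

print(missing_expectations(sample_log, expected))
print(missing_expectations(sample_log, expected + [r"event consumed"]))
```

A CI step could fail the run whenever the returned list is not empty and print the first missing pattern as the error message.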

dpharris commented 4 years ago

Balazs and Trainzluvr --

I have one of Balazs's boards, too. And, since I am installing a Loksound sound decoder which includes Railcom into a couple of my locos, I would like to use it.

Is the best way forward to try to get it working with the current master, and then freeze it with a release? (Or should we just purchase the upcoming product?)

David

On Sat., Oct. 31, 2020, 08:53 Balazs Racz wrote: (quoted copy of the comment above omitted)

atanisoft commented 4 years ago

@balazsracz can you send me details on the RailCom board as I'd like to set one up for testing similar to what @TrainzLuvr has done since he is also using my ESP32 CS board.

balazsracz commented 4 years ago

@TrainzLuvr could you support Mike and David with the information they need?

Currently the railcom board works on master of openmrn and master of openmrn_cue (I just fixed it).

atanisoft commented 4 years ago

> @TrainzLuvr could you support Mike and David with the information they need?

@balazsracz can you or @TrainzLuvr send me the BOM/Gerber files so I can order the necessary parts for testing locally?

balazsracz commented 4 years ago

> Perhaps it would make sense to implement GitHub Actions which can do the following:
>
>   1. build all code/applications.
>   2. test a set of applications using parallel threads (ie: hub and io board running on linux using TCP/IP connectivity and external tool to fire events at the hub)
>
> We would then need a log analyzer for the outputs of the apps to ensure they are operating as expected/designed.

This is a good idea but will not solve the problem of the OP. We have GTests that evaluate the behavior of the code under Linux using TCP/IP. We generally try hard to keep the tests passing on head. The problem is that we cannot cover the code that runs only on the microcontrollers: FreeRTOS drivers, CAN-bus and UART drivers (which are often hardware specific), startup code, interrupt handling, GPIO code, etc. Nor can we cover much of the application-level behavior (such as railcom reception).

TrainzLuvr commented 4 years ago

Thanks everyone for your replies.

@balazsracz just to make it clear, I don't want you or anyone else to think I was calling anyone out by writing the above. I wanted to make the most productive and constructive suggestion I could, and I want it known that I'm most grateful for the work everyone puts in here. It's hard to communicate certain things over the internet, so I felt the need to spell this out. Also, I see you made additional commits to fix the firmware for your Railcom board (going to test them shortly). It was not my intention to force your hand on it, because I know you are very busy with the other CS project. EDIT: Thank you for doing it! :)

@dpharris I'm going to try the current master now and see if that works, although for some reason I have a feeling I have already made some of those changes in my code prior to this, now that I see them. Weird.

Otherwise, I can make a package of what I've done available to you and @atanisoft. I basically had to patch the old master snapshot I used and correct compilation and other errors. Even with that, I am unable to use any debugging facility such as RailcomToOpenLCBDebugProxy, which causes GP failures here, probably because the old master code is missing additional fixes.

Having said all that, the board kinda works with the code I put together: I get occupancy detection events (occ_on and occ_off), and I am able to turn the track on and off. But I am not sure how to get Railcom Ch1 and Ch2 data to JMRI, for example; I see none of it in the JMRI Traffic Monitor.

Also, IIRC what's missing in Balazs's code is getting the bits from the EEPROM storage, and saving the current state of ports when it changes.

TrainzLuvr commented 4 years ago

I pulled the latest openmrn/master and openmrn_cue/master, and compiled the railcom.upload firmware. The Railcom board is working here as well! :)

To make sure that there were no bits left set by my previous firmware, I blanked the Tiva board with LM Flash Programmer, then flashed the railcom.upload firmware again.

One problem that repeated itself with this firmware was that after I uncommented RailcomToOpenLCBDebugProxy, the board crashed on startup, just as it did with my version of the firmware.

I used to be able to turn this on without crashing before I got the board working myself, albeit I was not getting any packets in the Traffic Monitor from the Railcom, but at least it was not crashing the board on startup.

Which brings me back to a question I asked previously: how do I get Railcom Ch1 and Ch2 data now? Where should it appear, what is responsible for passing it through the network (to JMRI, for example), and how does JMRI know it's there?