OpenROV / openrov-software

Meta project for all of the OpenROV Software projects
http://openrov.com

Design for automatically updating MCU firmware #526

Closed BrianAdams closed 8 years ago

BrianAdams commented 8 years ago

This is a proposed design approach for having firmware that is automatically kept in sync with the cockpit software.

Whenever we compile firmware, we hash the files used to generate the firmware and we store that hash in both the firmware and in metadata around the resulting .hex file.
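As a rough illustration of the hashing step, here is a minimal sketch that hashes a firmware source tree deterministically. The function name `hash_fileset`, the choice of SHA-1, and the file patterns are assumptions made for the example; the thread does not specify a hash algorithm or which files count as inputs.

```python
import hashlib
from pathlib import Path


def hash_fileset(root: Path, patterns=("*.h", "*.cpp", "*.ino")) -> str:
    """Return a hex digest over the relative path and contents of every matching file.

    Hypothetical helper: algorithm and patterns are illustrative assumptions.
    """
    digest = hashlib.sha1()
    # Sort so the result does not depend on filesystem enumeration order.
    files = sorted({p for pattern in patterns for p in root.rglob(pattern) if p.is_file()})
    for path in files:
        # Include the relative path so that renaming a file changes the hash too.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

The same digest could then be compiled into the firmware and written next to the .hex file as metadata.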

Whenever cockpit starts, it

  1. interrogates the MCU for its built-in hash.
  2. reads the filesystem to get the .hex file's hash.
  3. triggers an upgrade of the firmware if the hashes don't match.
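The startup steps above can be sketched as follows. The three callables are hypothetical stand-ins for the real MCU query, the metadata read, and the flashing routine. Treating an MCU that reports no hash as out of sync is an assumption, not something the proposal states.

```python
from typing import Callable, Optional


def needs_upgrade(mcu_hash: Optional[str], hex_hash: str) -> bool:
    """True when the hash reported by the MCU differs from the cached .hex hash."""
    if mcu_hash is None:
        # Assumption: an MCU that reports no hash is treated as out of sync.
        return True
    return mcu_hash.strip().lower() != hex_hash.strip().lower()


def check_on_startup(
    query_mcu_hash: Callable[[], Optional[str]],
    read_hex_hash: Callable[[], str],
    trigger_upgrade: Callable[[], None],
) -> bool:
    """Run the three startup steps; return True if an upgrade was triggered."""
    mcu_hash = query_mcu_hash()  # step 1: ask the MCU
    hex_hash = read_hex_hash()  # step 2: read the .hex metadata
    if needs_upgrade(mcu_hash, hex_hash):  # step 3: compare and act
        trigger_upgrade()
        return True
    return False
```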

Whenever we push firmware updates, we also add that hash as a tag to the github repo so that we can later search for the firmware source code by hash.

To minimize the time it takes to sync up the firmware, we intentionally separate the compile step from the updating of the MCU firmware. We cache the compiled MCU firmware. We update the release processes to include the matching firmware as a precompiled .hex (or other appropriate format) with the cockpit software on the image, so that the update of the firmware is as fast as possible.

Whenever the firmware is being updated, it triggers an event on the bus that announces that firmware status has changed to upgrading, and a separate message that shows the state of the firmware upgrade such as % uploaded or % complete.
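A small sketch of the two kinds of messages described here, one for the status change and one for progress. The `Bus` class, the event names `firmware.status` and `firmware.progress`, and the final `complete` status are all hypothetical. The real cockpit event bus and its message names are not given in this thread.

```python
from collections import defaultdict
from typing import Callable, Sequence


class Bus:
    """Tiny in-process event bus, a stand-in for the real cockpit bus."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def on(self, event: str, handler: Callable[[object], None]) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: object) -> None:
        for handler in self._handlers[event]:
            handler(payload)


def upload_with_progress(
    bus: Bus, chunks: Sequence[bytes], write_chunk: Callable[[bytes], None]
) -> None:
    """Write firmware chunks while announcing status and percent complete."""
    bus.emit("firmware.status", "upgrading")
    total = len(chunks)
    for index, chunk in enumerate(chunks, start=1):
        write_chunk(chunk)
        bus.emit("firmware.progress", 100 * index // total)
    bus.emit("firmware.status", "complete")
```

A UI component would subscribe to both events and show a progress indicator while the status is `upgrading`.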

We intentionally do not use the hash concept for compatibility checking of the cockpit to the firmware. Instead we use the capabilities bit mask to make sure we use messages that are compatible with the firmware. This allows others to create firmware and upload usable firmware that cockpit just accepts.
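To make the bit mask idea concrete, here is a minimal sketch of a capability check. The capability names and bit positions are invented for illustration; the actual OpenROV capability bits are not listed in this thread.

```python
from enum import IntFlag


class Capability(IntFlag):
    """Hypothetical capability bits; the real assignments may differ."""

    LIGHTS = 1 << 0
    LASERS = 1 << 1
    DEPTH_SENSOR = 1 << 2


def supports(mask: int, required: Capability) -> bool:
    """True when every bit in `required` is set in the mask reported by the firmware."""
    return (mask & required) == required
```

Cockpit would only send a message type when the matching bit is set, regardless of which hash the firmware reports.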

BrianAdams commented 8 years ago

As a side benefit, we could choose to package the edit/compile environment as an add-on for the image instead of core functionality, since the firmware files will be distributed in compiled form.

BrianAdams commented 8 years ago

Charles Cross @spiderkeys I think one of the easiest ways to handle it is to use the eeprom to store details about what "mcu core" is installed. Whether or not we do anything fancy with hashing files and whatnot (which I'm not sure really buys us much), I think we can most easily control versioning by just using the commit hash of the firmware core or a version number file in the repo. Any user code should be done solely in plugins, and firmware plugins should have a config file which specifies compatibility. I haven't found any good way of guaranteeing support once they want to modify the core or the build process outside of a plugin. I think at that point, they basically need to check a box in cockpit saying they are going rogue. But it really doesn't matter where it is stored (eeprom, git repo, version file), as long as there is a method for detecting it during update. There are two concepts of what firmware is "installed", one being what is actually flashed and the other being what source is in the firmware folder. Since we generally rebuild firmware, you don't need to know the flashed version when building new firmware or updating firmware plugins, but you might need it if you do a javascript plugin update.

badevguru @BrianAdams 13:53 Well, as long as you're not strongly against using the hashing, I’m convinced that it is more foolproof than the other options. If we can store that hash in the eprom as we have been doing with the Arduino, it will always be up to date (vs. in a secondary location like external eeprom).

Charles Cross @spiderkeys 13:53 I'm cool with the hashing, but only if it is hashing the "core"

badevguru @BrianAdams 13:54 I’m arguing for hashing the entire fileset (including plugins) that actually goes into the firmware that is being run. Why limit it to just the “core” files we committed?

Charles Cross @spiderkeys 13:54 I don't think there is any value in hashing plugins or automatically generated code. What do you gain from it? If a user "disables" the external lights plugin, the hash changes, but they still have the latest version of everything.

badevguru @BrianAdams 13:55 You get to know exactly what is running on the MCU. If someone enables a plugin, that will change the hash… which will let us know the firmware on the MCU is now out of sync with the latest version of the compiled firmware.

Charles Cross @spiderkeys 13:58 The only way you can know exactly what is running on the MCU is if you build a map of every single possible combination of hashed builds with plugins enabled/disabled. The hash doesn't give you any additional information other than to know "something has changed". But we have inherently designed it to allow people to enable and disable code blocks (plugins) in cockpit. Those plugins being enabled or disabled affects the hash, but the changed hash does not reflect any change in the source code, only the compiled result. You could hash each individual plugin and check for updates for each individually, the same way you would hash the core, but hashing everything combined only hides information

badevguru @BrianAdams 14:01 I see what your point is. In that case I’d be for hashing both sets of files. Hash 1 so that you can identify which core set you are using, which is also mapped to the hash tags we store in github so that you can cross-reference them, and a second hash which is the state of all of the functionality running on the MCU, so that you can know if the configured running code in the MCU matches that of the compiled firmware on the filesystem.
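A minimal sketch of the two-hash idea from this exchange, using in-memory file maps to keep it short. The names `digest` and `firmware_hashes` and the use of SHA-1 are assumptions. The point is only that enabling or disabling a plugin changes the second hash while the core hash stays stable.

```python
import hashlib
from typing import Mapping, Tuple


def digest(files: Mapping[str, bytes]) -> str:
    """Hash a mapping of relative path -> file contents in a stable order."""
    h = hashlib.sha1()
    for name in sorted(files):
        h.update(name.encode("utf-8"))
        h.update(b"\0")
        h.update(files[name])
        h.update(b"\0")
    return h.hexdigest()


def firmware_hashes(
    core: Mapping[str, bytes], enabled_plugins: Mapping[str, bytes]
) -> Tuple[str, str]:
    """Return (core_hash, build_hash).

    core_hash identifies the core source set (the one that could be tagged in git).
    build_hash covers core plus every enabled plugin, i.e. what is actually compiled.
    """
    core_hash = digest(core)
    build_hash = digest({**core, **enabled_plugins})
    return core_hash, build_hash
```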

Charles Cross @spiderkeys 14:01 I do think that the hashing could be useful for updating each individual component, though. I would lean towards leaving all code blocks as git repos; that way you have a built-in system for updating, hashing, versioning, etc., but the weakness there lies in storage use. RPI-update uses that methodology, but there isn't a concept of people updating any code. Thinking about it from the perspective of a user going through an update, you have two issues:

  1. They may not want to update the flashed firmware.
  2. They may not want their local code to be modified, or at least if it is, they want to keep any customizations that have been made (which is where branches could be useful).

Charles Cross @spiderkeys 14:07 git kind of gives you built in support for updating their official source while keeping their changes stashed or in a user branch

badevguru @BrianAdams 14:09 Sure. I think the part we are not taking on, is the ability to interrogate the MCU to find out what github commit version of a plugin happens to be loaded on the MCU. I’m okay with that.

Charles Cross @spiderkeys 14:11 Oh, I think that can easily be done in the build/flash step. If we just grab the plugin name + commit hash and shove it in a json file in /opt/openrov/system/config/staged-versions.json and then copy that to /opt/openrov/system/config/flashed-versions.json after a successful flash, we can do those checks. Er, actually, if you want it to travel with the MCU, then put it on the controllerboard eeprom in that final step. One thing about using the git methodology for plugins, however, is that plugins have to be broken out into their own repos.
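The staged/flashed bookkeeping suggested here might look roughly like this. The two file names come from the message above; the function names and the JSON layout (plugin name mapped to commit hash) are assumptions for the sketch.

```python
import json
import shutil
from pathlib import Path
from typing import Mapping


def stage_versions(config_dir: Path, plugin_commits: Mapping[str, str]) -> Path:
    """Record plugin name -> commit hash for the build that is about to be flashed."""
    staged = config_dir / "staged-versions.json"
    staged.write_text(json.dumps(dict(plugin_commits), indent=2, sort_keys=True))
    return staged


def mark_flashed(config_dir: Path) -> Path:
    """Call only after a successful flash: copy the staged record to the flashed record."""
    flashed = config_dir / "flashed-versions.json"
    shutil.copyfile(config_dir / "staged-versions.json", flashed)
    return flashed
```

On the device, `config_dir` would be `/opt/openrov/system/config`. Comparing the two files then shows whether a staged build has actually reached the MCU.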

badevguru @BrianAdams 14:13 Not a big fan of trying to keep the eeprom on the controllerboard in sync with the MCU firmware, since there is no transactional way to ensure they stay in sync… but if we think that is important data to have, I agree that is the best place for the .json file to live. That is one of the issues; also, developers can change the files on the local device without committing to github, at which point the hash cannot be mapped to anything. But I think those are all secondary to the idea of going ahead with the two hashes that would be embedded in the MCU firmware itself, 1 for core items, and 1 for core items + plugins.

Charles Cross @spiderkeys 14:17 Still trying to understand why you would like to hash the core + plugins combined. What are you checking that against?

badevguru @BrianAdams 14:18 So that I can simply know, does the compiled firmware sitting on my disk match the running firmware on the MCU. This lets me know if I should update the firmware or not.

Charles Cross @spiderkeys 14:20 Ok, I see what you're getting at. In that case, I think the easiest thing to do is make that a query command in the firmware, like you suggested.

badevguru @BrianAdams 14:21

Charles Cross @spiderkeys 14:21 If a user clicks enable on external lights now that they've ordered one, would you fire off a check then? I suppose that would be the most prudent way to go about it, in case you change sd cards, swap controllerboards, etc. Explicit user actions aside, when should the system automatically check?

badevguru @BrianAdams 14:22 So click enable, system recompiles the entire firmware, system sees a new firmware is available, system then uploads new firmware. User gets a message that lights are now available. I think we can set the system up a couple of ways to check the firmware. The most obvious would be to put a filesystem monitor on that folder, which would automatically trigger if the underlying filesystem is changed. We also check on the initial boot of the system.

Charles Cross @spiderkeys 14:24 only issue with triggering off of changes is when someone is modifying the code in cloud9

badevguru @BrianAdams 14:25 I’m suggesting monitoring the cache location of stored hex/bin files. And that pushing the compiled firmware there is a separate step.

Charles Cross @spiderkeys 14:26 Right, I was thinking about the step before that, where the system checks to see if you need to build new firmware because the source has changed. Or were you not suggesting that?

badevguru @BrianAdams 14:27 Not suggesting that.

Charles Cross @spiderkeys 14:27 What triggers a firmware build then? Possibilities:

  1. Automatic checking to see if source has changed (calculate new hashes and compare them to the one on the mcu)
  2. User triggered
  3. System update triggered
  4. Triggered by enabling/disabling a plugin

badevguru @BrianAdams 14:28 So when we build images, we would seed the cache with the precompiled bin files. That would cause our process to automatically load the most up-to-date version. As far as builds go, the user would need to trigger a build. Yes, user trigger = enable/disable plugin.

Charles Cross @spiderkeys 14:30 Or an explicit build, I would imagine. If they wanted to update an already enabled plugin

badevguru @BrianAdams 14:30 Yep. I could see pushing the option for “deploy firmware to MCU” to cloud9 and a script on the filesystem. We would not need it for the normal cockpit UI since for those users it is now automatically keeping the firmware on the MCU in sync with the firmware compiled and sitting in the cache. What the UI would need is event indicators that the MCU is being deployed to.

Charles Cross @spiderkeys 14:32 well we can't keep the firmware up to date as hex files, only a core system i.e., no plugins enabled

badevguru @BrianAdams 14:32 Are you referring to when we deploy new firmware as part of a system update?

Charles Cross @spiderkeys 14:35 Yea, I'm getting back to the issue of updating the firmware source code. Presumably, they would have the official firmware "branch" (master) and potentially a user branch where they've made changes, and we want to update their master branch, but build their user plugins. I think an example might help: they have core 1.0, pluginA 1.0, but they've modified pluginA to be 1.1(user). We release an update to pluginA that bumps it to 1.1(master), but that is still different from their code. We want to update their master source, but still keep them on their 1.1(user) and use that to build the firmware.

BrianAdams commented 8 years ago

Regarding the example of a user upgrading a plugin.

I am assuming the user is manually applying updates to a plugin in the plugin folder etc/community-plugins/myplugin/firmware, or src/plugins/myplugin/firmware if it is a plugin we distribute with cockpit.

If they are applying changes to /opt/openrov/cockpit/src/plugins... and apply an update from us, we will overwrite those updates.

If they are working on a third-party plugin that was downloaded (or that we deployed as a separate plugin that we pre-packaged on the image), then our update will have no effect, but their updates will be overwritten when they update the third-party plugin.

spiderkeys commented 8 years ago

Summarizing as a Q/A list to identify gaps:

How does a software update occur?

What if the user has modified the source code on their device?

What if there are hard dependencies or incompatibilities between the packages?

What happens after you update the source code?

How does the firmware get flashed?

How can you tell what version of software is on the MCU?

What if it doesn't respond?

Can the user manually kick off new builds?

If there is no custom code, what is the update process?

What if the flash fails?

BrianAdams commented 8 years ago

How does a software update occur?

When the user connects to the internet, it checks with to see if the version identifier for each package matches the latest in the service. If not, it updates the fi

In my mind I am envisioning updates as handled by a Docker-like container solution. So the plugins we distribute within that container are handled by simply updating the container. Changes that are made within the running container are destroyed when the container updates.

BrianAdams commented 8 years ago

What if there are hard dependencies or incompatibilities between the packages? .... If the user is using custom code in one plugin, it is presumable that we can not safely update their core package or any of the other plugins. We need a way to handle this (split all packages off into a user branch so they can update their officials while maintaining a good state?) We currently have no way to resolve conflicts between packages (i.e. conflicting i2c addresses, pin usage, etc)

We can simply require the user to clone one of our plugin folders to start making changes if they want those changes to survive updates. Because it has been cloned, it should come in as a third-party library. We should update our code so that if the same plugin exists in the third-party library and has the SAME version number as our plugin, we use the third-party plugin instead of our included plugin. In that way, if the user does update, the third-party plugin they are working on has to be manually updated to show it should still override the original. If the third-party plugin has conflicts and has been marked to override our original plugin, that is not on us and only breaks things for that user of the third-party plugin.
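The override rule can be sketched as a small resolver. The function name and the plain name-to-version maps are assumptions. A third-party plugin that has no bundled counterpart is loaded from the third-party location, which is also an assumption about the intended behaviour.

```python
from typing import Dict, Mapping


def resolve_plugins(
    bundled: Mapping[str, str], third_party: Mapping[str, str]
) -> Dict[str, str]:
    """Map each plugin name to the source it should be loaded from.

    Both arguments map plugin name -> version string.
    """
    chosen: Dict[str, str] = {}
    for name, version in bundled.items():
        # A cloned plugin overrides ours only while its version matches exactly.
        overrides = third_party.get(name) == version
        chosen[name] = "third-party" if overrides else "bundled"
    for name in third_party:
        # Assumption: plugins we do not ship always come from the third-party folder.
        chosen.setdefault(name, "third-party")
    return chosen
```

After an official update bumps the bundled version, the cloned copy stops overriding until its author raises its version number to match.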

BrianAdams commented 8 years ago

What happens after you update the source code?

Nothing should happen by default. Something should trigger a compile. If that succeeds, then something would need to promote that compiled version of firmware as the new version in the cache. Promoting something to the cache would let the system take over in making sure it gets deployed to the MCU.
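One possible shape for the promotion step, assuming the cache is a plain directory that is being watched. The function name is hypothetical. The file is copied under a temporary name and then renamed, so a watcher never sees a half-written firmware image.

```python
import os
import shutil
from pathlib import Path


def promote_to_cache(built_firmware: Path, cache_dir: Path) -> Path:
    """Copy a successfully compiled firmware file into the cache in one visible step."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / built_firmware.name
    temporary = target.with_name(target.name + ".tmp")
    shutil.copyfile(built_firmware, temporary)
    # os.replace swaps the file into place, overwriting any older cached version.
    os.replace(temporary, target)
    return target
```

Once the file lands in the cache, the normal sync check would pick it up and deploy it to the MCU.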

BrianAdams commented 8 years ago

What if the flash fails?

Presumably we have built in retries for timing issues.

If it still fails, we have bigger problems. That should imply a defective unit, as the MCU is refusing to load a compiled firmware file. We need to be able to report that the upload of firmware to the MCU failed and tell the user to contact support.
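A minimal sketch of the retry behaviour described above. `FlashError`, the attempt count, and the delay are hypothetical values chosen for illustration; the `flash` callable stands in for whatever actually uploads the firmware.

```python
import time
from typing import Callable


class FlashError(Exception):
    """Raised when an upload attempt to the MCU fails (hypothetical error type)."""


def flash_with_retries(
    flash: Callable[[], None],
    attempts: int = 3,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Try to flash up to `attempts` times; return the attempt number that succeeded."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            flash()
            return attempt
        except FlashError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # give timing issues a chance to clear
    # All retries exhausted: surface a message the UI can show to the user.
    raise FlashError(
        f"firmware upload failed after {attempts} attempts; please contact support"
    ) from last_error
```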

BrianAdams commented 8 years ago

Approved for implementation