SensorsIot / IOTstack

Docker stack for getting started on IOT on the Raspberry PI
GNU General Public License v3.0

Kernel update may remove `/dev/ttyAMA0` #690

Closed Paraphraser closed 3 months ago

Paraphraser commented 1 year ago

Kernel update may remove /dev/ttyAMA0

This post is intended to be:

  1. An open invitation to contribute to a discussion on how IOTstack should adapt to an anticipated change in the Raspberry Pi OS kernel.
  2. A resource to help anyone fix problems on their own system that may arise when the anticipated change in the Raspberry Pi OS kernel occurs.

Problem summary:

All comments, criticisms and suggestions are welcome!!!!

Background

The background to this issue is two interrelated discussions started by @S474N :

To summarise my understanding:

Historical context

Discussion on the Raspberry Pi forum suggests that the Raspberry Pi's "serial ports" have a slightly chequered family tree. I'm not sure I have this 100% right but I think it covers the basics:

Apparently, none of this is particularly new. We really should have been using serial0 and serial1 all along.

Kernel updates - coming fast

What is new is the notion that ttyAMA0 is about to be withdrawn in favour of serial1.

As of 2023-05-05, anyone keeping up-to-date via routine apt upgrade will probably have kernel 6.1.21-v8+.

The version of the kernel where ttyAMA0 has been withdrawn is 6.1.25-v8+. It was installed by rpi-update so it's in advance of what you get if you stick to apt upgrade. It's also a test version, meaning there is no final decision on whether ttyAMA0 will disappear once this kernel gets to final release.

Please make sure you read the warning before trying rpi-update yourself. A rebuild is in your future if you want to downgrade again.

Revision .21 to .25 is not a big gap so, unless something changes such that /dev/ttyAMA0 is preserved, we (IOTstack) are about to bang into this.
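To see where a given system sits relative to the test kernel, the running release can be compared against a threshold. This is a minimal sketch: `kernel_at_least` is a hypothetical helper name, the threshold 6.1.25 is simply the test version mentioned above, and `sort -V` is assumed to be available (it is in GNU coreutils). Passing the check does not mean `/dev/ttyAMA0` has gone, only that it is worth looking.

```shell
# kernel_at_least RELEASE THRESHOLD (hypothetical helper)
# Returns 0 if the numeric part of RELEASE (the text before the first "-")
# is greater than or equal to THRESHOLD.
kernel_at_least() {
  local release="${1%%-*}"   # "6.1.21-v8+" becomes "6.1.21"
  local threshold="$2"
  # sort -V orders version strings; if the threshold sorts first (or ties),
  # the release is at or beyond it.
  [ "$(printf '%s\n%s\n' "$threshold" "$release" | sort -V | head -n 1)" = "$threshold" ]
}

if kernel_at_least "$(uname -r)" "6.1.25"; then
  echo "kernel $(uname -r): at or beyond 6.1.25, check whether /dev/ttyAMA0 still exists"
else
  echo "kernel $(uname -r): older than 6.1.25"
fi
```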

IOTstack service definitions

Table 1 lists all the existing IOTstack master-branch service definitions that contain relevant devices: clauses.

[Table 1: Affected Service Definitions (image)]

Node-RED

I'll deal with this first because it's likely to be the most immediate problem for the majority of IOTstack users.

The IOTstack service definition for Node-RED assumes flows may need access to the Raspberry Pi's Bluetooth adapter via the node-red-contrib-generic-ble add-on node ("BLE"). The BLE nodes depend on the following device mappings:

devices:
  - "/dev/ttyAMA0:/dev/ttyAMA0"
  - "/dev/vcio:/dev/vcio"
  - "/dev/gpiomem:/dev/gpiomem"

The first line is the subject of this discussion. The other two devices (vcio and gpiomem) are also required for the BLE nodes to work on the Raspberry Pi so it's really an all-or-nothing deal.

See also D-Bus socket.

The most appropriate response to the anticipated kernel change is:

  1. If your flows need access to Bluetooth, replace the first device mapping above with:

    - "/dev/serial1:/dev/serial1"

    Changing the left-hand-side to /dev/serial1 can be done now, ahead of the expected kernel change. It won't break any existing implementations.

    The BLE node is a little strange. You can trigger a scan by giving it the internal device name. The BLE node discovers the Bluetooth MACs of the local adapter and the peer device, and then seems to pretty much forget the internal device name. I've experimented with setting up a BLE node on "ttyAMA0" then changing the service definition to "serial1". Any already-configured BLE node just keeps trucking. Accordingly, I am reasonably sure that changing the right hand side to /dev/serial1 won't actually break any existing flows.

  2. If your flows do not need access to Bluetooth:

    1. Remove the entire devices clause.
    2. Remove the D-Bus socket volume mapping.
    3. Remove the Docker socket volume mapping.
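Before changing a mapping to /dev/serial1, it can help to see what the aliases point to on the host. The sketch below assumes that /dev/serial0 and /dev/serial1 are symlinks when present; `describe_alias` is a hypothetical helper name, not part of IOTstack.

```shell
# describe_alias PATH (hypothetical helper)
# Prints "PATH -> target" if PATH is a symlink, "PATH (not a symlink)" if it
# exists but is not a link, or "PATH absent" otherwise.
describe_alias() {
  if [ -L "$1" ]; then
    echo "$1 -> $(readlink "$1")"
  elif [ -e "$1" ]; then
    echo "$1 (not a symlink)"
  else
    echo "$1 absent"
  fi
}

for dev in /dev/serial0 /dev/serial1; do
  describe_alias "$dev"
done
```

An "absent" result for an alias means a devices: mapping that names it on the left hand side will stop docker-compose from starting the container.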

See also Node-RED reference service definition.

deconz

master branch

/dev/null is a placeholder which is replaced during a menu run. The template folder holds the master list:

$ cat ~/IOTstack/.templates/deconz/hardware_list.yml
version: 1
application: "IOTstack"
service: "deconz"
comment: "Deconz hardware check list."
hardwarePaths:
  - "/dev/ttyUSB0"
  - "/dev/ttyACM0"
  - "/dev/ttyAMA0"
  - "/dev/ttyS0"

The menu generates an intermediate list stored in:

$ cat ~/IOTstack/services/deconz/build_settings.yml 
version: '1'
application: IOTstack
service: deconz
comment: Build Settings
hardware:
- "/dev/ttyACM0"
- "/dev/ttyAMA0"
databasePasswordOption: Do nothing

and then the content of the hardware: clause is merged into both of the following:

The result is:

  deconz:
    ...
    devices: # This list is replaced during the build process. Modify the list in "build_settings.yml" to change it.
    - "/dev/ttyACM0"
    - "/dev/ttyAMA0"
    ...

This is a placeholder role so the appropriate treatment is to replace ttyAMA0 in the master list with serial0.

And, yes, that does effectively double-up on /dev/ttyS0 but serial0 is still the recommended device to use so, if anything, /dev/ttyS0 should be removed.

However, that won't deal with any existing implementations unless the user:

  1. Waits until the IOTstack repo is updated on GitHub.
  2. Does a git pull to synchronise the local copy with GitHub.
  3. Re-runs the menu and changes the hardware selections.

It's probably simpler to just hand-edit the three files in situ and replace "ttyAMA0" with "serial0":
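One way to sketch that hand edit is a small function wrapped around sed. `swap_tty` is a hypothetical helper name, and which files to pass is left to the reader. It keeps a `.bak` copy of each file so the change can be reverted.

```shell
# swap_tty FILE... (hypothetical helper)
# Replaces every "/dev/ttyAMA0" with "/dev/serial0" in each FILE,
# keeping the original alongside as FILE.bak.
swap_tty() {
  for f in "$@"; do
    sed -i.bak 's#/dev/ttyAMA0#/dev/serial0#g' "$f"
  done
}

# Example call (path taken from the listing above):
#   swap_tty ~/IOTstack/services/deconz/build_settings.yml
```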

old-menu branch

Old-menu branch has three separate service definition files. The most relevant is service_raspbee.yml which will need "ttyAMA0" to be replaced as discussed later at Generalising the placeholder problem.

Home Assistant (container)

The current IOTstack service definition differs quite significantly from the home-assistant.io service definition:

[Figure 1: Home Assistant Service Definition – official vs IOTstack (image)]

All the highlighted lines were added by #485. The devices: are the same as for Node-RED so it's a Bluetooth role (confirmed by the PR). The service definition also has the "privileged" flag set. That does a whole lot of (IMO mostly bad) things, including mapping the whole of the host's /dev into the container. In other words, the container can already see all the host's devices. Adding a subset via a devices: clause does nothing.

In terms of the problem at hand, simply removing the devices: clause causes the anticipated kernel update problem to go away. The container still has access to everything it needs because of the privileged flag.

Old-menu branch lacks both the highlighted lines and the privileged flag. It will need to be harmonised.

See also D-Bus socket.

Octoprint

In this case, ttyAMA0 is being used as a placeholder for a serial or USB device. Octoprint inside the container expects the 3D printer to appear on /dev/ttyACM0 so the left hand side of the mapping must be changed to point to the actual 3D printer.

This placeholder will break if /dev/ttyAMA0 goes away.

The external /dev/video0 device will only exist if a camera is connected and defined. If that isn't true, the presence of this mapping will cause docker-compose to complain and refuse to bring up the container. That's why this line (along with the associated CAMERA_DEV environment variable) is commented-out in the template.

Zigbee2MQTT

This is another placeholder for a serial device which is going to break if /dev/ttyAMA0 goes away. The left hand side of the mapping needs to be changed to point to the Zigbee adapter.

The wider problem - when it isn't a Raspberry Pi

When IOTstack is installed on something that isn't a Raspberry Pi, the majority of IOTstack device mappings cause problems. That's because the devices listed in the templates are all specific to the Raspberry Pi OS and hardware. Using ttyAMA0 in both Bluetooth and placeholder roles just adds complexity.

The closest IOTstack gets to dealing with this problem is the troubleshooting section of the Wiki which basically says "comment-out the devices".

That's a little unsubtle (I wrote it so I can say that) because it doesn't really deal with the nuances.

There's no real way to automate device discovery. I can't see how the IOTstack menu is ever going to be sophisticated enough to be able to auto-discover answers to questions like:

Absent the menu acquiring sufficient smarts, the only real solution is human involvement to customise compose files after the menu has created the basic scaffolding.

Generalising the placeholder problem

I'm going to use Zigbee2MQTT as my example but the same principles apply to Octoprint and the old-menu version of Deconz.

The current Zigbee2MQTT service definition contains:

devices:
  - /dev/ttyAMA0:/dev/ttyACM0

Interpretation:

It's a placeholder role so the "correct" replacement is:

devices:
  - /dev/serial0:/dev/ttyACM0

However, as discussed earlier:

In other words, serial0 being defined is the exception rather than the rule so, most of the time, docker-compose will refuse to start the container.

That leads to the temptation to perpetuate both the current solution and the current problem by falling back upon serial1 for no better reason than "it's there":

devices:
  - /dev/serial1:/dev/ttyACM0

Well, it's there on the Raspberry Pi so it will keep docker-compose happy on the Raspberry Pi. But it will still fail on non-Raspberry Pi.

Plus, even though serial1 keeps docker-compose happy on the Raspberry Pi, it still doesn't actually work. A Bluetooth interface isn't a Zigbee adapter, or a 3D printer, or whatever else a container might be expecting. The user still has to provide the actual device the container is expecting.

More to the point, using serial1 as a placeholder masks the problem. The user gets no insights into why the Zigbee container isn't actually working.

Providing the actual device generally boils down to the choice of:

  1. Connecting the physical device to the Pi, seeing how it mounts (eg /dev/ttyACMx or /dev/ttyUSBx), using that device name on the left hand side of a devices: mapping, and accepting that device enumeration may well change the value of "x" thereby causing a mess; or
  2. Using a /dev/serial/by-id/«identity» path for a more robust solution; or
  3. Writing a UDEV rule to assign a human-readable name to the adapter (eg /dev/My3DPrinter or /dev/Zigbee_CC2531) and using that.
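For the by-id option, a quick way to see which stable names exist is to list the symlinks in /dev/serial/by-id alongside their targets. This is a minimal sketch; `list_links` is a hypothetical helper name, and the directory only exists while at least one matching device is attached.

```shell
# list_links DIR (hypothetical helper)
# Prints "link -> target" for each symlink in DIR.
# Prints nothing if DIR does not exist or holds no symlinks.
list_links() {
  [ -d "$1" ] || return 0
  for link in "$1"/*; do
    [ -L "$link" ] || continue
    echo "$link -> $(readlink "$link")"
  done
}

list_links /dev/serial/by-id
```

The path on the left of each output line is what would go on the left hand side of a devices: mapping.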

What's really needed is a way to prompt the user to figure out the correct device and communicate that to docker-compose. A "right place, right time" approach!

This is the generic syntax I'm proposing for placeholder roles:

devices:
  - "${«CONTAINER»_DEVICE_PATH:?define your «description»}:/dev/«internalDevice»"

For Zigbee2MQTT it would be something like this:

devices:
  - "${ZIGBEE2MQTT_DEVICE_PATH:?define your Zigbee Adapter in ~/IOTstack/.env eg ZIGBEE2MQTT_DEVICE_PATH=/dev/ttyACM0}:/dev/ttyACM0" 

On first install, a user will get the following error message:

parsing /home/pi/IOTstack/docker-compose.yml: error while interpolating services.zigbee2mqtt.devices.[]: required variable ZIGBEE2MQTT_DEVICE_PATH is missing a value: define your Zigbee Adapter in ~/IOTstack/.env eg ZIGBEE2MQTT_DEVICE_PATH=/dev/ttyACM0

As error messages go, it could be more succinct but it does point you in the direction of the Zigbee2MQTT service definition.
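The `${VARIABLE:?message}` form behaves much like the shell parameter expansion with the same spelling, so the effect can be tried without docker-compose. This is a minimal sketch under that assumption. `require_path` is a hypothetical helper name, and the exact wording of the shell's error differs from the compose message shown above.

```shell
# require_path (hypothetical helper)
# Prints the value of ZIGBEE2MQTT_DEVICE_PATH, or fails with a hint on stderr
# if the variable is unset or empty. The function body is a subshell so the
# failed expansion ends only the subshell, not the calling shell.
require_path() (
  echo "${ZIGBEE2MQTT_DEVICE_PATH:?define your Zigbee Adapter in ~/IOTstack/.env}"
)

unset ZIGBEE2MQTT_DEVICE_PATH
require_path || echo "expansion failed because the variable has no value"

ZIGBEE2MQTT_DEVICE_PATH=/dev/ttyACM0
require_path
```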

In my case, I have a UDEV rule which establishes /dev/Zigbee_CC2531 when the adapter is connected so I can meet this requirement with:

$ cd ~/IOTstack
$ echo "ZIGBEE2MQTT_DEVICE_PATH=/dev/Zigbee_CC2531" >>.env
$ docker-compose up -d
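Appending with `>>` adds a new line every time it is run, so repeated attempts leave several lines for the same variable. A hedged alternative is to rewrite the file so that only one line per key survives. `set_env_var` is a hypothetical helper name, and the sketch assumes the key contains no regular-expression metacharacters.

```shell
# set_env_var FILE KEY VALUE (hypothetical helper)
# Ensures FILE ends up with exactly one "KEY=VALUE" line,
# dropping any earlier line that starts with "KEY=".
set_env_var() {
  local file="$1" key="$2" value="$3"
  touch "$file"
  # Keep every line that does not start with "KEY=". grep exits non-zero
  # when it prints nothing, hence the "|| true".
  grep -v "^${key}=" "$file" > "${file}.tmp" || true
  echo "${key}=${value}" >> "${file}.tmp"
  mv "${file}.tmp" "$file"
}

# Example call:
#   set_env_var ~/IOTstack/.env ZIGBEE2MQTT_DEVICE_PATH /dev/Zigbee_CC2531
```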

You're probably wondering what happens if /dev/Zigbee_CC2531 is defined in .env but is not present at "up" time because I've disconnected the Zigbee adapter?

Error response from daemon: error gathering device information while adding custom device "/dev/Zigbee_CC2531": no such file or directory

Because I did the work of creating a UDEV rule, the error is still pointing in the right direction. The message is less informative if you use by-id or just-as-it-comes from host enumeration (/dev/ttyACM0) but we have to work with what we have.
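Where the daemon's message is not informative enough, a check before "up" can give a clearer hint. This is a minimal sketch; `check_device` is a hypothetical helper name and the wording of the hint is illustrative only.

```shell
# check_device PATH (hypothetical helper)
# Returns 0 if PATH exists (symlinks are followed); otherwise prints a hint
# on stderr and returns 1.
check_device() {
  if [ -e "$1" ]; then
    return 0
  fi
  echo "device $1 not found: is the adapter connected?" >&2
  return 1
}

# Example call:
#   check_device /dev/Zigbee_CC2531 && docker-compose up -d
```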

Generalising the Bluetooth problem

We have exactly two examples:

  1. Home Assistant. It is using the privileged flag so the container already has access to the totality of the host's /dev. There's no need to double-up. The IOTstack service definition should mirror the "official" version from Home Assistant.

    This will work out-of-the-box on all platforms.

    See also D-Bus socket.

  2. Node-RED. I think the best solution is to remove the three devices from the service definition, and then document what is required if Bluetooth needs to be enabled.

    The IOTstack Wiki will need to be rewritten too.

    Re-adding the three devices, by hand, will work out-of-the-box on the Pi.

    For non-Pi, the user still has to figure out how the Bluetooth adapter presents itself and make the appropriate substitution. For example, on macOS that's likely to be:

    devices:
      - "/dev/cu.Bluetooth-Incoming-Port:/dev/serial1"

    If you're wondering "why not use the privileged flag for Node-RED?", the answer is "because it's far too risky". Think about what /dev actually means. It includes things like:

    • /dev/mem - all the memory on the machine; and
    • /dev/disk - all your attached storage (SD/SSD).

    "Privileged" grants access to every add-on node, regardless of provenance. It's just not safe to bake that into IOTstack by default.

    See also:

    1. D-Bus socket
    2. Docker socket

Related issues

D-Bus socket

The IOTstack service definition for Node-RED contains a volume mapping to the D-Bus socket. This was added, by me, via #70, for no better reason than that, if node-red-contrib-generic-ble is included in the Dockerfile, the container goes into a restart loop unless this volume mapping is present.

The IOTstack master branch service definition for Home Assistant contains the same lines but it's not clear whether the author of #485 (@Tzaphkiel) came to the same conclusion (needed to avoid a restart loop) or if it was part of #485 because it was added in #70 (following the leader, as it were).

Googling "DBus site:.home-assistant.io" gets quite a few hits and many of them mention Bluetooth but I haven't been able to find anything recommending this volume mapping. Conversely, this link explains that D-Bus is how the HA Supervisor communicates with the host.

The container version of HA lacks the Supervisor, which is why many people run HA as an appliance, so this volume mapping would seem to be unnecessary. To put it another way, if this mapping was needed either generally or on the Pi, I would expect to find it mentioned in the HA documentation.

Although there's nothing wrong, per se, with a container engaging in inter-process communications via the host's D-Bus, it always bothers me when I see files in a compose file's volume mapping. That's because of what docker-compose does when it is bringing up a container. If the left hand side is missing, docker-compose treats the path as a specification for a directory and does the equivalent of a mkdir -p. Any processes expecting to find a file then get mightily confused and the result is a mess.

In any normal system, D-Bus will always be running by the time docker-compose gets involved so this path will always point to a file. On the other hand, that D-Bus issues page probably wouldn't exist if D-Bus had never ever fouled-up so…
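To tell whether a socket path has been turned into a directory, the path's type can be checked on the host. This is a minimal sketch: `path_kind` is a hypothetical helper name, and /var/run/dbus/system_bus_socket is assumed to be the path in the volume mapping (it is the conventional location of the system bus socket, but the mapping itself is not quoted in this post).

```shell
# path_kind PATH (hypothetical helper)
# Prints "socket", "directory", "other" or "missing" for PATH.
path_kind() {
  if [ -S "$1" ]; then
    echo socket
  elif [ -d "$1" ]; then
    echo directory
  elif [ -e "$1" ]; then
    echo other
  else
    echo missing
  fi
}

# Assumed path; adjust to match the left hand side of the volume mapping.
path_kind /var/run/dbus/system_bus_socket
```

A result of "directory" would suggest the path was created by docker-compose rather than by D-Bus.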

I wasn't really aware of this consideration when I proposed #70 so, on balance, it would be better if this line was removed from the template and the need for it documented for anyone wanting to enable Bluetooth access in Node-RED.

Docker socket

Issue #64 documents my experiments with Bluetooth prior to proposing #70. Part way through I wrote:

Restarting the nodered container to activate BLE produced an error about /var/run/docker.sock so I added [that] mapping.

The current version of node-red-contrib-generic-ble no longer fails when /var/run/docker.sock is missing. That means it is no longer necessary so I'm proposing to remove it.

Node-RED reference service definition

I'm providing these in advance of any Pull Request getting into IOTstack. Node-RED is the thing most likely to fail if ttyAMA0 goes away so good working reference definitions may come in handy.

The Node-RED service definition will be:

nodered:
  container_name: nodered
  build:
    context: ./services/nodered/.
    args:
    - DOCKERHUB_TAG=latest
    - EXTRA_PACKAGES=
  restart: unless-stopped
  user: "0"
  environment:
    - TZ=${TZ:-Etc/UTC}
  ports:
    - "1880:1880"
  volumes:
    - ./volumes/nodered/data:/data
    - ./volumes/nodered/ssh:/root/.ssh

The Node-RED Wiki page will explain that enabling BLE access: