canonical / iot-example-graphical-snap

Developer Guide for Embedding IoT GUI with Ubuntu Frame
MIT License

Flutter app daemon crashes on Ubuntu Core with inotifywait error #30

Closed KirioXX closed 5 months ago

KirioXX commented 6 months ago

Hi, thanks for this repository; it helped a lot in setting up our Flutter app snap.

We set up some test devices with Ubuntu Core and our Flutter app and noticed that they crash after a couple of days.

In a local environment I have only seen it a couple of times, but I was able to get the logs when one of my devices crashed:

2024-03-14T18:51:28Z systemd[1]: snap.twa.daemon.service: Main process exited, code=exited, status=1/FAILURE
2024-03-14T18:51:28Z systemd[1]: snap.twa.daemon.service: Failed with result 'exit-code'.
2024-03-14T18:51:31Z systemd[1]: snap.twa.daemon.service: Scheduled restart job, restart counter is at 2.
2024-03-14T18:51:31Z systemd[1]: Stopped Service for snap application twa.daemon.
2024-03-14T18:51:31Z systemd[1]: Started Service for snap application twa.daemon.
2024-03-14T18:51:31Z twa.daemon[9035]: ERROR: inotifywait could not be found, mir-kiosk-snap-launch expects:
2024-03-14T18:51:31Z trunk-works-andon.daemon[9035]:  . . :     stage-packages:
2024-03-14T18:51:31Z trunk-works-andon.daemon[9035]:  . . :        - inotify-tools
2024-03-14T18:51:31Z systemd[1]: snap.twa.daemon.service: Main process exited, code=exited, status=1/FAILURE
2024-03-14T18:51:31Z systemd[1]: snap.twa.daemon.service: Failed with result 'exit-code'.

Is there anything in our configuration or our device setup that could cause this behaviour? Please let me know what else would help to debug this issue. Thanks!

Saviq commented 6 months ago

Hi @KirioXX, as logged you should add this to your snap:

stage-packages:
  - inotify-tools

To confirm the tool is there: snap run --shell twa.daemon -c 'which inotifywait'.

That won't be the reason for your app to crash, though - it wouldn't start in the first place:

https://github.com/canonical/iot-example-graphical-snap/blob/dcf41bf23cbf1c9730c52c121261f4a557b4fca1/wayland-launch/bin/wayland-launch#L11-L17

KirioXX commented 6 months ago

Thank you for your answer, @Saviq. We actually have this:

stage-packages:
  - inotify-tools

part in our configuration.

This is our config:

name: twa
version: 1.12.0
summary: custom ubuntu core snap
description: >
  Custom ubuntu core snap
confinement: strict
compression: lzo
grade: stable
base: core22

apps:
  twa:
    command-chain: &ref_0
      - bin/graphics-core22-wrapper
      - bin/wayland-launch
    command: &ref_1 bin/twa
    plugs: &ref_2
      - opengl
      - wayland
      - network
      - network-observe
      - network-bind
      - network-status
      - network-control
      - network-manager
      - network-manager-observe
      - netlink-audit
      - netlink-connector
      - qualcomm-ipc-router
      - network-setup-observe
      - bluetooth-control
      - avahi-observe
    environment: &ref_3
      XDG_DATA_HOME: $SNAP_USER_DATA
      XDG_DATA_DIRS: $SNAP/usr/share
      GDK_GL: gles

  daemon:
    daemon: simple
    restart-delay: 3s
    restart-condition: always
    command-chain: *ref_0
    command: *ref_1
    plugs: *ref_2
    environment: *ref_3

plugs:
  graphics-core22:
    interface: content
    target: $SNAP/graphics
    default-provider: mesa-core22
environment:
  XDG_CACHE_HOME: $SNAP_USER_COMMON/.cache
  XDG_CONFIG_HOME: $SNAP_USER_DATA/.config
  XDG_CONFIG_DIRS: $SNAP/etc/xdg
  XDG_DATA_DIRS: $SNAP/usr/local/share:$SNAP/usr/share
  XKB_CONFIG_ROOT: $SNAP/usr/share/X11/xkb
layout:
  /usr/share/libdrm:
    bind: $SNAP/graphics/libdrm
  /usr/share/drirc.d:
    symlink: $SNAP/graphics/drirc.d
  /usr/local/share/fonts:
    bind: $SNAP/usr/local/share/fonts
  /usr/share/fonts:
    bind: $SNAP/usr/share/fonts
  /usr/share/icons:
    bind: $SNAP/usr/share/icons
  /usr/share/sounds:
    bind: $SNAP/usr/share/sounds
  /etc/fonts:
    bind: $SNAP/etc/fonts
  /usr/lib/$CRAFT_ARCH_TRIPLET/gdk-pixbuf-2.0:
    bind: $SNAP/usr/lib/$CRAFT_ARCH_TRIPLET/gdk-pixbuf-2.0
  /usr/lib/${CRAFT_ARCH_TRIPLET}/gtk-3.0:
    bind: $SNAP/usr/lib/${CRAFT_ARCH_TRIPLET}/gtk-3.0
  /usr/share/mime:
    bind: $SNAP/usr/share/mime
  /etc/gtk-3.0:
    bind: $SNAP/etc/gtk-3.0

parts:
  gsettings+pixbuf+immodules:
    plugin: nil
    build-packages:
      - libgdk-pixbuf2.0-0
      - librsvg2-common
      - shared-mime-info
      - libgtk-3-0
    override-build: >
      craftctl default
      # Update mime database
      update-mime-database ${CRAFT_PART_INSTALL}/usr/share/mime
      # build immodules cache
      mkdir -p ${CRAFT_PART_INSTALL}/usr/lib/${CRAFT_ARCH_TRIPLET}/gtk-3.0/3.0.0/
      /usr/lib/${CRAFT_ARCH_TRIPLET}/libgtk-3-0/gtk-query-immodules-3.0 > ${CRAFT_PART_INSTALL}/usr/lib/${CRAFT_ARCH_TRIPLET}/gtk-3.0/3.0.0/immodules.cache
    stage-packages:
      - librsvg2-common
      - gsettings-desktop-schemas
      - libglib2.0-bin
    override-prime: >
      craftctl default
      # Compile the gsettings schemas
      /usr/lib/${CRAFT_ARCH_TRIPLET}/glib-2.0/glib-compile-schemas "$CRAFT_PRIME/usr/share/glib-2.0/schemas"
      # Index the pixbuf loaders
      LOADERS_PATH=$(echo ${CRAFT_PRIME}/usr/lib/${CRAFT_ARCH_TRIPLET}/gdk-pixbuf-2.0/*/loaders)
      QUERY_LOADERS=/usr/lib/${CRAFT_ARCH_TRIPLET}/gdk-pixbuf-2.0/gdk-pixbuf-query-loaders
      GDK_PIXBUF_MODULEDIR=${LOADERS_PATH} ${QUERY_LOADERS} > ${LOADERS_PATH}/../loaders.cache
      sed -i 's!'"$CRAFT_PRIME"'!!g' "${LOADERS_PATH}/../loaders.cache"

  twa:
    plugin: nil
    source: .
    build-snaps:
      - flutter/latest/stable
    build-environment:
      - C_INCLUDE_PATH: /snap/flutter/current/usr/include
      - LD_LIBRARY_PATH: >-
          ${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/snap/flutter/current/usr/lib/$CRAFT_ARCH_TRIPLET
      - PKG_CONFIG_PATH: >-
          ${PKG_CONFIG_PATH:+$PKG_CONFIG_PATH:}/snap/flutter/current/usr/lib/$CRAFT_ARCH_TRIPLET/pkgconfig
      - XDG_DATA_DIRS: /snap/flutter/current/usr/share${XDG_DATA_DIRS:+:$XDG_DATA_DIRS}
    override-build: >
      set -eux
      echo "Installing flutter"
      mkdir -p $CRAFT_PART_INSTALL/bin/lib
      flutter channel stable
      flutter upgrade
      flutter config --enable-linux-desktop
      flutter doctor
      flutter pub get
      echo "Removing the build folder"
      rm -Rf build/
      echo "Start building the app"
      flutter build linux --release -v
      echo "Copying the build to the part install folder"
      cp -r build/linux/*/release/bundle/* $CRAFT_PART_INSTALL/bin/
    stage-packages:
      - libgtk-3-0
      - libgl1

  setup:
    plugin: dump
    source: snap/local/wayland-launch
    override-build: >
      # The plugs needed to run Wayland. (wayland-launch checks them, setup.sh connects them)
      # You may add further plugs here if you want these options
      PLUGS="opengl wayland graphics-core22"
      sed --in-place "s/%PLUGS%/$PLUGS/g" $CRAFT_PART_BUILD/bin/wayland-launch
      sed --in-place "s/%PLUGS%/$PLUGS/g" $CRAFT_PART_BUILD/bin/setup.sh
      craftctl default
    stage-packages:
      - inotify-tools

  graphics-core22:
    after:
      - twa
      - gsettings+pixbuf+immodules
      - setup
    source: https://github.com/MirServer/graphics-core22.git
    plugin: dump
    override-prime: |
      craftctl default
      ${CRAFT_PART_SRC}/bin/graphics-core22-cleanup mesa-core22 nvidia-core22
      cd "$CRAFT_PRIME/usr/share/"
      rm -rf bug drirc.d glvnd libdrm lintian man
      rm -rf applications apport bash-completion dbus-1 doc-base doc gtk-doc\
             help pkgconfig libthai metainfo themes thumbnailers xml
    prime:
      - bin/graphics-core22-wrapper

architectures:
  - build-on: amd64
  - build-on: arm64

What I noticed is that we don't have the hooks, because they caused some trouble when we tried to install the snap on Ubuntu Core. Could the crash happen because the hook is missing when snapd tries to refresh the snap?

Saviq commented 6 months ago

What I noticed is that we don't have the hooks, because they caused some trouble when we tried to install the snap on Ubuntu Core. Could the crash happen because the hook is missing when snapd tries to refresh the snap?

No, not really. The hooks only deal with auto-disabling the daemon outside of Ubuntu Core. It would be interesting what trouble you've seen with them :)

Can you confirm these look correct?

$ snap run --shell twa.daemon -c 'printenv PATH'
/snap/twa/<revision>/usr/sbin:/snap/twa/<revision>/usr/bin:/snap/twa/<revision>/sbin:/snap/twa/<revision>/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games

$ snap run --shell twa.daemon -c 'which inotifywait'
/snap/twa/<revision>/usr/bin/inotifywait

Adding set -x and env in the mir-kiosk-snap-launch script could help point out what's happening there - maybe it's missing $SNAP/usr/bin in PATH somehow…

KirioXX commented 6 months ago

No, not really. The hooks only deal with auto-disabling the daemon outside of Ubuntu Core. It would be interesting what trouble you've seen with them :)

The trouble I had was that when I tried to install the snap, the post-refresh hook could not access the install script. I then copied all the code from the install script into post-refresh, and that failed too, but I can't remember what the error was.

I ran the commands, and for the first one I get this output:

/snap/twa/x1/usr/sbin:/snap/twa/x1/usr/bin:/snap/twa/x1/sbin:/snap/twa/x1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games

Should inotifywait itself be included in this path?

For the second command I get the right path:

/snap/twa/x1/usr/bin/inotifywait

Saviq commented 6 months ago

These look fine (/snap/twa/x1/usr/bin is in there, so inotifywait is found).
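The PATH check can also be done mechanically rather than by eye. The following is only a sketch: path_contains is a hypothetical helper, not something shipped in the snap, and the PATH value is the one pasted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if directory $2 is one of the
# colon-separated entries of the PATH-style string $1.
path_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# PATH as reported inside the snap earlier in this thread (revision x1).
snap_path='/snap/twa/x1/usr/sbin:/snap/twa/x1/usr/bin:/snap/twa/x1/sbin:/snap/twa/x1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games'

if path_contains "$snap_path" /snap/twa/x1/usr/bin; then
  result='present'
else
  result='missing'
fi
echo "/snap/twa/x1/usr/bin is $result"
```

Wrapping both sides in colons avoids false positives from entries that merely share a prefix, such as /snap/twa/x1/usr.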

Adding set -x and env in the mir-kiosk-snap-launch script could help point out what's happening there - maybe it's missing $SNAP/usr/bin in PATH somehow…

That'd be the next step: working out why inotifywait is missing there.

In any case, this is a restart problem. You said the app was running and then crashed, so you'll need to look for logs of the original crash.

KirioXX commented 6 months ago

Thanks for checking, Saviq.

To give you a bit more context: our app runs on Raspberry Pis with Ubuntu Core in a factory environment. They are never turned off and run constantly in kiosk mode. I think you are right that it has to do with the restart, but I would have expected the app to restart only when there is a new version, which has not been the case for a while, and we have still seen crashes.

I'll try to find out how many crashes we had over the last week. Is there a way to retry the restart when it fails on the first try?
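One rough way to count crashes, assuming the service journal has been exported as text (for example with journalctl -u snap.twa.daemon.service, using the unit name from the log above), is to count the "Main process exited" lines. The count_crashes function is a hypothetical helper, and the sample input is just a few lines copied from the log in this thread.

```shell
#!/bin/sh
# Hypothetical helper: count systemd "Main process exited" lines on stdin.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_crashes() {
  grep -c 'Main process exited, code=exited' || true
}

# Sample input: lines copied from the log earlier in this thread.
crashes=$(count_crashes <<'EOF'
2024-03-14T18:51:28Z systemd[1]: snap.twa.daemon.service: Main process exited, code=exited, status=1/FAILURE
2024-03-14T18:51:31Z systemd[1]: snap.twa.daemon.service: Scheduled restart job, restart counter is at 2.
2024-03-14T18:51:31Z systemd[1]: Started Service for snap application twa.daemon.
2024-03-14T18:51:31Z twa.daemon[9035]: ERROR: inotifywait could not be found, mir-kiosk-snap-launch expects:
2024-03-14T18:51:31Z systemd[1]: snap.twa.daemon.service: Main process exited, code=exited, status=1/FAILURE
EOF
)
echo "process exits seen: $crashes"
```

Note that this counts every failed start as well, so a restart loop inflates the number; the first exit in each burst is the one worth reading around.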

Saviq commented 6 months ago

It does indeed try to restart in a loop. But if the issue persists, it won't ever succeed.

That's these bits (documentation):

    restart-delay: 3s
    restart-condition: always

You'll need to modify the wayland-launch script to see what's going wrong in there. Add set -x to see the commands being run, and env to see the environment. Maybe passing the full $SNAP/usr/bin/inotifywait path would help, though that would suggest a problem with snapd. You can just kill your app to replicate the restart behaviour and investigate.
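As a rough illustration of that kind of instrumentation, here is a standalone sketch. It is not the actual wayland-launch code: check_tool is a hypothetical helper, and sh stands in for inotifywait so the snippet runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: report where a tool resolves, or dump PATH if it
# cannot be found. Not part of wayland-launch.
check_tool() {
  tool=$1
  if tool_path=$(command -v "$tool"); then
    echo "found $tool at $tool_path"
  else
    echo "ERROR: $tool not found; PATH=$PATH" >&2
    return 1
  fi
}

env | sort >&2   # record the environment the script actually sees
set -x           # trace each command (written to stderr) from here on
check_tool sh    # in the real script the tool of interest is inotifywait
set +x
```

Because both the trace and the environment dump go to stderr, they should end up in the service journal next to the existing error message.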

wayland-launch is just a helper script that you could do away with altogether, the only real "action" it takes is this:

https://github.com/canonical/iot-example-graphical-snap/blob/dcf41bf23cbf1c9730c52c121261f4a557b4fca1/wayland-launch/bin/wayland-launch#L43

The rest is just validating the environment.

I'll try building a version of your snap locally to see if I can reproduce.

KirioXX commented 5 months ago

Thank you for looking into it, Saviq, I highly appreciate it.

I added the flags to my wayland-launch script and got this output:

2024-03-20T15:47:23Z twa.daemon[3672]: + dirname /run/user/0/wayland-0
2024-03-20T15:47:23Z twa.daemon[3450]: + [ -O /run/user/0/wayland-0 ]
2024-03-20T15:47:23Z twa.daemon[3450]: + mkdir -p /run/user/0/snap.twa -m 700
2024-03-20T15:47:23Z twa.daemon[3450]: + unset DISPLAY
2024-03-20T15:47:23Z twa.daemon[3450]: + exec /snap/twa/x1/bin/andon
2024-03-20T15:47:24Z twa.daemon[3450]: libEGL warning: wayland-egl: could not open /dev/dri/card0 (Operation not permitted)

It doesn't look like there is anything wrong.

I also noticed that when I refresh the snap from the Snap Store, the app can't connect to our server, but when I install it manually it works fine. Could that be a problem with the network manager not restarting properly after an update? Could that be resolved via the refresh hooks?

KirioXX commented 5 months ago

OK, I resolved the network issue by adding back:

      - network
      - network-observe
      - network-bind
      - network-status
      - network-control
      - network-manager
      - network-manager-observe
      - netlink-audit
      - netlink-connector
      - qualcomm-ipc-router
      - network-setup-observe
      - bluetooth-control
      - avahi-observe

to the plugs.

There is something wrong when it tries to install the app: when I run snap refresh, I get the error pretty consistently.

Saviq commented 5 months ago

Hey, I got a bit lost here… I'm going to assume you solved the network / network-manager issue, that was something else altogether.


The log you provided indeed shows that at least inotifywait was found (that is, if you kept the check in the wayland-launch script).

There is something wrong when it tries to install the app: when I run snap refresh, I get the error pretty consistently.

That's the inotifywait error? It would be good to see the full output from wayland-launch with set -x and env added at the top, both when it's happy and when it hits the error. That could point at what's different.

Saviq commented 5 months ago

@KirioXX one thing I noticed in the snapcraft.yaml you gave - it has > for all the multiline strings, rather than |. That means they get squashed into a single line and things will definitely not work as intended.

I couldn't even get the snap to build until I changed them back. I've inserted flutterdemo from the example and the snap worked just fine locally. Next, I'll build an arm64 version and try it on a Pi.
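To make the effect of > versus | concrete, here is a small sketch that runs the same two-command script both ways. It only approximates YAML folding by joining equally indented lines with spaces, and the script text is made up for illustration.

```shell
#!/bin/sh
# A made-up build script as it reads under "|" (literal): one command per line.
literal='echo first
# a comment
echo second'

# Approximation of ">" (folded): equally indented lines are joined with spaces.
folded=$(printf '%s\n' "$literal" | tr '\n' ' ')

literal_out=$(sh -c "$literal")
folded_out=$(sh -c "$folded")

printf 'literal output:\n%s\n' "$literal_out"
printf 'folded output:\n%s\n' "$folded_out"
```

In the folded form everything after the # becomes part of a comment, so only the first command runs. The same thing would happen to an override script whose first line is a comment: the whole script turns into a no-op.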

KirioXX commented 5 months ago

Thanks @Saviq, you are a legend!

About the network issue: this seems to be resolved now that I have added back the network plugs. I removed them to test whether they are actually needed, and it looks like they are.

The annoying part about the > versus | change is that it is caused by release-please, which we use to auto-generate our releases. I don't know why, but it reformats the entire YAML file even though it should only update the version number.

Saviq commented 5 months ago

@KirioXX so that solves your troubles? Or at least makes it clear what needs fixing? Please reopen this issue or open another one if you find there's still something wrong on our side.

I'd say that this is a major issue with release-please if it doesn't retain newlines in multiline strings…
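Until the tooling is sorted out, a guard in CI could catch the regression before a release is built. This is only a sketch: has_folded_override is a hypothetical helper, the regular expression is a heuristic rather than a YAML parser, and the two files are throwaway samples created on the spot.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file has an override-* key whose
# value starts a folded (">") block scalar. Heuristic, not a YAML parser.
has_folded_override() {
  grep -Eq '^[[:space:]]*override-[a-z]+:[[:space:]]*>' "$1"
}

# Throwaway sample files for demonstration only.
tmpdir=$(mktemp -d)
printf 'parts:\n  app:\n    override-build: |\n      echo ok\n' > "$tmpdir/good.yaml"
printf 'parts:\n  app:\n    override-build: >\n      echo ok\n' > "$tmpdir/bad.yaml"

for f in "$tmpdir/good.yaml" "$tmpdir/bad.yaml"; do
  if has_folded_override "$f"; then
    echo "$(basename "$f"): folded override script found"
  else
    echo "$(basename "$f"): ok"
  fi
done

rm -rf "$tmpdir"
```

In a real pipeline the function would be pointed at snap/snapcraft.yaml (or wherever the file lives in the project) and the job would fail when it returns success.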

KirioXX commented 5 months ago

Thank you very much for the help @Saviq 🙌

I found a way to remove the release-please formatting and am now testing whether that resolves the refresh issues. So far it looks promising; I will let you know if I encounter another issue.