gmodena / nix-flatpak

Install flatpaks declaratively
Apache License 2.0

Bug | `flatpak-managed-install.service` Fails to Start When Not Connected to the Internet #45

Open ReedClanton opened 8 months ago

ReedClanton commented 8 months ago

Description

When a host NixOS machine rebuilds a system that includes nix-flatpak while not connected to the internet, flatpak-managed-install.service fails to start with the error message provided below.

reloading user units for reedclanton...
setting up tmpfiles
restarting the following units: wpa_supplicant-wlp4s6.service
warning: the following units failed: flatpak-managed-install.service

× flatpak-managed-install.service
     Loaded: loaded (/etc/systemd/system/flatpak-managed-install.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Fri 2024-02-16 17:37:06 CST; 299ms ago
TriggeredBy: ● flatpak-managed-install.timer
    Process: 37129 ExecStart=/nix/store/9ra2bx3n45rznggnmjwr8dl55w86dali-flatpak-managed-install (code=exited, status=1/FAILURE)
   Main PID: 37129 (code=exited, status=1/FAILURE)
         IP: 0B in, 0B out
        CPU: 28ms

Feb 16 17:37:06 nixos-desktop-gnome systemd[1]: Starting flatpak-managed-install.service...
Feb 16 17:37:06 nixos-desktop-gnome 9ra2bx3n45rznggnmjwr8dl55w86dali-flatpak-managed-install[37132]: error: Can't load uri https://flathub.org/beta-repo/flathub-beta.flatpakrepo: While fetching https://flathub.org/beta-repo/flathub-beta.flatpakrepo: [6] Couldn't resolve host name
Feb 16 17:37:06 nixos-desktop-gnome systemd[1]: flatpak-managed-install.service: Main process exited, code=exited, status=1/FAILURE
Feb 16 17:37:06 nixos-desktop-gnome systemd[1]: flatpak-managed-install.service: Failed with result 'exit-code'.
Feb 16 17:37:06 nixos-desktop-gnome systemd[1]: Failed to start flatpak-managed-install.service.
warning: error(s) occurred while switching to the new configuration

Additional Information

This error occurs on the latest version of main as well as on commits 6079344 and 6622918, and presumably most/all others. This is worth pointing out because it means it wasn't caused/solved by #30.

Once this issue is encountered, the user may reboot without issue. In other words, it doesn't cause any failures during boot.

I tested this very thing here and didn't see this issue. This could be caused by:

Environment

This issue occurs on a laptop and desktop. The configuration uses flakes and installs a single flatpak via the nix-flatpak module and many flatpaks via the Home Manager module.

gmodena commented 8 months ago

When a host NixOS machine rebuilds a system that includes nix-flatpak while not connected to the internet, flatpak-managed-install.service fails to start with the error message provided below.

That is expected behavior if you build a system with services.flatpak.update.onActivation = true, and you have no connectivity.

Once this issue is encountered, the user may reboot without issue. In other words, it doesn't cause any failures during boot.

There's a chance that upon reboot, your network connection came back up. The flatpak-managed-install.service unit is started only after multi-user.target is reached (meaning that network and connectivity are expected to have started).

mrnetlex commented 8 months ago

The same happens to me: flatpak-managed-install.service fails in the same way. I have nix-flatpak.url = "github:gmodena/nix-flatpak"; in my flake.nix, so presumably I use the latest version. services.flatpak.update.onActivation isn't specified, so it should default to false.

(I don't know if it could be related, but after every reboot I get an alert from nextcloud-client saying it couldn't connect. I would assume that it should start well after the system gets a network connection, so maybe there's some common cause why services try to connect too early, but this whole reasoning is probably too far-fetched.)

cig0 commented 6 months ago

Hi,

I'm experiencing the same issue.

When wired to my router, the service behaves as expected; however, it will fail when relying on the WiFi connection (I'm using the NetworkManager service here).

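I tried all sorts of combinations to make the service start correctly at boot time (by editing the nixos.nix file), i.e.:

* Adding a `Wants=network-online.target` statement, as explained in `man 7 systemd.special`

* `Wants=` and `Requires=` with `network.target`, `network-online.target`, and `NetworkManager.service`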

In all cases, flatpak-managed-install.service will fail at start-up. After scratching my head for a reasonable amount of time, my hunch is that the issue has to be with how NixOS handles networking (I am new to NixOS; I've been around less than two weeks, so I can't affirm anything!).

In my case, something that would help me avoid having my system tainted (as shown with # systemctl list-machines) right off the bat on a fresh boot would be to disable the service from automatically starting at boot time and let the timer trigger it. I looked at the code but couldn't find a way to have the service created with a disabled state.
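One way this might be approached from the user's own configuration, sketched here as an untested assumption: clear the unit's `wantedBy` so that no target pulls it in at boot and only the timer starts it. This relies on the module wiring the service to a target through `wantedBy`, which has not been verified against the module source, and it may also keep the service from running on rebuilds.

```nix
{ lib, ... }:
{
  # Assumption: nix-flatpak's NixOS module sets `wantedBy` on this unit.
  # mkForce replaces that list with an empty one, so no target starts the
  # service at boot; flatpak-managed-install.timer can still trigger it.
  systemd.services."flatpak-managed-install".wantedBy = lib.mkForce [ ];
}
```

The timer is only present when services.flatpak.update.auto.enable = true, so with auto updates disabled this override would leave nothing to start the service.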

My configuration is as follows:

/etc/nixos/modules/flatpak.nix

{
  services.flatpak = {
    enable = true;
    update = {
      auto = {
        enable = true;
        onCalendar = "weekly"; # Default value
      };
      onActivation = false;
    };

    uninstallUnmanaged = true;
    packages = [
    ...
    ...
    ];
  };
}

/etc/nixos/configuration.nix

{
  imports =
    [
      ...
      # Flatpak
      ./modules/flatpak.nix
      ./modules/nix-flatpak/modules/nixos.nix
      ...
    ];
}

On a side note, I love this development; it makes using NixOS even more enjoyable. Thank you! :raised_hands: I'd be more than glad to chip in some money if you set up a sponsor link :)

cig0 commented 6 months ago

Systemd's documentation on network configuration: https://systemd.io/NETWORK_ONLINE/

gmodena commented 6 months ago

Hey @mrnetlex @cig0

I have been trying to replicate this issue, but so far no luck. Is there a chance you could share your network config? In my experience these things tend to be a bit flaky.

FWIW: my baseline env can be found under testing-base; with that setup, I was not able to replicate the issue. I never noticed any issue switching to/from wired/wifi connections on my daily driver either, but I don't reboot that often.

@mrnetlex you are right - if services.flatpak.update.onActivation is not specified, it should default to false, and not try to download flatpaks at boot. This is what I would expect the unit status to look like with default settings:

[antani@nixos:~]$ systemctl --user status flatpak-managed-install
○ flatpak-managed-install.service
     Loaded: loaded (/home/antani/.config/systemd/user/flatpak-managed-install.servic>
     Active: inactive (dead) since Wed 2024-05-08 18:40:03 UTC; 329ms ago
    Process: 1524 ExecStart=/nix/store/930iss74cxw2ailj31bjjfkhi6dvhmi7-flatpak-manag>
   Main PID: 1524 (code=exited, status=0/SUCCESS)
        CPU: 288ms

May 08 18:40:01 nixos systemd[1514]: Starting flatpak-managed-install.service...
May 08 18:40:03 nixos systemd[1514]: Finished flatpak-managed-install.service.

Do you (still) experience a different behavior?

@cig0

When wired to my router, the service behaves as expected; however, it will fail when relying on the WiFi connection (I'm using the NetworkManager service here).

I tried all sorts of combinations to make the service start correctly at boot time (by editing the nixos.nix file), i.e.:

* Adding a `Wants=network-online.target` statement, as explained in `man 7 systemd.special`

* `Wants=` and `Requires=` with `network.target`, `network-online.target`, and `NetworkManager.service`

The systemd unit that nix-flatpak installs (flatpak-managed-install) should start after systemd's multi-user.target, and is wanted by default.target. My understanding (which could be wrong) is that the unit would not try to kick off a download until the GUI and network are up and running.

Could you maybe paste the output of systemctl status flatpak-managed-install (--user if you install it as a home-manager module) after startup? Does journalctl report any useful info?
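For reference, the ordering that was attempted can also be expressed from a NixOS configuration rather than by editing nixos.nix. This is a minimal sketch using the standard `systemd.services.<name>.wants` and `after` options; it assumes the NixOS module (not the home-manager one) is in use, and as reported above it did not resolve the failure on the affected machine.

```nix
{
  systemd.services."flatpak-managed-install" = {
    # Pull in network-online.target and order the unit after it.
    # What "online" means is decided by the network manager in use,
    # so this alone may not guarantee that DNS resolution works yet.
    wants = [ "network-online.target" ];
    after = [ "network-online.target" ];
  };
}
```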

In my case, something that would help me avoid having my system tainted (as shown with # systemctl list-machines) right off the bat on a fresh boot would be to disable the service from automatically starting at boot time and let the timer trigger it. I looked at the code but couldn't find a way to have the service created with a disabled state.

Ah! I wonder if setting services.flatpak.update.auto.enable=true is triggering the download attempt at boot (thus overriding services.flatpak.update.onActivation=false). This could happen: if a timer expired while a machine was off/asleep, it will fire upon resume. See https://wiki.archlinux.org/title/systemd/Timers for details.
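A possible way to test this hypothesis, sketched under the assumption that the module creates the timer with systemd's `Persistent=` setting turned on (not verified here): force `Persistent` off, so a run missed while the machine was powered down is not replayed at the next boot.

```nix
{ lib, ... }:
{
  # Assumption: nix-flatpak defines this timer with Persistent enabled.
  # With Persistent=false, systemd does not catch up on missed runs when
  # the timer is next activated; it waits for the next OnCalendar match.
  systemd.timers."flatpak-managed-install".timerConfig.Persistent =
    lib.mkForce false;
}
```

This only makes sense while services.flatpak.update.auto.enable = true; otherwise there is no timer to override.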

Just realized that OP also reports a TriggeredBy: ● flatpak-managed-install.timer in the error message.

FWIW services.flatpak.update.auto.enable is kinda documented (in the module's options docs), but in hindsight it might be a bit counterintuitive/unclear. Tbh I need to triple check this code path (it has been a while); I'll f/up in thread.

On a side note, I love this development; it makes using NixOS even more enjoyable. Thank you! 🙌 I'd be more than glad to chip in some money if you set up a sponsor link :)

Thanks for the kind words! Happy to hear you find this project useful. I appreciate the offer to sponsor, but right now I don't have a significant amount of resources invested in this project. Any help in the form of bug reports (like this one!), feature discussions and doc improvements is very much welcome & appreciated :)

cig0 commented 6 months ago

@gmodena I'm afraid there's not much information in the service logs, only the mention it can't fetch the remote object:

~ λ journalctl -b -u flatpak-managed-install.service
May 08 17:46:08 perrrkele systemd[1]: Starting flatpak-managed-install.service...
May 08 17:46:08 perrrkele vyb3rjlabp427icswxan9fhz3dpxqgwm-flatpak-managed-install[1313]: error: Can't load uri https://dl.flathub.org/repo/flathub.flatpakrepo: While fetching https://dl.flathub.org/repo/flathub.flatpakrepo: [6] Couldn't resolve host name
May 08 17:46:08 perrrkele systemd[1]: flatpak-managed-install.service: Main process exited, code=exited, status=1/FAILURE
May 08 17:46:08 perrrkele systemd[1]: flatpak-managed-install.service: Failed with result 'exit-code'.
May 08 17:46:08 perrrkele systemd[1]: Failed to start flatpak-managed-install.service.

The endpoint is perfectly reachable otherwise:

~ λ curl -I https://dl.flathub.org/repo/flathub.flatpakrepo
HTTP/2 200 
server: nginx/1.18.0 (Ubuntu)
content-type: application/octet-stream
last-modified: Fri, 12 Jan 2018 12:24:05 GMT
etag: "5a58a8e5-fc8"
expires: Thu, 15 Feb 2024 12:30:10 GMT
cache-control: max-age=3600, public
backend-name: 3DxooTFj8SlVTdJ0UTX8Jd--F_front_hex2
via: 1.1 varnish, 1.1 varnish
accept-ranges: bytes
date: Wed, 08 May 2024 20:58:12 GMT
age: 1707
x-served-by: cache-lhr7381-LHR, cache-gru-sbgr1930057-GRU
x-cache: HIT, HIT
x-cache-hits: 52515, 1
x-timer: S1715201893.865167,VS0,VE2
strict-transport-security: max-age=31557600
alt-svc: h3=":443";ma=86400,h3-29=":443";ma=86400,h3-27=":443";ma=86400
content-length: 4040

cig0 commented 6 months ago

By the way, this is my DNS resolver configuration in case you may be wondering if the issue could be related to it:

/etc/nixos/configuration.nix

  # Enable networking
  networking = {
    hostName = "perrrkele";
    # networking.wireless.enable = true;  # Enables wireless support via wpa_supplicant.
    networkmanager = {
      enable = true;
      dns = "systemd-resolved";
    };
  };

(Module)

{
  services.resolved = {
    enable = true;
    fallbackDns = [
      "82.96.65.2" "94.140.14.14" "1.1.1.1"
    ];
  };
}

gmodena commented 6 months ago

@cig0 ack - thanks for info.

Just to be sure: are restarts after boot (systemctl restart flatpak-managed-install) successful?

A workaround I can think of would be testing whether domains can be resolved in the installer script, and retrying if not. But I am not super fond of introducing a busy wait at boot (or forcing a success for a failing service).
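A rough sketch of what such a check could look like if done from the user's configuration instead of the installer script. The script name, the 10 attempts, and the 3-second pause are arbitrary illustrative choices, the host name is taken from the log above, and none of this has been tested against the module.

```nix
{ pkgs, ... }:
let
  # Hypothetical helper: retry DNS resolution a bounded number of times.
  waitForDns = pkgs.writeShellScript "wait-for-flathub-dns" ''
    for attempt in $(${pkgs.coreutils}/bin/seq 1 10); do
      # getent exits 0 once the host name resolves.
      if ${pkgs.getent}/bin/getent hosts dl.flathub.org > /dev/null; then
        exit 0
      fi
      ${pkgs.coreutils}/bin/sleep 3
    done
    echo "dl.flathub.org still not resolvable after 10 attempts" >&2
    exit 1
  '';
in
{
  # Run the check before the installer; the unit still fails if DNS
  # never comes up, so a real failure is not masked as a success.
  systemd.services."flatpak-managed-install".serviceConfig.ExecStartPre =
    "${waitForDns}";
}
```

The loop is bounded, so the worst case adds roughly 30 seconds to the unit's start rather than waiting indefinitely.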

cig0 commented 6 months ago

@gmodena Manually restarting the service works as expected -- take a look at this beautiful output: Screenshot_20240508_190715

Yeah, I'm not fond either of introducing dirty workarounds or obfuscating a system's behavior if it's not absolutely necessary, which is not the case IMO.

I will continue digging here. I want to understand why, if the service behaves correctly on your end, it is failing on my side, especially considering this is a fairly fresh NixOS installation—it isn't even two weeks old.

I'll get back to you on Discourse once I have a first draft ready :+1:

cig0 commented 6 months ago

By the way, what kind of networking setup do you have, @gmodena? Are you also using NetworkManager?

gmodena commented 6 months ago

By the way, what kind of networking setup do you have, @gmodena? Are you also using NetworkManager?

I am also using NetworkManager (networking.networkmanager.enable = true;).

cig0 commented 6 months ago

By the way, what kind of networking setup do you have, @gmodena? Are you also using NetworkManager?

I am also using NetworkManager (networking.networkmanager.enable = true;).

This makes this issue even more interesting! Given Nix(OS)'s very nature, I wonder what settings are introducing noise for OP and me, making the service fail :thinking:

cig0 commented 6 months ago

@ReedClanton @gmodena @mrnetlex I'm happy to inform you that I've found the root cause of the issue, which can be solved with a tiny change: https://github.com/gmodena/nix-flatpak/pull/67

io12 commented 5 months ago

This command waits 60 seconds for an internet connection.

''
  ${pkgs.networkmanager}/bin/nm-online --quiet --timeout 60
''

https://forum.manjaro.org/t/for-those-who-use-systemd-services-that-rely-on-a-network-connection/83626/1
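For NetworkManager users, the command above could be wired into the unit roughly as follows. This is a sketch only: it assumes the NixOS module's system-level unit name, and it makes the service depend on the NetworkManager package.

```nix
{ pkgs, ... }:
{
  # Block the installer for up to 60 seconds until NetworkManager
  # reports a connection. nm-online exits non-zero on timeout, which
  # makes the unit fail instead of running without connectivity.
  systemd.services."flatpak-managed-install".serviceConfig.ExecStartPre =
    "${pkgs.networkmanager}/bin/nm-online --quiet --timeout 60";
}
```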

gmodena commented 5 months ago

Thanks for the pointer @io12, that thread contained a lot of useful info.

The command you shared would work for NetworkManager users, but I would not want to enforce a dep on NetworkManager on every system. FWIW NetworkManager ships with a service to address the problem discussed in this issue: https://man.archlinux.org/man/NetworkManager-wait-online.service.8.en (under the hood it runs: nm-online -s -q).

I was hoping that an explicit Wants= on network-online.target would help, but there is no guarantee of what online means.

From systemd's doc: [...] Units that strictly require a configured network connection should pull in network-online.target (via a Wants= type dependency) and order themselves after it. This target unit is intended to pull in a service that delays further execution until the network is sufficiently set up. What precisely this requires is left to the implementation of the network managing service. [...].

gmodena commented 5 months ago

@ReedClanton @gmodena @mrnetlex I'm happy to inform you that I've found the root cause of the issue, which can be solved with a tiny change: #67

Hey @cig0, I did not have a chance to f/up on the PR before you closed it. Sorry about that.

Don't know if you already came across this, but there's no need to modify upstream to alter a systemd unit. You should be able to add a sleep to the flatpak-managed-install service by adding something like this to your config (not tested):

  systemd.services."flatpak-managed-install" = {
    serviceConfig = {
      ExecStartPre = "${pkgs.coreutils}/bin/sleep 5";
    };
  };

Hope this helps.

cig0 commented 5 months ago

@ReedClanton @gmodena @mrnetlex I'm happy to inform you that I've found the root cause of the issue, which can be solved with a tiny change: #67

Hey @cig0, I did not have a chance to f/up on the PR before you closed it. Sorry about that.

Don't know if you already came across this, but there's no need to modify upstream to alter a systemd unit. You should be able to add a sleep to the flatpak-managed-install service by adding something like this to your config (not tested):

  systemd.services."flatpak-managed-install" = {
    serviceConfig = {
      ExecStartPre = "${pkgs.coreutils}/bin/sleep 5";
    };
  };

Hope this helps.

This is pretty cool! Yesterday, I was thinking about a similar approach using an overlay (I started learning about them), but your solution is much simpler. K.I.S.S. FTW :rocket:

dezren39 commented 5 days ago

I actually am in offline mode but have encountered this on activation. I had auto-update and onActivation enabled, but I also encountered this when rebuilding with just the flake module imported. I would like the service to exit 0 (or similar) when the apps list doesn't add any new programs.

(I don't use NetworkManager; I use wpa_supplicant by way of networking.wireless)

gmodena commented 4 days ago

Hey @dezren39, setting update.onActivation=true assumes that connectivity is available during activation. Otherwise, the module should support the offline mode you described in #92.

but I also encountered this when rebuilding with just the flake module imported

Ack. This sounds like unwanted / buggy behavior. I need to verify that I did not introduce a regression.

Would you mind sharing the following information?

Thanks!

gmodena commented 4 days ago

FWIW, I just tried an offline build (I switched off networking) with nix-flatpak installed as a home-manager module, and services.flatpak.update.onActivation=false (default value).

The system built, and this is the status of the systemd unit post activation:

○ flatpak-managed-install.service
     Loaded: loaded (/home/gmodena/.config/systemd/user/flatpak-managed-install.service; enabled; preset: enabled)
     Active: inactive (dead) since Sun 2024-11-03 19:40:13 CET; 3h 24min ago
   Main PID: 3198 (code=exited, status=0/SUCCESS)
        CPU: 4.872s

nov 03 19:40:11 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3285]: Skipping: com.logseq.Logseq/x86_64/stable is already installed
nov 03 19:40:11 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3291]: Skipping: com.jetbrains.IntelliJ-IDEA-Community/x86_64/stable is already insta>
nov 03 19:40:11 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3297]: Skipping: com.jetbrains.PyCharm-Community/x86_64/stable is already installed
nov 03 19:40:12 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3326]: Skipping: org.signal.Signal/x86_64/stable is already installed
nov 03 19:40:12 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3361]: Skipping: io.typora.Typora/x86_64/stable is already installed
nov 03 19:40:12 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3457]: Skipping: net.ankiweb.Anki/x86_64/stable is already installed
nov 03 19:40:13 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3489]: Skipping: com.visualstudio.code/x86_64/stable is already installed
nov 03 19:40:13 framework-nixos-1 1bszkfgay14f34map1xvfrd85y3kw8aq-flatpak-managed-install[3519]: Skipping: io.github.zen_browser.zen/x86_64/stable is already installed
nov 03 19:40:13 framework-nixos-1 systemd[3185]: Finished flatpak-managed-install.service.
nov 03 19:40:13 framework-nixos-1 systemd[3185]: flatpak-managed-install.service: Consumed 4.872s CPU time.

The timestamps are consistent with the timer's execution schedule.

Did I understand correctly that you had offline activations fail with services.flatpak.update.onActivation=false? If that's the case, I wonder if it could be a side effect of how systemd timers are persisted. Could you try explicitly setting services.flatpak.update.auto.enable=false between activations?

In the meantime, I'll try to repro with nix-flatpak installed as a nixos module too.
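For clarity, the settings being asked about would look roughly like this, using the option names from the configuration shared earlier in the thread. Whether this avoids the offline failure is exactly what the test is meant to find out.

```nix
{
  services.flatpak = {
    enable = true;
    update = {
      # Do not update flatpaks as part of system activation.
      onActivation = false;
      # Disable the periodic update timer, so no persisted timer
      # can trigger the service between activations.
      auto.enable = false;
    };
  };
}
```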