fort-nix / nix-bitcoin

A collection of Nix packages and NixOS modules for easily installing full-featured Bitcoin nodes with an emphasis on security.
https://nixbitcoin.org
MIT License
513 stars 107 forks

Too many active HTLC, lost most of channels #CLN24.05 #bitcoind27.1 #RPi4 #715

Open eltoo-cln opened 4 months ago

eltoo-cln commented 4 months ago

Hello, I am trying to find out what went wrong with my node and why I recently lost most of my channels. This situation happened shortly after upgrading to 1.0.9 (CLN 24.05).

My HW setup:

This node has been up for about 2 years without any previous issues. NB is deployed via krops (I still haven't moved to flakes).

CLN:

lightning-cli getinfo
{
   "id": "redacted",
   "alias": "redacted",
   "color": "redacted",
   "num_peers": 37,
   "num_pending_channels": 0,
   "num_active_channels": 5,
   "num_inactive_channels": 5,
   "address": [
      {
         "type": "torv3",
         "address": "redacted.onion",
         "port": 9735
      }
   ],
   "binding": [
      {
         "type": "ipv4",
         "address": "redacted",
         "port": 9735
      }
   ],
   "version": "v24.05",
   "blockheight": 851703,
   "network": "bitcoin",
   "fees_collected_msat": redacted,
   "lightning-dir": "/var/lib/clightning/bitcoin",
   "our_features": {
      "init": "08a0800a8a59a1",
      "node": "88a0800a8a59a1",
      "channel": "",
      "invoice": "02000002024100"
   }
}

configuration.nix

{ config, pkgs, lib, ... }: {
  imports = [
    <nix-bitcoin/modules/presets/secure-node.nix>
    <nix-bitcoin/modules/presets/hardened.nix>
    ./hardware-configuration.nix
  ];
  nix-bitcoin.onionServices.bitcoind.public = true;

  services.clightning = {
    enable = true;

    plugins = {
      monitor.enable = true;
      rebalance.enable = true;
      currencyrate.enable = true;
      clboss = {
        enable = true;
        zerobasefee = "require";
        min-onchain = 500000000;
        min-channel = 5000000;
      };
    };

    extraConfig = ''
      alias=redacted
      rgb=redacted
      min-capacity-sat=8000000
      htlc-minimum-msat=1000000
      large-channels
      fee-base=0
      experimental-offers
      experimental-onion-messages
    '';
  };
  nix-bitcoin.onionServices.clightning.public = true;
  systemd.services.clightning.serviceConfig.TimeoutStartSec = "5m";

  services.rtl = {
    enable = true;
  };
  services.rtl.nodes.clightning = {
    enable = true;
    extraConfig = {
      Settings = {
        themeMode = "night";
        enableOffers = "true";
      };
    };
  };
  services.electrs.enable = true;

  services.backups.enable = true;
  programs.ssh.knownHosts."host" = {
    hostNames = [ "redacted" ];
    publicKey = "redacted";
  };

  services.borgbackup.jobs."redacted" = {
    paths = [
      "/var/lib/localBackups"
    ];
    repo = "redacted";
    encryption = {
      mode = "redacted";
      passCommand = "redacted";
    };
    environment.BORG_RSH = "redacted";
    compression = "auto,zstd";
    startAt = "daily";
  };

  nix-bitcoin.netns-isolation.enable = true;

  networking.hostName = "eltoo-cln";
  time.timeZone = "redacted";

  services.openssh = {
    enable = true;
    settings = {
      LoginGraceTime = 0; # RCE fix
    };
  };
  users.users.root = {
    openssh.authorizedKeys.keys = [
      "redacted"
    ];
  };

  environment.systemPackages = with pkgs; [
    redacted
  ];

  boot = {
    kernelPackages = pkgs.linuxPackages_rpi4;
    tmp.useTmpfs = true;
    kernelParams = [
      "8250.nr_uarts=1"
      "console=ttyAMA0,115200"
      "console=tty1"
      "cma=128M"
    ];
  };

  boot.loader.grub.enable = false;

  hardware.enableRedistributableFirmware = true;
  nixpkgs.config = {
    allowUnfree = true;
  };

  networking = {
    dhcpcd.enable = false;
    interfaces.eth0.ipv4.addresses = [{
      address = "redacted";
      prefixLength = 32;
    }];
    defaultGateway = {
      address = "redacted";
      interface = "eth0";
    };
    nameservers = [ "redacted" ];
  };

  i18n.defaultLocale = "redacted";
  console = {
    font = "Hack";
    keyMap = "us";
  };

  nix = {
    settings = {
      auto-optimise-store = true;
    };
    gc = {
      automatic = true;
      dates = "weekly";
      options = "--delete-older-than 14d";
    };
    extraOptions = ''
      min-free = ${toString (100 * 1024 * 1024)}
      max-free = ${toString (1024 * 1024 * 1024)}
    '';
  };

  system.stateVersion = "24.05";

  nix-bitcoin.configVersion = "0.0.109";
}

I would like to ask for support. I didn't find anything suspicious in logs, but I'm also not sure what I should be looking for.

After reviewing the clightning.service log, I found a couple of strange lines:

lightningd[4601]: INFO    plugin-clboss: New block at 851508
lightningd[4670]: Unhandled exception in concurrent task!
lightningd[4670]: Incorrect type.
lightningd[4670]: ...main loop continuing...
lightningd[4601]: **BROKEN** plugin-clboss: NodeBalanceSwapper: Unexpected 
lightningd[4601]: **BROKEN** plugin-clboss: EarningsRebalancer: Unexpected 
lightningd[4601]: INFO    plugin-clboss: InternetConnectionMonitor: online.
lightningd[4601]: INFO    plugin-clboss: Connector: failed to connect to 02
lightningd[4601]: INFO    plugin-clboss: Connector: failed to connect to 02
lightningd[4601]: UNUSUAL plugin-clboss: NeedsConnectSolicitor: Ran out of 
lightningd[4601]: INFO    plugin-clboss: InternetConnectionMonitor: online.
lightningd[4670]: Unhandled exception in concurrent task!
lightningd[4670]: Incorrect type.
lightningd[4670]: ...main loop continuing...
lightningd[4601]: **BROKEN** plugin-clboss: NodeBalanceSwapper: Unexpected 
lightningd[4601]: **BROKEN** plugin-clboss: EarningsRebalancer: Unexpected 
lightningd[4601]: INFO    plugin-clboss: InternetConnectionMonitor: online.
lightningd[4601]: INFO    plugin-clboss: Connector: connected to 0231eccc65
lightningd[4601]: INFO    plugin-clboss: NeedsConnectSolicitor: Connection 
lightningd[4601]: INFO    plugin-clboss: Connector: failed to connect to 02
lightningd[4670]: Unhandled exception in concurrent task!
lightningd[4670]: Incorrect type.
lightningd[4670]: ...main loop continuing...
lightningd[4601]: **BROKEN** plugin-clboss: NodeBalanceSwapper: Unexpected 
lightningd[4601]: **BROKEN** plugin-clboss: EarningsRebalancer: Unexpected 

The BROKEN lines related to plugin-clboss were already reported here

But I am not sure what this repetitive part is related to:

lightningd[4670]: Unhandled exception in concurrent task!
lightningd[4670]: Incorrect type.
lightningd[4670]: ...main loop continuing...

The other thing, which may or may not be related, is filesystem status: /var/lib has less than 10% free space.

df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        391M     0  391M   0% /dev
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           2.0G  3.4M  2.0G   1% /run
tmpfs           3.9G  2.2M  3.9G   1% /run/wrappers
/dev/dm-0        20G   12G  7.5G  60% /
/dev/dm-1       859G  758G   58G  93% /var/lib
tmpfs           3.9G     0  3.9G   0% /tmp
/dev/sda1      1022M  104M  919M  11% /boot
tmpfs           781M  4.0K  781M   1% /run/user/0
tmpfs           781M  4.0K  781M   1% /run/user/1001
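Since /var/lib (where bitcoind and clightning state live) is at 93%, a small guard like the following could rule disk pressure in or out before the next incident. A sketch only; the mount point and threshold are assumptions, not nix-bitcoin options.

```shell
# Sketch: warn when a filesystem holding node state is nearly full.
check_fs() {
  # $1 = mount point, $2 = warning threshold in percent
  usage=$(df --output=pcent "$1" | tail -n 1 | tr -dc '0-9')
  if [ "$usage" -ge "$2" ]; then
    echo "WARNING: $1 is ${usage}% full"
  fi
}
check_fs / 90    # on the node: check_fs /var/lib 90
```

`df --output=pcent` is GNU coreutils, so it is available on NixOS; the function could run from a systemd timer.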

I would like to find out what the problem is, or report an issue, but first I need to narrow it down to the root cause. Is anybody able to guide me on this journey, please?

jonasnick commented 4 months ago

Hi @eltoo-cln !

What does "Too many active HTLC" mean? This seems like an issue with core lightning or clboss.

eltoo-cln commented 4 months ago

> Hi @eltoo-cln !
>
> What does "Too many active HTLC" mean? This seems like an issue with core lightning or clboss.

Hi @jonasnick and thank you for your time to read my issue.

Two days ago I could see over 130 active HTLCs in RTL. I did have a few before, but not more than 10, IIRC.

Currently there are 56, spread across three different nodes, all of which are already Disconnected and in state Onchain.
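For anyone wanting to check the same thing from the CLI instead of RTL, pending HTLCs can be counted per channel from `lightning-cli listpeerchannels`. A sketch: the sample document below stands in for the RPC output, with field names following the CLN listpeerchannels schema (`peer_id`, `htlcs`).

```shell
# Sketch: count pending HTLCs per channel.
# On the node, replace the sample with:
#   lightning-cli listpeerchannels | jq -r '...'
sample='{"channels":[{"peer_id":"02aa","htlcs":[{"id":1},{"id":2}]},{"peer_id":"03bb","htlcs":[]}]}'
# Print "<peer_id> <number of HTLCs>" for each channel
echo "$sample" | jq -r '.channels[] | "\(.peer_id) \(.htlcs | length)"'
```

A channel whose HTLC count keeps growing while the peer shows Disconnected would be the first place to look.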

dazl commented 4 months ago

You can set the maximum number of concurrent HTLCs (per channel, I believe) with something like

max-concurrent-htlcs=2

in services.clightning.extraConfig within configuration.nix.

urza commented 4 months ago

Hi, I think it would be good to get to the bottom of this to understand what happened here, so others can avoid it.

My node had a channel with @eltoo-cln 's node. Their node lost most of its channels within a few days because peers automatically force-closed their channels with it.

My node was one of the many that closed a channel with their node.

Here are channel updates of our channel as seen from my node:

"state_changes": [
            {
               "timestamp": "2022-07-08T21:23:57.561Z",
               "old_state": "CHANNELD_AWAITING_LOCKIN",
               "new_state": "CHANNELD_NORMAL",
               "cause": "user",
               "message": "Lockin complete"
            },
            {
               "timestamp": "2024-07-07T18:32:26.570Z",
               "old_state": "CHANNELD_NORMAL",
               "new_state": "AWAITING_UNILATERAL",
               "cause": "protocol",
               "message": "Offered HTLC 681 SENT_ADD_ACK_REVOCATION cltv 851131 hit deadline"
            },
            {
               "timestamp": "2024-07-07T18:43:26.772Z",
               "old_state": "AWAITING_UNILATERAL",
               "new_state": "FUNDING_SPEND_SEEN",
               "cause": "onchain",
               "message": "Onchain funding spend"
            },
            {
               "timestamp": "2024-07-07T18:43:27.151Z",
               "old_state": "FUNDING_SPEND_SEEN",
               "new_state": "ONCHAIN",
               "cause": "onchain",
               "message": "Onchain init reply"
            }
         ],

This is beyond my understanding of the inner workings of LN, but the important part seems to be:

"message": "Offered HTLC 681 SENT_ADD_ACK_REVOCATION cltv 851131 hit deadline"

Is it possible that @eltoo-cln 's node was in a "stuck" state, unable to write to its db or something like that, and could not cooperate on moving HTLCs to the proper state, which is why all the peers force-closed their channels?
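The arithmetic behind that deadline message can be checked directly: an offered HTLC has to be failed or fulfilled before its CLTV expiry, and once the chain tip passes that height the peer force-closes to settle on chain. Using the two numbers already quoted in this thread:

```shell
# cltv 851131 comes from the state-change message above;
# 851703 is the blockheight from the getinfo output earlier in the thread.
cltv_expiry=851131
blockheight=851703
# How many blocks the chain tip is past the HTLC deadline
echo $(( blockheight - cltv_expiry ))   # prints 572
```

So by the time getinfo was run, the deadline was long past; the interesting question is why the HTLC was still unresolved when block 851131 arrived.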

northben commented 4 months ago

could be caused by clboss: https://github.com/ElementsProject/lightning/issues/7172#issuecomment-2221040042