NixOS / nixpkgs

foundationdb coredumps on 24.05 calling into `cxx1112regex_traits` #319537

Open siriobalmelli opened 5 months ago

siriobalmelli commented 5 months ago

Describe the bug

foundationdb dumps core in the following system configurations:

| foundationdb | architecture | kernel | nixpkgs | status |
|---|---|---|---|---|
| 7.1.32 | aarch64 | 6.1.92 | release-24.05 | coredump |
| 7.1.32 | aarch64 | 6.1.92 | release-23.11 | ok |
| 7.1.32 | aarch64 | 6.6.32 | release-24.05 | coredump |
| 7.1.32 | aarch64 | 6.6.32 | release-23.11 | ok |
| 7.1.30 | aarch64 | - | - | build fail |
| 7.1.32 | x86_64 | 6.1.92 | release-24.05 | coredump |
| 7.1.32 | x86_64 | 6.1.92 | release-23.11 | ok |
| 7.1.32 | x86_64 | 6.6.32 | release-24.05 | coredump |
| 7.1.32 | x86_64 | 6.6.32 | release-23.11 | ok |
| 7.1.30 | x86_64 | 6.1.92 | release-24.05 | coredump |
| 7.1.30 | x86_64 | 6.1.92 | release-23.11 | ok |
| 7.1.30 | x86_64 | 6.6.32 | release-24.05 | coredump |
| 7.1.30 | x86_64 | 6.6.32 | release-23.11 | ok |

Steps To Reproduce

Set up a single-machine test cluster using a minimal flake:

{
  description = "foundationdb crash reproduction";

  inputs = {
    nixpkgs-24_05.url = "github:nixos/nixpkgs/release-24.05";
    nixpkgs-23_11.url = "github:nixos/nixpkgs/release-23.11";
  };

  outputs = {self, ...} @ inputs: let
    inherit (inputs.nixpkgs-24_05.lib) nixosSystem; # toggle nixpkgs here
  in {
    nixosConfigurations.test-system = nixosSystem {
      system = "x86_64-linux"; # toggle architecture here
      modules = [
        ({
          modulesPath,
          pkgs,
          ...
        }: {
          imports = [
            "${modulesPath}/virtualisation/amazon-image.nix"
          ];

          # boot.kernelPackages = pkgs.linuxPackages_6_1;
          boot.kernelPackages = pkgs.linuxPackages_6_6; # toggle kernel here

          ec2.hvm = true;

          networking.useDHCP = true;

          services.foundationdb = {
            enable = true;

            extraReadWritePaths = ["/run/foundationdb"];
            listenAddress = "127.0.0.1:4500";
            listenPortStart = 4500;
            openFirewall = true;
            package = pkgs.foundationdb71;
            pidfile = "/run/foundationdb/fdb.pid";
            publicAddress = "127.0.0.1";
            restartDelay = 120;
            serverProcesses = 1;
            traceFormat = "json";
          };

          system.stateVersion = "24.05";
        })
      ];
    };
  };
}

See the comments in the flake above for where to toggle nixpkgs, architecture, and kernel. Changing the foundationdb version is outside the scope of this minimal reproduction, but I have tested that as well.

Resulting coredump can be seen with:

coredumpctl list | grep fdbserver | tail -n 1 | awk '{ print $5 }' | xargs coredumpctl info

Example:

           PID: 1320 (fdbserver)
           UID: 118 (foundationdb)
           GID: 118 (foundationdb)
        Signal: 11 (SEGV)
     Timestamp: Thu 2024-06-13 09:22:18 UTC (17min ago)
  Command Line: /nix/store/cz1i01ckbvrxn1gli0bbrim16dvznqv7-foundationdb-7.1.32/bin/fdbserver --cluster_file /etc/foundationdb/fdb.cluster --datadir /var/lib/foundationdb/4500 --listen_address 127.0.0.1:4500 --logdir /var/log/foundationdb --logsize 10MiB --maxlogssize 100MiB --memory 8GiB --public_address 127.0.0.1:4500 --storage_memory 1GiB --trace_format json
    Executable: /nix/store/cz1i01ckbvrxn1gli0bbrim16dvznqv7-foundationdb-7.1.32/bin/fdbserver
 Control Group: /system.slice/foundationdb.service
          Unit: foundationdb.service
         Slice: system.slice
       Boot ID: 4b0c405bd88a4031b58c8dceb9be882e
    Machine ID: ec26ef85d6581da22538098e8836259e
      Hostname: ip-172-29-141-193.eu-west-1.compute.internal
       Storage: /var/lib/systemd/coredump/core.fdbserver.118.4b0c405bd88a4031b58c8dceb9be882e.1320.1718270538000000.zst (present)
  Size on Disk: 558.0K
       Message: Process 1320 (fdbserver) of user 118 dumped core.

                Module libgcc_s.so.1 without build-id.
                Module libstdc++.so.6 without build-id.
                Module libboost_context.so.1.78.0 without build-id.
                Stack trace of thread 1320:
                #0  0x0000000002a25854 _ZNKSt7codecvtIDic11__mbstate_tE10do_unshiftERS0_PcS3_RS3_ (fdbserver + 0x2625854)
                #1  0x0000000001d88710 _ZNSt8__detail15_BracketMatcherINSt7__cxx1112regex_traitsIcEELb0ELb0EE8_M_readyEv (fdbserver + 0x1988710)
                #2  0x0000000001d88aac _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE25_M_insert_bracket_matcherILb0ELb0EEEvb (fdbserver + 0x1988aac)
                #3  0x0000000001d9a60d _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE7_M_atomEv (fdbserver + 0x199a60d)
                #4  0x0000000001d99083 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_alternativeEv (fdbserver + 0x1999083)
                #5  0x0000000001d9965b _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_disjunctionEv (fdbserver + 0x199965b)
                #6  0x0000000001d9a443 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE7_M_atomEv (fdbserver + 0x199a443)
                #7  0x0000000001d99083 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_alternativeEv (fdbserver + 0x1999083)
                #8  0x0000000001d99161 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_alternativeEv (fdbserver + 0x1999161)
                #9  0x0000000001d9965b _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_disjunctionEv (fdbserver + 0x199965b)
                #10 0x00000000023dc723 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEEC2EPKcS6_RKSt6localeNSt15regex_constants18syntax_option_typeE.constprop.0 (fdbserver + 0x1fdc723)
                #11 0x0000000001d8dbad _ZN8Hostname10isHostnameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE (fdbserver + 0x198dbad)
                #12 0x0000000001da2334 _ZN23ClusterConnectionStringC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE (fdbserver + 0x19a2334)
                #13 0x0000000001ca267c _ZN21ClusterConnectionFileC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE (fdbserver + 0x18a267c)
                #14 0x000000000139faaa _ZN12_GLOBAL__N_110CLIOptions17parseArgsInternalEiPPc (fdbserver + 0xf9faaa)
                #15 0x0000000000e001ca main (fdbserver + 0xa001ca)
                #16 0x00007fbf4e75a10e __libc_start_call_main (libc.so.6 + 0x2a10e)
                #17 0x00007fbf4e75a1c9 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a1c9)
                #18 0x0000000000e520d5 _start (fdbserver + 0xa520d5)
                ELF object binary architecture: AMD x86-64
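
The lookup pipeline above relies on the PID being the fifth whitespace-separated field of each `coredumpctl list` line. A minimal sketch of the same lookup, wrapped in a helper that prints nothing when no fdbserver dump is listed (the function name `latest_fdbserver_pid` is made up for illustration and is not part of coredumpctl or foundationdb):

```shell
# Hypothetical helper: read `coredumpctl list` output on stdin and print the
# PID (fifth whitespace-separated field) of the last line mentioning fdbserver.
latest_fdbserver_pid() {
  awk '/fdbserver/ { pid = $5 } END { if (pid != "") print pid }'
}

# Usage on the affected machine (does nothing if there is no such dump):
#   pid=$(coredumpctl list | latest_fdbserver_pid)
#   [ -n "$pid" ] && coredumpctl info "$pid"
```

Keeping the PID in a variable makes it easy to reuse for further inspection of the same dump.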

Expected behavior

A running single-node foundationdb cluster. Check with `sudo fdbcli --exec status`:

Broken System

SIGNAL: Segmentation fault (11)
Trace: addr2line -e fdbcli.debug -p -C -f -i 0x7335ac 0x728a3d 0x72aa33 0x72ab11 0x72b00b 0xc02473 0x84810b 0x6214ea 0x7ff65d43d10e
Segmentation fault

Working System

Using cluster file `/etc/foundationdb/fdb.cluster'.

Configuration:
  Redundancy mode        - single
  Storage engine         - ssd-2
  Coordinators           - 1
  Usable Regions         - 1

Cluster:
  FoundationDB processes - 1
  Zones                  - 1
  Machines               - 1
  Memory availability    - 7.5 GB per process on machine with least available
  Fault Tolerance        - 0 machines
  Server time            - 06/13/24 10:14:01

Data:
  Replication health     - (Re)initializing automatic data distribution
  Moving data            - unknown (initializing)
  Sum of key-value sizes - unknown
  Disk space used        - 210 MB

Operating space:
  Storage server         - 3.1 GB free on most full server
  Log server             - 3.1 GB free on most full server

Workload:
  Read rate              - 16 Hz
  Write rate             - 0 Hz
  Transactions started   - 4 Hz
  Transactions committed - 0 Hz
  Conflict rate          - 0 Hz

Backup and DR:
  Running backups        - 0
  Running DRs            - 0

Client time: 06/13/24 10:14:01

Additional context

Looking at the dependency tree with:

nix-tree .#nixosConfigurations.test-system.config.services.foundationdb.package

The issue appears to be the glibc version change from glibc-2.38-77 to glibc-2.39-52. glibc is both a direct dependency of foundationdb and an indirect dependency via boost-1.78.0. It was not obvious to me how to test this further.

I am happy to collect additional data as needed.

Notify maintainers

  1. foundationdb maintainers:

    @thoughtpolice @lostnet

  2. glibc maintainers:

    @eelco @ma27 @connorbaker

Metadata

Broken System

Working System

siriobalmelli commented 5 months ago

Bump.

If there's anything else I can do to help debug this, please let me know.

lostnet commented 4 months ago

It looks to me like the implementation of ClusterConnectionString was replaced in newer versions, so updating the version would probably avoid this crash. That may be an option. (But I am not able to participate in that process.)

siriobalmelli commented 4 months ago

> It looks to me like the implementation of ClusterConnectionString was replaced in newer versions, so updating the version would probably avoid this crash. That may be an option. (But I am not able to participate in that process.)

@lostnet thank you for your input. Could you tag who you think might be the right person for this? 🙏

siriobalmelli commented 3 months ago

I've opened a branch in my nixpkgs fork to try and resolve this problem: https://github.com/siriobalmelli/nixpkgs/tree/sb/update/foundationdb

I added a foundationdb nixos test to reproduce the problem and then updated to 7.1.62 to see if the coredump persists there, which it unfortunately does:

$ nix build .#nixosTests.foundationdb -L

# ...

vm-test-run-foundationdb> server # [   40.424014] foundationdb-post-start[871]: SIGNAL: Segmentation fault (11)
vm-test-run-foundationdb> server # [   40.429551] foundationdb-post-start[871]: Trace: addr2line -e fdbcli.debug -p -C -f -i 0x74660c 0x73b83d 0x73d833 0x73d921 0x73de1b 0xc3b5e3 0x872a3b 0x62d28a 0x7f0167fad14e
vm-test-run-foundationdb> server # [   40.606745] systemd-coredump[875]: Process 871 (fdbcli) of user 0 terminated abnormally with signal 11/SEGV, processing...
vm-test-run-foundationdb> server # [   40.736776] foundationdb-start[862]: Time="1722928592.488564" Severity="10" LogGroup="default" Process="fdbmonitor": Started FoundationDB Process Monitor 7.1 (v7.1.62)
vm-test-run-foundationdb> server # [   40.754930] systemd[1]: Created slice Slice /system/systemd-coredump.
vm-test-run-foundationdb> server # [   40.763506] foundationdb-start[862]: Time="1722928592.522404" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf file /nix/store/vkmm55jclk270ggi1w18r9bjv85zia48-foundationdb.conf
vm-test-run-foundationdb> server # [   40.769800] foundationdb-start[862]: Time="1722928592.522762" Severity="10" LogGroup="default" Process="fdbmonitor": Watching conf dir /nix/store/ (2)
vm-test-run-foundationdb> server # [   40.771589] foundationdb-start[862]: Time="1722928592.530570" Severity="10" LogGroup="default" Process="fdbmonitor": Loading configuration /nix/store/vkmm55jclk270ggi1w18r9bjv85zia48-foundationdb.conf
vm-test-run-foundationdb> server # [   40.852369] systemd[1]: Started Process Core Dump (PID 875/UID 0).
vm-test-run-foundationdb> server # [   40.979857] foundationdb-start[862]: Time="1722928592.747738" Severity="10" LogGroup="default" Process="fdbmonitor": Starting backup_agent.1
vm-test-run-foundationdb> server # [   41.006276] foundationdb-start[862]: Time="1722928592.774779" Severity="10" LogGroup="default" Process="fdbmonitor": Starting fdbserver.4500
vm-test-run-foundationdb> server # [   41.038404] foundationdb-start[862]: Time="1722928592.803589" Severity="10" LogGroup="default" Process="fdbserver.4500": Launching /nix/store/sazi4qhlwvbdakgj651daylljcbqynn8-foundationdb-7.1.62/bin/fdbserver (882) for fdbserver.4500
vm-test-run-foundationdb> server # [   41.041907] foundationdb-start[862]: Time="1722928592.803969" Severity="10" LogGroup="default" Process="backup_agent.1": Launching /nix/store/sazi4qhlwvbdakgj651daylljcbqynn8-foundationdb-7.1.62/libexec/backup_agent (881) for backup_agent.1
vm-test-run-foundationdb> server # [   41.189271] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
vm-test-run-foundationdb> server # [   41.272880] ACPI: button: Power Button [PWRF]
vm-test-run-foundationdb> server # [   42.047897] foundationdb-start[862]: Time="1722928593.802180" Severity="40" LogGroup="default" Process="backup_agent.1": SIGNAL: Segmentation fault (11)
vm-test-run-foundationdb> server # [   42.051829] foundationdb-start[862]: Time="1722928593.802489" Severity="40" LogGroup="default" Process="backup_agent.1": Trace: addr2line -e backup_agent.debug -p -C -f -i 0x6cf50c 0x6c3e7d 0x6c5e73 0x6c5f61 0x6c645b 0xc932f3 0xc93730 0x8fb9e7 0x676b77 0x7f56d1e6414e
vm-test-run-foundationdb> server # [   42.222002] dhcpcd[637]: eth0: leased 10.0.2.15 for 86400 seconds
vm-test-run-foundationdb> server # [   42.236107] systemd-coredump[883]: Process 881 (backup_agent) of user 118 terminated abnormally with signal 11/SEGV, processing...
vm-test-run-foundationdb> server # [   42.246009] dhcpcd[637]: eth0: adding route to 10.0.2.0/24
vm-test-run-foundationdb> server # [   42.259630] dhcpcd[637]: eth0: adding default route via 10.0.2.2
vm-test-run-foundationdb> server # [   42.409620] foundationdb-start[862]: Time="1722928594.175923" Severity="40" LogGroup="default" Process="fdbserver.4500": SIGNAL: Segmentation fault (11)
vm-test-run-foundationdb> server # [   42.424375] foundationdb-start[862]: Time="1722928594.184442" Severity="40" LogGroup="default" Process="fdbserver.4500": Trace: addr2line -e fdbserver.debug -p -C -f -i 0x1e72e6c 0x1e91ccd 0x1e90713 0x1e90cfb 0x1e91b03 0x1e90713 0x1e90801 0x1e90cfb 0x24e03e3 0x1e7d45d 0x1e7eba4 0x1d5694c 0x13f6a06 0xe32d9a 0x7f37e475d14e
vm-test-run-foundationdb> server # [   42.506851] systemd[1]: Started Process Core Dump (PID 883/UID 0).
vm-test-run-foundationdb> server # [   42.619025] parport_pc 00:03: reported by Plug and Play ACPI
vm-test-run-foundationdb> server # [   42.651005] parport0: PC-style at 0x378, irq 7 [PCSPP(,...)]
vm-test-run-foundationdb> server # [   42.658571] Floppy drive(s): fd0 is 2.88M AMI BIOS
vm-test-run-foundationdb> server # [   42.699019] FDC 0 is a S82078B
vm-test-run-foundationdb> server # [   42.737988] systemd-coredump[885]: Process 882 (fdbserver) of user 118 terminated abnormally with signal 11/SEGV, processing...
vm-test-run-foundationdb> server # [   43.209668] systemd[1]: Started Process Core Dump (PID 885/UID 0).
vm-test-run-foundationdb> server # [   44.933779] systemd[1]: Stopped target Host and Network Name Lookups.
vm-test-run-foundationdb> server # [   44.951525] systemd[1]: Stopping Host and Network Name Lookups...
vm-test-run-foundationdb> server # [   44.962948] systemd[1]: Stopped target User and Group Name Lookups.
vm-test-run-foundationdb> server # [   44.977532] systemd[1]: Stopping User and Group Name Lookups...
vm-test-run-foundationdb> server # [   44.982730] systemd[1]: Stopping Name Service Cache Daemon (nsncd)...
vm-test-run-foundationdb> server # [   45.037346] systemd[1]: nscd.service: Deactivated successfully.
vm-test-run-foundationdb> server # [   45.072567] systemd[1]: Stopped Name Service Cache Daemon (nsncd).
vm-test-run-foundationdb> server # [   45.656987] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0x700, revision 0
vm-test-run-foundationdb> server # [   45.700864] systemd[1]: Starting Name Service Cache Daemon (nsncd)...
vm-test-run-foundationdb> server # [   46.327772] systemd[1]: Started DHCP Client.
vm-test-run-foundationdb> server # [   46.452386] systemd[1]: Reached target Network is Online.
vm-test-run-foundationdb> server # [   46.565201] bochs-drm 0000:00:02.0: vgaarb: deactivate vga console
vm-test-run-foundationdb> server # [   46.603928] Console: switching to colour dummy device 80x25
vm-test-run-foundationdb> server # [   46.633529] [drm] Found bochs VGA, ID 0xb0c5.
vm-test-run-foundationdb> server # [   46.633671] [drm] Framebuffer size 16384 kB @ 0xfd000000, mmio @ 0xfebd0000.
vm-test-run-foundationdb> server # [   46.707169] [drm] Found EDID data blob.
vm-test-run-foundationdb> server # [   46.779391] [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:02.0 on minor 0
vm-test-run-foundationdb> server # [   47.019703] fbcon: bochs-drmdrmfb (fb0) is primary device
vm-test-run-foundationdb> server # [   47.210910] Console: switching to colour frame buffer device 160x50
vm-test-run-foundationdb> server # [   47.242775] bochs-drm 0000:00:02.0: [drm] fb0: bochs-drmdrmfb frame buffer device
vm-test-run-foundationdb> server # [   47.255793] systemd[1]: Started Name Service Cache Daemon (nsncd).
vm-test-run-foundationdb> server # [   47.274854] nsncd[952]: Aug 06 07:16:38.667 INFO started, config: Config { ignored_request_types: {}, worker_count: 8, handoff_timeout: 3s }, path: "/var/run/nscd/socket"
vm-test-run-foundationdb> server # [   47.281697] systemd[1]: Reached target Host and Network Name Lookups.
vm-test-run-foundationdb> server # [   47.287476] systemd[1]: Reached target User and Group Name Lookups.
vm-test-run-foundationdb> server # [   47.978598] systemd-coredump[879]: Process 871 (fdbcli) of user 0 dumped core.
vm-test-run-foundationdb> server #
vm-test-run-foundationdb> server # Module libgcc_s.so.1 without build-id.
vm-test-run-foundationdb> server # Module libstdc++.so.6 without build-id.
vm-test-run-foundationdb> server # Module libboost_context.so.1.78.0 without build-id.
vm-test-run-foundationdb> server # Stack trace of thread 871:
vm-test-run-foundationdb> server # #0  0x0000000000d81a24 _ZNKSt7codecvtIDic11__mbstate_tE10do_unshiftERS0_PcS3_RS3_ (fdbcli + 0x981a24)
vm-test-run-foundationdb> server # #1  0x0000000000740471 _ZNSt8__detail15_BracketMatcherINSt7__cxx1112regex_traitsIcEELb0ELb0EE8_M_readyEv (fdbcli + 0x340471)
vm-test-run-foundationdb> server # #2  0x000000000074660c _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE25_M_insert_bracket_matcherILb0ELb0EEEvb (fdbcli + 0x34660c)
vm-test-run-foundationdb> server # #3  0x000000000073b83d _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE7_M_atomEv (fdbcli + 0x33b83d)
vm-test-run-foundationdb> server # #4  0x000000000073d833 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_alternativeEv (fdbcli + 0x33d833)
vm-test-run-foundationdb> server # #5  0x000000000073d921 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_alternativeEv (fdbcli + 0x33d921)
vm-test-run-foundationdb> server # #6  0x000000000073de1b _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_disjunctionEv (fdbcli + 0x33de1b)
vm-test-run-foundationdb> server # #7  0x0000000000c3b5e3 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEEC2EPKcS6_RKSt6localeNSt15regex_constants18syntax_option_typeE.constprop.0 (fdbcli + 0x83b5e3)
vm-test-run-foundationdb> server # #8  0x0000000000872a3b _Z16setNetworkOptionN17FDBNetworkOptions6OptionE8OptionalI9StringRefE (fdbcli + 0x472a3b)
vm-test-run-foundationdb> server # #9  0x000000000062d28a main (fdbcli + 0x22d28a)
vm-test-run-foundationdb> server # #10 0x00007f0167fad14e __libc_start_call_main (libc.so.6 + 0x2a14e)
vm-test-run-foundationdb> server # #11 0x00007f0167fad209 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a209)
vm-test-run-foundationdb> server # #12 0x000000000064beb5 _start (fdbcli + 0x24beb5)
vm-test-run-foundationdb> server # ELF object binary architecture: AMD x86-64
vm-test-run-foundationdb> server #
vm-test-run-foundationdb> server # [   48.106673] systemd[1]: foundationdb.service: Control process exited, code=exited, status=139/n/a
vm-test-run-foundationdb> server # [   48.148817] foundationdb-start[862]: Time="1722928599.840708" Severity="20" LogGroup="default" Process="fdbmonitor": Received signal 15 (Terminated), shutting down
vm-test-run-foundationdb> server # [   48.155657] foundationdb-start[862]: Time="1722928599.840988" Severity="10" LogGroup="default" Process="fdbmonitor": Killing process 882
vm-test-run-foundationdb> server # [   48.161500] foundationdb-start[862]: Time="1722928599.848483" Severity="10" LogGroup="default" Process="fdbmonitor": Killing process 881
vm-test-run-foundationdb> server # [   48.172979] foundationdb-post-start[863]: /nix/store/qdc2kp143rs9chqx2zf9fpa9cxxx71sr-unit-script-foundationdb-post-start/bin/foundationdb-post-start: line 6:   871 Segmentation fault      (core dumped) fdbcli --exec "configure new single ssd"
vm-test-run-foundationdb> server # [   48.179632] systemd[1]: systemd-coredump@0-875-0.service: Deactivated successfully.
vm-test-run-foundationdb> server # [   48.180777] systemd[1]: systemd-coredump@0-875-0.service: Consumed 1.200s CPU time, 15.6M memory peak.
vm-test-run-foundationdb> server # [   48.567642] input: QEMU Virtio Keyboard as /devices/pci0000:00/0000:00:0a.0/virtio7/input/input4
vm-test-run-foundationdb> server # [   49.411879] systemd-coredump[886]: Process 881 (backup_agent) of user 118 dumped core.
vm-test-run-foundationdb> server #
vm-test-run-foundationdb> server # Module libgcc_s.so.1 without build-id.
vm-test-run-foundationdb> server # Module libstdc++.so.6 without build-id.
vm-test-run-foundationdb> server # Module libboost_context.so.1.78.0 without build-id.
vm-test-run-foundationdb> server # Stack trace of thread 881:
vm-test-run-foundationdb> server # #0  0x0000000000e499d4 _ZNKSt7codecvtIDic11__mbstate_tE10do_unshiftERS0_PcS3_RS3_ (backup_agent + 0xa499d4)
vm-test-run-foundationdb> server # #1  0x00000000006c8f61 _ZNSt8__detail15_BracketMatcherINSt7__cxx1112regex_traitsIcEELb0ELb0EE8_M_readyEv (backup_agent + 0x2c8f61)
vm-test-run-foundationdb> server # #2  0x00000000006cf50c _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE25_M_insert_bracket_matcherILb0ELb0EEEvb (backup_agent + 0x2cf50c)
vm-test-run-foundationdb> server # #3  0x00000000006c3e7d _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE7_M_atomEv (backup_agent + 0x2c3e7d)
vm-test-run-foundationdb> server # #4  0x00000000006c5e73 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_alternativeEv (backup_agent + 0x2c5e73)
vm-test-run-foundationdb> server # #5  0x00000000006c5f61 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_alternativeEv (backup_agent + 0x2c5f61)
vm-test-run-foundationdb> server # #6  0x00000000006c645b _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_disjunctionEv (backup_agent + 0x2c645b)
vm-test-run-foundationdb> server # #7  0x0000000000c932f3 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEEC2EPKcS6_RKSt6localeNSt15regex_constants18syntax_option_typeE.constprop.0 (backup_agent + 0x8932f3)
vm-test-run-foundationdb> server # #8  0x0000000000c93730 _ZNSt7__cxx1111basic_regexIcNS_12regex_traitsIcEEE10_M_compileEPKcS5_NSt15regex_constants18syntax_option_typeE.constprop.0 (backup_agent + 0x893730)
vm-test-run-foundationdb> server # #9  0x00000000008fb9e7 _Z16setNetworkOptionN17FDBNetworkOptions6OptionE8OptionalI9StringRefE (backup_agent + 0x4fb9e7)
vm-test-run-foundationdb> server # #10 0x0000000000676b77 main (backup_agent + 0x276b77)
vm-test-run-foundationdb> server # #11 0x00007f56d1e6414e __libc_start_call_main (libc.so.6 + 0x2a14e)
vm-test-run-foundationdb> server # #12 0x00007f56d1e64209 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a209)
vm-test-run-foundationdb> server # #13 0x0000000000696fa5 _start (backup_agent + 0x296fa5)
vm-test-run-foundationdb> server # ELF object binary architecture: AMD x86-64
vm-test-run-foundationdb> server #
vm-test-run-foundationdb> server # [   49.489633] systemd[1]: systemd-coredump@1-883-0.service: Deactivated successfully.
vm-test-run-foundationdb> server # [   49.504872] systemd[1]: systemd-coredump@1-883-0.service: Consumed 1.144s CPU time, 14.4M memory peak.
vm-test-run-foundationdb> server # [   50.485223] cryptd: max_cpu_qlen set to 1000
vm-test-run-foundationdb> server # [   51.040473] AVX2 version of gcm_enc/dec engaged.
vm-test-run-foundationdb> server # [   51.041445] AES CTR mode by8 optimization enabled
vm-test-run-foundationdb> server # [   51.113593] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input5
vm-test-run-foundationdb> server # [   51.942540] systemd[1]: Starting Virtual Console Setup...
vm-test-run-foundationdb> server # [   53.041980] systemd[1]: Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
vm-test-run-foundationdb> server # [   54.585958] systemd-logind[686]: Watching system buttons on /dev/input/event0 (AT Translated Set 2 keyboard)
vm-test-run-foundationdb> server # [   55.497838] systemd[1]: Finished Virtual Console Setup.
vm-test-run-foundationdb> server # [   55.564383] systemd-coredump[897]: Process 882 (fdbserver) of user 118 dumped core.
vm-test-run-foundationdb> server #
vm-test-run-foundationdb> server # Module libgcc_s.so.1 without build-id.
vm-test-run-foundationdb> server # Module libstdc++.so.6 without build-id.
vm-test-run-foundationdb> server # Module libboost_context.so.1.78.0 without build-id.
vm-test-run-foundationdb> server # Stack trace of thread 882:
vm-test-run-foundationdb> server # #0  0x0000000002b476a4 _ZNKSt7codecvtIDic11__mbstate_tE10do_unshiftERS0_PcS3_RS3_ (fdbserver + 0x27476a4)
vm-test-run-foundationdb> server # #1  0x0000000001e72ad0 _ZNSt8__detail15_BracketMatcherINSt7__cxx1112regex_traitsIcEELb0ELb0EE8_M_readyEv (fdbserver + 0x1a72ad0)
vm-test-run-foundationdb> server # #2  0x0000000001e72e6c _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE25_M_insert_bracket_matcherILb0ELb0EEEvb (fdbserver + 0x1a72e6c)
vm-test-run-foundationdb> server # #3  0x0000000001e91ccd _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE7_M_atomEv (fdbserver + 0x1a91ccd)
vm-test-run-foundationdb> server # #4  0x0000000001e90713 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_alternativeEv (fdbserver + 0x1a90713)
vm-test-run-foundationdb> server # #5  0x0000000001e90cfb _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_disjunctionEv (fdbserver + 0x1a90cfb)
vm-test-run-foundationdb> server # #6  0x0000000001e91b03 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE7_M_atomEv (fdbserver + 0x1a91b03)
vm-test-run-foundationdb> server # #7  0x0000000001e90713 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_alternativeEv (fdbserver + 0x1a90713)
vm-test-run-foundationdb> server # #8  0x0000000001e90801 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_alternativeEv (fdbserver + 0x1a90801)
vm-test-run-foundationdb> server # #9  0x0000000001e90cfb _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEE14_M_disjunctionEv (fdbserver + 0x1a90cfb)
vm-test-run-foundationdb> server # #10 0x00000000024e03e3 _ZNSt8__detail9_CompilerINSt7__cxx1112regex_traitsIcEEEC2EPKcS6_RKSt6localeNSt15regex_constants18syntax_option_typeE.constprop.0 (fdbserver + 0x20e03e3)
vm-test-run-foundationdb> server # #11 0x0000000001e7d45d _ZN8Hostname10isHostnameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE (fdbserver + 0x1a7d45d)
vm-test-run-foundationdb> server # #12 0x0000000001e7eba4 _ZN23ClusterConnectionStringC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE (fdbserver + 0x1a7eba4)
vm-test-run-foundationdb> server # #13 0x0000000001d5694c _ZN21ClusterConnectionFileC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE (fdbserver + 0x195694c)
vm-test-run-foundationdb> server # #14 0x00000000013f6a06 _ZN12_GLOBAL__N_110CLIOptions17parseArgsInternalEiPPc (fdbserver + 0xff6a06)
vm-test-run-foundationdb> server # #15 0x0000000000e32d9a main (fdbserver + 0xa32d9a)
vm-test-run-foundationdb> server # #16 0x00007f37e475d14e __libc_start_call_main (libc.so.6 + 0x2a14e)
vm-test-run-foundationdb> server # #17 0x00007f37e475d209 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2a209)
vm-test-run-foundationdb> server # #18 0x0000000000e87055 _start (fdbserver + 0xa87055)
vm-test-run-foundationdb> server # ELF object binary architecture: AMD x86-64
vm-test-run-foundationdb> server #
vm-test-run-foundationdb> server # [   55.659205] systemd[1]: foundationdb.service: Failed with result 'exit-code'.
vm-test-run-foundationdb> server # [   55.676850] systemd[1]: Failed to start FoundationDB Service.
vm-test-run-foundationdb> server # [   55.682761] systemd[1]: foundationdb.service: Consumed 1.342s CPU time, 14.8M memory peak, 4K written to disk.
vm-test-run-foundationdb> server # [   55.744923] systemd[1]: systemd-coredump@2-885-0.service: Deactivated successfully.
vm-test-run-foundationdb> server # [   55.756503] systemd[1]: systemd-coredump@2-885-0.service: Consumed 2.753s CPU time, 40.7M memory peak.
vm-test-run-foundationdb> server # [   55.905772] systemd[1]: Reached target Multi-User System.
vm-test-run-foundationdb> server # [   55.912323] systemd[1]: Startup finished in 13.895s (kernel) + 42.015s (userspace) = 55.910s.
vm-test-run-foundationdb> server # [   56.008406] kvm_amd: Nested Virtualization enabled
vm-test-run-foundationdb> server # [   56.008680] kvm_amd: Nested Paging enabled
vm-test-run-foundationdb> server # [   56.014275] kvm_amd: Virtual GIF supported
vm-test-run-foundationdb> server # [   56.014520] kvm_amd: PMU virtualization is disabled
vm-test-run-foundationdb> server # [   56.244989] systemd-logind[686]: Watching system buttons on /dev/input/event2 (Power Button)
vm-test-run-foundationdb> server # [   56.351882] ppdev: user-space parallel port driver
vm-test-run-foundationdb> server # [   56.797433] systemd[1]: systemd-vconsole-setup.service: Deactivated successfully.
vm-test-run-foundationdb> server # [   56.803983] systemd[1]: Stopped Virtual Console Setup.
vm-test-run-foundationdb> server # [   56.815497] systemd[1]: Stopping Virtual Console Setup...
vm-test-run-foundationdb> server # [   56.861570] systemd-logind[686]: Watching system buttons on /dev/input/event3 (QEMU Virtio Keyboard)
vm-test-run-foundationdb> server # [   56.875493] systemd[1]: Starting Virtual Console Setup...
vm-test-run-foundationdb> server # [   56.937368] systemd[1]: run-credentials-systemd\x2dvconsole\x2dsetup.service.mount: Deactivated successfully.
vm-test-run-foundationdb> server # [   57.124366] systemd[1]: systemd-vconsole-setup.service: Deactivated successfully.
vm-test-run-foundationdb> server # [   57.131281] systemd[1]: Stopped Virtual Console Setup.
vm-test-run-foundationdb> cleanup
vm-test-run-foundationdb> kill machine (pid 7)
vm-test-run-foundationdb> qemu-kvm: terminating on signal 15 from pid 4 (/nix/store/l014xp1qxdl6gim3zc0jv3mpxhbp346s-python3-3.12.4/bin/python3.12)
vm-test-run-foundationdb> (finished: cleanup, in 0.00 seconds)
vm-test-run-foundationdb> Traceback (most recent call last):
vm-test-run-foundationdb>   File "/nix/store/9bk9r3ymmzxmbwj9yrfa34csdps2689l-nixos-test-driver-1.1/bin/.nixos-test-driver-wrapped", line 9, in <module>
vm-test-run-foundationdb>     sys.exit(main())
vm-test-run-foundationdb>              ^^^^^^
vm-test-run-foundationdb>   File "/nix/store/9bk9r3ymmzxmbwj9yrfa34csdps2689l-nixos-test-driver-1.1/lib/python3.12/site-packages/test_driver/__init__.py", line 146, in main
vm-test-run-foundationdb>     driver.run_tests()
vm-test-run-foundationdb>   File "/nix/store/9bk9r3ymmzxmbwj9yrfa34csdps2689l-nixos-test-driver-1.1/lib/python3.12/site-packages/test_driver/driver.py", line 166, in run_tests
vm-test-run-foundationdb>     self.test_script()
vm-test-run-foundationdb>   File "/nix/store/9bk9r3ymmzxmbwj9yrfa34csdps2689l-nixos-test-driver-1.1/lib/python3.12/site-packages/test_driver/driver.py", line 158, in test_script
vm-test-run-foundationdb>     exec(self.tests, symbols, None)
vm-test-run-foundationdb>   File "<string>", line 1, in <module>
vm-test-run-foundationdb>   File "/nix/store/9bk9r3ymmzxmbwj9yrfa34csdps2689l-nixos-test-driver-1.1/lib/python3.12/site-packages/test_driver/machine.py", line 374, in wait_for_unit
vm-test-run-foundationdb>     retry(check_active, timeout)
vm-test-run-foundationdb>   File "/nix/store/9bk9r3ymmzxmbwj9yrfa34csdps2689l-nixos-test-driver-1.1/lib/python3.12/site-packages/test_driver/machine.py", line 129, in retry
vm-test-run-foundationdb>     if fn(False):
vm-test-run-foundationdb>        ^^^^^^^^^
vm-test-run-foundationdb>   File "/nix/store/9bk9r3ymmzxmbwj9yrfa34csdps2689l-nixos-test-driver-1.1/lib/python3.12/site-packages/test_driver/machine.py", line 357, in check_active
vm-test-run-foundationdb>     raise Exception(f'unit "{unit}" reached state "{state}"')
vm-test-run-foundationdb> Exception: unit "foundationdb.service" reached state "failed"
vm-test-run-foundationdb> kill vlan (pid 5)
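
The `status=139` reported for the foundationdb.service control process in the log above is consistent with the segfault: by shell convention, a process killed by signal N is reported with exit status 128 + N, and 139 - 128 = 11 (SIGSEGV). A small sketch of that arithmetic (the function name `signal_from_status` is made up for illustration):

```shell
# Hypothetical helper: map a shell exit status to the signal number that
# terminated the process, or print 0 if the status does not indicate a signal.
signal_from_status() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0
  fi
}

signal_from_status 139   # prints 11
```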

andersstorhaug commented 3 months ago

I also tested locally against 7.1.62, with the same error.

FoundationDB 7.3.43 (pre-release, not my repo) does not appear to have this issue, if that helps.

andersstorhaug commented 3 months ago

Side note, apparently Snowflake is using 7.3.43 in production -- it appears to be, unofficially, the latest stable release.

Hopefully we get a little clarity around that from the FDB folks moving forward, but perhaps nixpkgs could be updated to this version sometime soon.