freifunk-gluon / gluon

a modular framework for creating OpenWrt-based firmwares for wireless mesh nodes
https://gluon.readthedocs.io

gluon-respondd crashes on NanoPi R2S, gluon-next #2173

Closed: goligo closed this issue 3 years ago

goligo commented 3 years ago

Bug report

What is the problem? gluon-respondd reproducibly stops working after about 4 hours. Once it is restarted, it works again for another 4 hours. When gluon-respondd stops working, the log shows:

Fri Jan 8 20:54:39 2021 daemon.info procd: Instance gluon-respondd::instance1 s in a crash loop 6 crashes, 2757 seconds since last crash

What is the expected behaviour? gluon-respondd should not crash, or at least fail gracefully and restart automatically.

Gluon Version: gluon-v2020.2-183-g45217767

Site Configuration: https://github.com/freifunkMUC/site-ffm (except for the 0003-batman-adv patch, which is no longer needed because it is already included in batman-adv)

Custom patches: I was making a custom build of our firmware for the NanoPi R2S. Apart from removing the batman-adv patch and switching the Gluon branch to "next", I didn't need to change anything to get a working image.

mweinelt commented 3 years ago

Fri Jan 8 20:54:39 2021 daemon.info procd: Instance gluon-respondd::instance1 s in a crash loop 6 crashes, 2757 seconds since last crash

Unfortunately, that's not really helpful, as it only confirms that respondd crashed, not why.

I'm not sure which of us is actually running next at the moment; maybe @blocktrron can say whether they have seen such a problem.

goligo commented 3 years ago

So how can I get more verbose output for gluon-respondd?

blocktrron commented 3 years ago

This is not a universal issue, as I'm running next without problems on a C20i. However, such reports have come up before, specifically for the NanoPi R2S.

I haven't looked into it, so I can't say much about the root cause. I suspect an edge case involving devices without WiFi, but it might also be an ARMv8-specific problem.

Keep in mind: next is unsupported and only meant as a technology preview. I'd normally close this issue; however, since this should ideally be investigated and fixed before updating the OpenWrt base, I'll leave it open.

neocturne commented 3 years ago

Given that respondd is a network-facing service and a crash might well hint at some kind of memory issue, this should have high priority - depending on the exact bug it might turn out to be exploitable.

The next step would be to get a core dump of the crash. The following patch will enable coredumps for respondd:

diff --git a/package/gluon-respondd/files/etc/init.d/gluon-respondd b/package/gluon-respondd/files/etc/init.d/gluon-respondd
index c7b071eb2e5a..e7b258056364 100755
--- a/package/gluon-respondd/files/etc/init.d/gluon-respondd
+++ b/package/gluon-respondd/files/etc/init.d/gluon-respondd
@@ -16,6 +16,7 @@ start_service() {
    procd_set_param command $DAEMON -d /usr/lib/respondd -p 1001 -g ff02::2:1001 $meshdevs -g ff05::2:1001 $clientdevs
    procd_set_param respawn ${respawn_threshold:-3600} ${respawn_timeout:-5} ${respawn_retry:-5}
    procd_set_param stderr 1
+   procd_set_param limits core="unlimited"
    procd_close_instance
 }

After a crash, the dump can be found in /tmp.
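The core file name itself carries useful metadata. The sketch below assumes the kernel's core_pattern is /tmp/%e.%t.%p.%s.core (executable name, UNIX timestamp, PID, signal number); that pattern is an assumption, but it is consistent with the file name respondd.1610225041.19355.11.core that appears later in this thread. The helper name parse_core_name is hypothetical.

```python
import signal
from datetime import datetime, timezone


def parse_core_name(name):
    """Split a core file name of the assumed form <exe>.<time>.<pid>.<signal>.core."""
    if not name.endswith(".core"):
        raise ValueError("not a core file name: " + name)
    # rsplit keeps executable names that themselves contain dots intact
    exe, ts, pid, sig = name[: -len(".core")].rsplit(".", 3)
    return {
        "exe": exe,
        "time": datetime.fromtimestamp(int(ts), tz=timezone.utc),
        "pid": int(pid),
        "signal": signal.Signals(int(sig)).name,
    }


info = parse_core_name("respondd.1610225041.19355.11.core")
print(info["exe"], info["time"].isoformat(), info["pid"], info["signal"])
# respondd 2021-01-09T20:44:01+00:00 19355 SIGSEGV
```

Signal 11 is SIGSEGV, which matches the "Segmentation fault" seen when respondd is run by hand.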

goligo commented 3 years ago

I tried running gluon-respondd on the command line and waiting for output. That wasn't very helpful ;-)

~# /usr/bin/respondd -d /usr/lib/respondd -p 1001 -g ff02::2:1001 -i vx_mesh_lan -i mesh-vpn -g ff05::2:1001 -i br-client -t 10
Segmentation fault

I have now enabled core dumps and will post one in about 4 hours, when it is available.

neocturne commented 3 years ago

Could you also describe in more detail how you built your image? I tried to use the stable branch of the site repo, but the patch batman-v-respondd-ptq1.patch in site/patches does not apply to the next branch. The patching code in the site Makefile has a few issues, and I'm not completely sure how it's supposed to work...

In any case, an issue with one of the FFM-specific respondd patches is the most likely cause, as it seems nobody else has encountered the problem.

goligo commented 3 years ago

For me, only 0003-use-wifi-tx-bitrate-as-fallback-throughput.patch failed, as it is already included in batman-adv 2020.4 and needs to be deleted before building.

To build, use:

make GLUON_TARGETS=rockchip-armv8 GLUON_GIT_REF=next

blocktrron commented 3 years ago

I'm able to reproduce this crash, but still only on the NanoPi R2S:

(gdb) bt full
#0  strlen (s=s@entry=0xffffffffb2794460 <error: Cannot access memory at address 0xffffffffb2794460>) at src/string/strlen.c:17
        a = 0xffffffffb2794460 <error: Cannot access memory at address 0xffffffffb2794460>
        w = 0xffffffffb2794460
#1  0x0000ffffb26e8564 in json_object_new_string (s=s@entry=0xffffffffb2794460 <error: Cannot access memory at address 0xffffffffb2794460>)
    at /media/dbauer/4e292785-2843-4894-ae0b-47d6109969b5/gluon/openwrt/build_dir/target-aarch64_generic_musl/json-c-0.15/json_object.c:1290
No locals.
#2  0x0000ffffb266ca34 in gluonutil_wrap_string (str=str@entry=0xffffffffb2794460 <error: Cannot access memory at address 0xffffffffb2794460>)
    at /media/dbauer/4e292785-2843-4894-ae0b-47d6109969b5/gluon/openwrt/build_dir/target-aarch64_generic_musl/libgluonutil-1/libgluonutil.c:209
No locals.
#3  0x0000ffffb266ca54 in gluonutil_wrap_and_free_string (str=0xffffffffb2794460 <error: Cannot access memory at address 0xffffffffb2794460>)
    at /media/dbauer/4e292785-2843-4894-ae0b-47d6109969b5/gluon/openwrt/build_dir/target-aarch64_generic_musl/libgluonutil-1/libgluonutil.c:213
        ret = <optimized out>
#4  0x0000ffffb25ae680 in ?? ()
No symbol table info available.
#5  0x0000000019589060 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

neocturne commented 3 years ago

@blocktrron Hmm, seems we're missing the symbol table of the object calling gluonutil_wrap_and_free_string with invalid data. info proc mappings should tell you which object that is.

blocktrron commented 3 years ago

For some reason, the GDB from the snapshot toolchain didn't automatically load shared libraries. Anyway, here's the full trace:

(gdb) bt full
#0  strlen (s=s@entry=0xffffffffb2794460 <error: Cannot access memory at address 0xffffffffb2794460>)
    at src/string/strlen.c:17
        a = 0xffffffffb2794460 <error: Cannot access memory at address 0xffffffffb2794460>
        w = 0xffffffffb2794460
#1  0x0000ffffb26e8564 in json_object_new_string (
    s=s@entry=0xffffffffb2794460 <error: Cannot access memory at address 0xffffffffb2794460>)
    at /media/dbauer/4e292785-2843-4894-ae0b-47d6109969b5/gluon/openwrt/build_dir/target-aarch64_generic_musl/json-c-0.15/json_object.c:1290
No locals.
#2  0x0000ffffb266ca34 in gluonutil_wrap_string (
    str=str@entry=0xffffffffb2794460 <error: Cannot access memory at address 0xffffffffb2794460>)
    at /media/dbauer/4e292785-2843-4894-ae0b-47d6109969b5/gluon/openwrt/build_dir/target-aarch64_generic_musl/libgluonutil-1/libgluonutil.c:209
No locals.
#3  0x0000ffffb266ca54 in gluonutil_wrap_and_free_string (
    str=0xffffffffb2794460 <error: Cannot access memory at address 0xffffffffb2794460>)
    at /media/dbauer/4e292785-2843-4894-ae0b-47d6109969b5/gluon/openwrt/build_dir/target-aarch64_generic_musl/libgluonutil-1/libgluonutil.c:213
        ret = <optimized out>
#4  0x0000ffffb25ae680 in get_primary_domain_code () at respondd-nodeinfo.c:68
No locals.
#5  respondd_provider_nodeinfo () at respondd-nodeinfo.c:133
        ret = 0xffffb2702880
        hardware = <optimized out>
        model = <optimized out>
        network = <optimized out>
        software = <optimized out>
        software_firmware = <optimized out>
        system = 0x1959cf80
#6  0x0000000000402b9c in eval_providers (providers=0x417dc0)
    at /media/dbauer/4e292785-2843-4894-ae0b-47d6109969b5/gluon/openwrt/build_dir/target-aarch64_generic_musl/respondd-1/respondd.c:357
        ret = 0x19589060
        ret = <optimized out>
#7  single_request (type=type@entry=0x19588c30 "nodeinfo")
    at /media/dbauer/4e292785-2843-4894-ae0b-47d6109969b5/gluon/openwrt/build_dir/target-aarch64_generic_musl/respondd-1/respondd.c:386
        key = {key = <optimized out>, data = <optimized out>}
        entry = 0xffffb278f540
        r = 0xffffb26e16a0
        ret = <optimized out>
#8  0x0000000000404c3c in multi_request (types=0x19588c30 "nodeinfo")
    at /media/dbauer/4e292785-2843-4894-ae0b-47d6109969b5/gluon/openwrt/build_dir/target-aarch64_generic_musl/respondd-1/respondd.c:411
        sub = <optimized out>
        ret = 0x1959c400
        type = 0x19588c30 "nodeinfo"
        saveptr = 0x19588c39 "statistics neighbours"
        ret = <optimized out>
        type = <optimized out>
        saveptr = <optimized out>
        sub = <optimized out>
#9  handle_request (compress=<synthetic pointer>, request=0x19588c2c "GET nodeinfo")
    at /media/dbauer/4e292785-2843-4894-ae0b-47d6109969b5/gluon/openwrt/build_dir/target-aarch64_generic_musl/respondd-1/respondd.c:436

blocktrron commented 3 years ago

I can reliably trigger the crash in under a minute with this script, which repeatedly requests the nodeinfo respondd object. However, this only happens on rockchip/arm64; neither arm (ipq40xx) nor mips (ramips-mt7620) crashes.

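#!/usr/bin/env python3
# Usage: python3 respondd-stress.py <node address> <request type, e.g. nodeinfo>
# Sends "GET <type>" to respondd (UDP port 1001) in a loop and prints each reply.
import sys
import zlib
from socket import socket, AF_INET6, SOCK_DGRAM

i = 0
while True:
    i += 1
    with socket(AF_INET6, SOCK_DGRAM) as s:
        s.bind(('::', 14233))
        # Fail with socket.timeout instead of blocking forever once respondd stops answering
        s.settimeout(5)
        s.sendto("GET {name}".format(name=sys.argv[2]).encode(), (sys.argv[1], 1001))
        data, addr = s.recvfrom(65535)  # maximum UDP payload, avoids truncated replies
    # Replies to "GET" requests are raw deflate streams (no zlib header)
    d = zlib.decompress(data, wbits=-zlib.MAX_WBITS)
    print("pass {}".format(i))
    print(d.decode())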

goligo commented 3 years ago

Ah, there is already an automatic restart built in for when the process crashes. In reality it doesn't crash after 4 hours, but after only 40 minutes. Once the 5 retries are used up, it has run 6 times; that's where the 4 hours come from.
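A simplified model of procd's respawn accounting makes this arithmetic explicit. It assumes that a run shorter than respawn_threshold (3600 s in the init script above) counts as a crash, that a longer run resets the crash counter, and that procd gives up once the counter exceeds respawn_retry (5), with 0 meaning "retry forever"; the 5 s respawn_timeout delay is ignored. This is an illustration of the described behaviour, not procd's actual code, and simulate_respawn is a hypothetical name.

```python
def simulate_respawn(runtimes, threshold=3600, retry=5):
    """Return (run number at which procd gives up, or None; total seconds elapsed)."""
    crashes = 0
    elapsed = 0
    for run, runtime in enumerate(runtimes, start=1):
        elapsed += runtime
        # Runs longer than the threshold are not treated as part of a crash loop
        crashes = 0 if runtime > threshold else crashes + 1
        if retry > 0 and crashes > retry:
            return run, elapsed
    return None, elapsed


# ~40 minutes per run: procd stops after the 6th crash
print(simulate_respawn([40 * 60] * 10))  # (6, 14400), i.e. 4 hours
# Runs longer than an hour: the counter keeps resetting and procd never gives up
print(simulate_respawn([65 * 60] * 10))
```

Under these assumptions, a process that lives longer than the threshold before each crash is restarted indefinitely, which matches what is reported for x86-64 later in the thread.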

Adorfer commented 3 years ago

So this is an issue specific to the NanoPi (ARM/Rockchip architecture) rather than to all Gluon nodes running that build?

neocturne commented 3 years ago

@blocktrron Were you able to find out anything, or can you upload the coredump + relevant binaries somewhere so I can have a look?

blocktrron commented 3 years ago

respondd-core.tar.gz

I haven't spent any more time on this. As far as I can tell, the code itself should not return a pointer to a non-existent memory region.

goligo commented 3 years ago

@Adorfer For now, the issue seems to occur only on the NanoPi R2S (rockchip-armv8); it hasn't been reproduced on any other platform yet.

@NeoRaider The issue seems to be related to this change, which was made about half a year ago. In our firmware all domains are primary domains; there aren't any symlinks in /lib/gluon/domains. https://github.com/freifunk-gluon/gluon/commit/bcf57467dd4548135db507dceded3148cd0fc941
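For readers unfamiliar with domain aliases, here is one plausible way a primary domain code could be resolved. It assumes that an alias is a symlink /lib/gluon/domains/<alias>.json pointing at <primary>.json and that a regular file is its own primary domain. The function primary_domain_code is hypothetical and is not libgluonutil's implementation. With no symlinks, as in the FFM firmware described above, every domain resolves to itself.

```python
import os
import tempfile


def primary_domain_code(domains_dir, domain):
    """Hypothetical helper: map a (possibly aliased) domain code to its primary code."""
    path = os.path.join(domains_dir, domain + ".json")
    if not os.path.islink(path):
        # A regular file is already a primary domain
        return domain
    target = os.path.basename(os.readlink(path))
    return target[: -len(".json")] if target.endswith(".json") else target


# Demo in a throwaway directory: dom14 is primary, "alias" is a symlink to it
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "dom14.json"), "w").close()
    os.symlink("dom14.json", os.path.join(d, "alias.json"))
    results = (primary_domain_code(d, "dom14"), primary_domain_code(d, "alias"))
print(results)  # ('dom14', 'dom14')
```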

goligo commented 3 years ago

The issue is NOT specific to rockchip-armv8; I could also reproduce it on x86-64. Interestingly, on x86-64 respondd runs for more than 3600 seconds before crashing, so procd simply restarts it and it never stops working for good. I only just found the core dumps.

neocturne commented 3 years ago

@blocktrron Hmm, there are no anonymous mappings in the coredumps - what's the value of /proc/$(pidof respondd)/coredump_filter?

I'll try to reproduce this on x86-64 in the next few days.

blocktrron commented 3 years ago

Looks okay:

root@64367-nanopi-r2s:/proc/7121# cat coredump_filter 
00000023
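
The value is a bit mask whose bits are documented in the core(5) man page. The sketch below decodes it: 0x23 sets bits 0, 1 and 5, i.e. anonymous private mappings, anonymous shared mappings and private huge pages, so anonymous memory should indeed be included in the dump. The names COREDUMP_FILTER_BITS and decode_coredump_filter are illustrative.

```python
# Bit meanings of /proc/<pid>/coredump_filter as listed in core(5)
COREDUMP_FILTER_BITS = {
    0: "anonymous private",
    1: "anonymous shared",
    2: "file-backed private",
    3: "file-backed shared",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
    7: "private DAX pages",
    8: "shared DAX pages",
}


def decode_coredump_filter(text):
    """Parse the hex value read from coredump_filter and list the enabled mapping types."""
    value = int(text, 16)
    return [name for bit, name in sorted(COREDUMP_FILTER_BITS.items()) if value & (1 << bit)]


print(decode_coredump_filter("00000023"))
# ['anonymous private', 'anonymous shared', 'private huge pages']
```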

blocktrron commented 3 years ago

I just had a look; aren't the mappings in question there?

(gdb) maintenance info sections
Exec file:
    `/home/dbauer/git/freifunk/respondd-core/respondd', file type elf64-littleaarch64.
 [0]      0x00400200->0x0040021a at 0x00000200: .interp ALLOC LOAD READONLY DATA HAS_CONTENTS
 [1]      0x00400220->0x0040047c at 0x00000220: .hash ALLOC LOAD READONLY DATA HAS_CONTENTS
 [2]      0x00400480->0x00400534 at 0x00000480: .gnu.hash ALLOC LOAD READONLY DATA HAS_CONTENTS
 [3]      0x00400538->0x00400ce8 at 0x00000538: .dynsym ALLOC LOAD READONLY DATA HAS_CONTENTS
 [4]      0x00400ce8->0x00401012 at 0x00000ce8: .dynstr ALLOC LOAD READONLY DATA HAS_CONTENTS
 [5]      0x00401012->0x004010b6 at 0x00001012: .gnu.version ALLOC LOAD READONLY DATA HAS_CONTENTS
 [6]      0x004010b8->0x004010f8 at 0x000010b8: .gnu.version_r ALLOC LOAD READONLY DATA HAS_CONTENTS
 [7]      0x004010f8->0x00401188 at 0x000010f8: .rela.dyn ALLOC LOAD READONLY DATA HAS_CONTENTS
 [8]      0x00401188->0x00401770 at 0x00001188: .rela.plt ALLOC LOAD READONLY DATA HAS_CONTENTS
 [9]      0x00401770->0x00401780 at 0x00001770: .init ALLOC LOAD READONLY CODE HAS_CONTENTS
 [10]     0x00401780->0x00401b90 at 0x00001780: .plt ALLOC LOAD READONLY CODE HAS_CONTENTS
 [11]     0x00401b90->0x00404c78 at 0x00001b90: .text ALLOC LOAD READONLY CODE HAS_CONTENTS
 [12]     0x00404c78->0x00404c88 at 0x00004c78: .fini ALLOC LOAD READONLY CODE HAS_CONTENTS
 [13]     0x00404c88->0x00405a0e at 0x00004c88: .rodata ALLOC LOAD READONLY DATA HAS_CONTENTS
 [14]     0x00405a10->0x00405aa4 at 0x00005a10: .eh_frame_hdr ALLOC LOAD READONLY DATA HAS_CONTENTS
 [15]     0x00405aa8->0x00405dc0 at 0x00005aa8: .eh_frame ALLOC LOAD READONLY DATA HAS_CONTENTS
 [16]     0x00416ba0->0x00416ba8 at 0x00006ba0: .init_array ALLOC LOAD DATA HAS_CONTENTS
 [17]     0x00416ba8->0x00416bb0 at 0x00006ba8: .fini_array ALLOC LOAD DATA HAS_CONTENTS
 [18]     0x00416bb0->0x00416bc8 at 0x00006bb0: .data.rel.ro ALLOC LOAD DATA HAS_CONTENTS
 [19]     0x00416bc8->0x00416de8 at 0x00006bc8: .dynamic ALLOC LOAD DATA HAS_CONTENTS
 [20]     0x00416de8->0x00417000 at 0x00006de8: .got ALLOC LOAD DATA HAS_CONTENTS
 [21]     0x00417000->0x00417008 at 0x00007000: .data ALLOC LOAD DATA HAS_CONTENTS
 [22]     0x00417008->0x00417070 at 0x00007008: .bss ALLOC
 [23]     0x00000000->0x00000033 at 0x00007008: .comment READONLY HAS_CONTENTS
 [24]     0x00000000->0x000000f0 at 0x00007040: .debug_aranges READONLY HAS_CONTENTS
 [25]     0x00000000->0x000052d2 at 0x00007130: .debug_info READONLY HAS_CONTENTS
 [26]     0x00000000->0x000007b0 at 0x0000c402: .debug_abbrev READONLY HAS_CONTENTS
 [27]     0x00000000->0x0000335e at 0x0000cbb2: .debug_line READONLY HAS_CONTENTS
 [28]     0x00000000->0x00000028 at 0x0000ff10: .debug_frame READONLY HAS_CONTENTS
 [29]     0x00000000->0x0000ef88 at 0x0000ff38: .debug_str READONLY HAS_CONTENTS
 [30]     0x00000000->0x00006b22 at 0x0001eec0: .debug_loc READONLY HAS_CONTENTS
 [31]     0x00000000->0x00001500 at 0x000259f0: .debug_ranges READONLY HAS_CONTENTS
 [32]     0x00000000->0x00003f82 at 0x00026ef0: .debug_macro READONLY HAS_CONTENTS
Core file:
    `/home/dbauer/git/freifunk/respondd-core/respondd.1610225041.19355.11.core', file type elf64-littleaarch64.
 [0]      0x00000000->0x00001080 at 0x00000c80: note0 READONLY HAS_CONTENTS
 [1]      0x00000000->0x00000110 at 0x00000d04: .reg/19355 HAS_CONTENTS
 [2]      0x00000000->0x00000110 at 0x00000d04: .reg HAS_CONTENTS
 [3]      0x00000000->0x00000080 at 0x00000ecc: .note.linuxcore.siginfo/19355 HAS_CONTENTS
 [4]      0x00000000->0x00000080 at 0x00000ecc: .note.linuxcore.siginfo HAS_CONTENTS
 [5]      0x00000000->0x00000150 at 0x00000f60: .auxv HAS_CONTENTS
 [6]      0x00000000->0x000009e1 at 0x000010c4: .note.linuxcore.file/19355 HAS_CONTENTS
 [7]      0x00000000->0x000009e1 at 0x000010c4: .note.linuxcore.file HAS_CONTENTS
 [8]      0x00000000->0x00000210 at 0x00001abc: .reg2/19355 HAS_CONTENTS
 [9]      0x00000000->0x00000210 at 0x00001abc: .reg2 HAS_CONTENTS
 [10]     0x00000000->0x00000008 at 0x00001ce0: .reg-aarch-tls/19355 HAS_CONTENTS
 [11]     0x00000000->0x00000008 at 0x00001ce0: .reg-aarch-tls HAS_CONTENTS
 [12]     0x00400000->0x00406000 at 0x00002000: load1 ALLOC READONLY CODE
 [13]     0x00416000->0x00417000 at 0x00002000: load2 ALLOC LOAD READONLY HAS_CONTENTS
 [14]     0x00417000->0x00418000 at 0x00003000: load3 ALLOC LOAD HAS_CONTENTS
 [15]     0x011a5000->0x01213000 at 0x00004000: load4 ALLOC LOAD HAS_CONTENTS
 [16]     0xffffb199f000->0xffffb19b4000 at 0x00072000: load5 ALLOC READONLY CODE
 [17]     0xffffb19b4000->0xffffb19b5000 at 0x00072000: load6 ALLOC LOAD READONLY HAS_CONTENTS
 [18]     0xffffb19b5000->0xffffb19b6000 at 0x00073000: load7 ALLOC LOAD HAS_CONTENTS
 [19]     0xffffb19b6000->0xffffb19d5000 at 0x00074000: load8 ALLOC READONLY CODE
 [20]     0xffffb19d5000->0xffffb19d6000 at 0x00074000: load9 ALLOC LOAD READONLY HAS_CONTENTS
 [21]     0xffffb19d6000->0xffffb19d7000 at 0x00075000: load10 ALLOC LOAD HAS_CONTENTS
 [22]     0xffffb19d7000->0xffffb19e7000 at 0x00076000: load11 ALLOC READONLY CODE
 [23]     0xffffb19e7000->0xffffb19e8000 at 0x00076000: load12 ALLOC LOAD READONLY HAS_CONTENTS
 [24]     0xffffb19e8000->0xffffb19e9000 at 0x00077000: load13 ALLOC LOAD HAS_CONTENTS
 [25]     0xffffb19e9000->0xffffb19fb000 at 0x00078000: load14 ALLOC READONLY CODE
 [26]     0xffffb19fb000->0xffffb19fc000 at 0x00078000: load15 ALLOC LOAD READONLY HAS_CONTENTS
 [27]     0xffffb19fc000->0xffffb19fd000 at 0x00079000: load16 ALLOC LOAD HAS_CONTENTS
 [28]     0xffffb19fd000->0xffffb1a0e000 at 0x0007a000: load17 ALLOC READONLY CODE
 [29]     0xffffb1a0e000->0xffffb1a0f000 at 0x0007a000: load18 ALLOC LOAD READONLY HAS_CONTENTS
 [30]     0xffffb1a0f000->0xffffb1a10000 at 0x0007b000: load19 ALLOC LOAD HAS_CONTENTS
 [31]     0xffffb1a10000->0xffffb1a22000 at 0x0007c000: load20 ALLOC READONLY CODE
 [32]     0xffffb1a22000->0xffffb1a23000 at 0x0007c000: load21 ALLOC LOAD READONLY HAS_CONTENTS
 [33]     0xffffb1a23000->0xffffb1a24000 at 0x0007d000: load22 ALLOC LOAD HAS_CONTENTS
 [34]     0xffffb1a24000->0xffffb1a34000 at 0x0007e000: load23 ALLOC READONLY CODE
 [35]     0xffffb1a34000->0xffffb1a35000 at 0x0007e000: load24 ALLOC LOAD READONLY HAS_CONTENTS
 [36]     0xffffb1a35000->0xffffb1a36000 at 0x0007f000: load25 ALLOC LOAD HAS_CONTENTS
 [37]     0xffffb1a36000->0xffffb1a4f000 at 0x00080000: load26 ALLOC READONLY CODE
 [38]     0xffffb1a4f000->0xffffb1a50000 at 0x00080000: load27 ALLOC LOAD READONLY HAS_CONTENTS
 [39]     0xffffb1a50000->0xffffb1a51000 at 0x00081000: load28 ALLOC LOAD HAS_CONTENTS
 [40]     0xffffb1a51000->0xffffb1a63000 at 0x00082000: load29 ALLOC READONLY CODE
 [41]     0xffffb1a63000->0xffffb1a64000 at 0x00082000: load30 ALLOC LOAD READONLY HAS_CONTENTS
 [42]     0xffffb1a64000->0xffffb1a65000 at 0x00083000: load31 ALLOC LOAD HAS_CONTENTS
 [43]     0xffffb1a65000->0xffffb1a81000 at 0x00084000: load32 ALLOC READONLY CODE
 [44]     0xffffb1a81000->0xffffb1a82000 at 0x00084000: load33 ALLOC LOAD READONLY HAS_CONTENTS
 [45]     0xffffb1a82000->0xffffb1a83000 at 0x00085000: load34 ALLOC LOAD HAS_CONTENTS
 [46]     0xffffb1a83000->0xffffb1a9c000 at 0x00086000: load35 ALLOC READONLY CODE
 [47]     0xffffb1a9c000->0xffffb1a9d000 at 0x00086000: load36 ALLOC LOAD READONLY HAS_CONTENTS
 [48]     0xffffb1a9d000->0xffffb1a9e000 at 0x00087000: load37 ALLOC LOAD HAS_CONTENTS
 [49]     0xffffb1a9e000->0xffffb1ab0000 at 0x00088000: load38 ALLOC READONLY CODE
 [50]     0xffffb1ab0000->0xffffb1ab1000 at 0x00088000: load39 ALLOC LOAD READONLY HAS_CONTENTS
 [51]     0xffffb1ab1000->0xffffb1ab2000 at 0x00089000: load40 ALLOC LOAD HAS_CONTENTS
 [52]     0xffffb1ab2000->0xffffb1ac2000 at 0x0008a000: load41 ALLOC READONLY CODE
 [53]     0xffffb1ac2000->0xffffb1ac3000 at 0x0008a000: load42 ALLOC LOAD READONLY HAS_CONTENTS
 [54]     0xffffb1ac3000->0xffffb1ac4000 at 0x0008b000: load43 ALLOC LOAD HAS_CONTENTS
 [55]     0xffffb1ac4000->0xffffb1ae5000 at 0x0008c000: load44 ALLOC READONLY CODE
 [56]     0xffffb1ae5000->0xffffb1ae6000 at 0x0008c000: load45 ALLOC LOAD READONLY HAS_CONTENTS
 [57]     0xffffb1ae6000->0xffffb1ae7000 at 0x0008d000: load46 ALLOC LOAD HAS_CONTENTS
 [58]     0xffffb1ae7000->0xffffb1b06000 at 0x0008e000: load47 ALLOC READONLY CODE
 [59]     0xffffb1b06000->0xffffb1b07000 at 0x0008e000: load48 ALLOC LOAD READONLY HAS_CONTENTS
 [60]     0xffffb1b07000->0xffffb1b08000 at 0x0008f000: load49 ALLOC LOAD HAS_CONTENTS
 [61]     0xffffb1b08000->0xffffb1b85000 at 0x00090000: load50 ALLOC READONLY CODE
 [62]     0xffffb1b92000->0xffffb1b93000 at 0x00090000: load51 ALLOC LOAD READONLY HAS_CONTENTS
 [63]     0xffffb1b93000->0xffffb1b94000 at 0x00091000: load52 ALLOC LOAD READONLY CODE HAS_CONTENTS
 [64]     0xffffb1b94000->0xffffb1b97000 at 0x00092000: load53 ALLOC LOAD HAS_CONTENTS
 [65]     0xffffb1b97000->0xffffb1b9a000 at 0x00095000: load54 ALLOC LOAD HAS_CONTENTS
 [66]     0xffffe5cea000->0xffffe5d0b000 at 0x00098000: load55 ALLOC LOAD HAS_CONTENTS
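Whether an address is mapped can be checked mechanically against this listing. The sketch below copies four of the load segments above into a sorted table (a real check would use all of them, for example as reported by gdb's info proc mappings) and looks an address up with bisect. It finds 0xffffb1b07540, where the "dom14" string lives (see the gdb output below), in load49, while a pointer of the form seen in the first backtrace, 0xffffffffb2794460, lies outside every listed segment. SEGMENTS and find_segment are illustrative names.

```python
import bisect

# (start, end, name) for some load segments from the core listing above; end is exclusive
SEGMENTS = sorted([
    (0x00417000, 0x00418000, "load3"),
    (0x011a5000, 0x01213000, "load4"),
    (0xffffb1b07000, 0xffffb1b08000, "load49"),
    (0xffffe5cea000, 0xffffe5d0b000, "load55"),
])
STARTS = [start for start, _, _ in SEGMENTS]


def find_segment(addr):
    """Return the name of the segment containing addr, or None if it is unmapped."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i >= 0 and addr < SEGMENTS[i][1]:
        return SEGMENTS[i][2]
    return None


print(find_segment(0xffffb1b07540))      # load49
print(find_segment(0xffffffffb2794460))  # None
```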

Anyway, the pointer seems to point to a location in memory that is not mapped for userspace. Omitting the first two bytes of the pointer gives the address of the data we are looking for (dom14 being the primary domain name).

(gdb) x/1sb 0xffffb1b07540
0xffffb1b07540: "dom14"

If this is the problem, I don't understand why these two bytes are flipped: musl's strdup implementation only returns the pointer it got from malloc (via memcpy, which always returns the destination pointer it was given).
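One mechanism that produces exactly this pattern is a 64-bit pointer passing through a 32-bit int, for example when C code calls a function without a visible prototype and the return value is implicitly treated as int. This is offered as a hypothesis, not a confirmed diagnosis. Truncating to the low 32 bits and sign-extending back to 64 bits turns 0xffffb1b07540 into 0xffffffffb1b07540, because bit 31 of 0xb1b07540 is set. Likewise, 0xffffb2794460 becomes exactly the faulting 0xffffffffb2794460 from the first backtrace. The arithmetic is sketched in Python; through_int32 is an illustrative name.

```python
def through_int32(ptr):
    """Truncate a 64-bit value to a signed 32-bit int, then sign-extend back to 64 bits."""
    low = ptr & 0xFFFFFFFF
    if low & 0x80000000:  # bit 31 set: the int is negative
        low -= 1 << 32
    return low & 0xFFFFFFFFFFFFFFFF  # reinterpret as an unsigned 64-bit pointer


for good in (0xffffb1b07540, 0xffffb2794460):
    print(hex(good), "->", hex(through_int32(good)))
# 0xffffb1b07540 -> 0xffffffffb1b07540
# 0xffffb2794460 -> 0xffffffffb2794460
```

Addresses below 0x80000000, such as the heap at 0x011a5000 in this core, survive the round trip unchanged. If the hypothesis holds, that could explain why only some requests crash.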

neocturne commented 3 years ago

Ah, I got confused because info proc map didn't show the heap, and I overlooked the differing bytes. Very puzzling...

neocturne commented 3 years ago

I have opened #2174 to fix the issue.