ahgamut / superconfigure

wrap autotools configure scripts to build with Cosmopolitan Libc
The Unlicense

Bug: `zstd` on MacOS x86_64: `timefn::clock_gettime(CLOCK_MONOTONIC): Invalid argument` #51

mattyclarkson closed this issue 1 month ago

mattyclarkson commented 1 month ago

Caught as part of Bazel module testing that uses the APE binaries:

bazel-out/darwin_x86_64-opt-exec-ST-d57f47055a04/bin/external/ape~/ape/assimilate/zstd.ape/zstd --train --dictID 2170268541 -f fixture/dictionary/fixture-world.txt fixture/dictionary/goodbye-test.txt fixture/dictionary/goodbye-world.txt fixture/dictionary/hello-test.txt fixture/dictionary/hello-world.txt fixture/dictionary/main-world.txt fixture/dictionary/test-world.txt -o bazel-out/darwin_x86_64-fastbuild/bin/zstd/dictionary/dictionary)
# Configuration: 7b9ed97535e62681049102412f00c875c9900c486a0fdebc94c7adf148aded85
# Execution platform: @@platforms//host:host
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
!  Warning : data size of samples too small for target dictionary size
!  Samples should be about 100x larger than target dictionary size
timefn::clock_gettime(CLOCK_MONOTONIC): Invalid argument
error: Uncaught SIGABRT (SI_0) at 0x1f50000e392 on b9f0738b-f266-4163-a2a9-a1f13a4aa0e7.bazel.cb.macservice.goog pid 58258 tid 259
  bazel-out/darwin_x86_64-opt-exec-ST-d57f47055a04/bin/external/ape~/ape/assimilate/zstd.ape/zstd
  Invalid argument
  Darwin Cosmopolitan 3.3.0 MODE=x86_64; Darwin Kernel Version 23.5.0: Wed May  1 20:09:52 PDT 2024; root:xnu-10063.121.3~5/RELEASE_X86_64 b9f0738b-f266-4163-a2a9-a1f13a4aa0e7.bazel.cb.macservice.goog 23.5.0
RAX 0000000000000000 RBX 0000000000000006 RDI 0000000000000103
RCX 00007ff7bfeff748 RDX 0000000000000000 RSI 0000000000000006
RBP 00007ff7bfeff760 RSP 00007ff7bfeff748 RIP 000000000055cd79
 R8 0000000000564f43  R9 0000000000000000 R10 000000000055cd79
R11 0000000000000292 R12 0000100080050230 R13 0000000000000010
R14 0000000000000002 R15 0000000000000000
TLS 0000000000582a00
XMM0  00000000000000000000000000000000 XMM8  00000000000000000000000000000000
XMM1  00000000000000000000000000000014 XMM9  00000000000000000000000000000000
XMM2  000000001f46e6aa0000000066db972d XMM10 00000000000000000000000000000000
XMM3  00000000000000000000000066ba1fff XMM11 00000000000000000000000000000000
XMM4  3fe80000000000000000000000000000 XMM12 00000000000000000000000000000000
XMM5  00000000000000000000000000001000 XMM13 00000000000000000000000000000000
XMM6  00000000000000000000000066ba1fff XMM14 00000000000000000000000000000000
XMM7  00000000000000000000000066ba1fff XMM15 00000000000000000000000000000000
cosmoaddr2line /private/var/tmp/_bazel_buildkite/e857cca7c5d6bfd3d34f946b92c98654/sandbox/darwin-sandbox/4/execroot/_main/bazel-out/darwin_x86_64-opt-exec-ST-d57f47055a04/bin/external/ape~/ape/assimilate/zstd.ape/zstd 55cd79 40a8aa 409de7 4cf3ec 4cf73b 406634 408463 4045fb
note: won't print addr2line backtrace because probably llvm
7ff7bfefc920 55cd79 systemfive_bsd+21
7ff7bfeff760 40a8aa abort+45
7ff7bfeff780 409de7 UTIL_clockSpanNano.cold+0
7ff7bfeff7b0 4cf3ec DiB_loadFiles+700
7ff7bfeff830 4cf73b DiB_trainFromFiles+507
7ff7bfeff8e0 406634 main+7284
7ff7bfeffba0 408463 cosmo+73
7ff7bfeffbb0 4045fb _start+133
10008004-10008009 rw-pa- 6x automap 384kB w/ 4288kB hole
1000804d-1000804f r--s-- 3x automap 134kB w/ 96tB hole
6fe00004-6fe00004 rw-paF 1x g_fds 64kB
# 640kB total mapped memory

Should this be an upstream bug on jart/cosmopolitan? It seems like a libc issue.

ahgamut commented 1 month ago

clock_gettime was improved recently in the libc, and it seems the superconfigure releases haven't caught up yet.

ahgamut commented 1 month ago

Not sure about macOS x86_64 specifically, perhaps @jart can explain how debugging there can work.

ahgamut commented 1 month ago

@mattyclarkson can you try with the zstd from the latest release https://github.com/ahgamut/superconfigure/releases/tag/z0.0.55

jart commented 1 month ago

I can't reproduce this. Here's the code that zstd uses:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define PTime int64_t

typedef struct {
  PTime t;
} UTIL_time_t;

UTIL_time_t UTIL_getTime(void) {
  /* time must be initialized, otherwise it may fail msan test.
   * No good reason, likely a limitation of timespec_get() for some target */
  struct timespec time = {0, 0};
  if (clock_gettime(CLOCK_MONOTONIC, &time) != 0) {
    perror("timefn::clock_gettime(CLOCK_MONOTONIC)");
    abort();
  }
  {
    UTIL_time_t r;
    r.t = (PTime)time.tv_sec * 1000000000ULL + (PTime)time.tv_nsec;
    return r;
  }
}

int main(int argc, char *argv[]) {
  /* PRId64 matches int64_t on every platform; %ld does not on macOS */
  printf("%" PRId64 "\n", UTIL_getTime().t);
}

This works fine on MacOS x86-64 for me.

$?=1 jart@xnu:~$ uname -a
Darwin xnu.lan 23.1.0 Darwin Kernel Version 23.1.0: Mon Oct  9 21:27:27 PDT 2023; root:xnu-10002.41.9~6/RELEASE_X86_64 x86_64
jart@xnu:~$ ./wut
1719610618934733983

The implementation of clock_gettime for XNU x86-64 is as follows:

int sys_clock_gettime_xnu(int clock, struct timespec *ts) {
  long ax, dx;
  if (clock == CLOCK_REALTIME) {
    // invoke the system call
    //
    //   int gettimeofday(struct timeval *tp,
    //                    struct timezone *tzp,
    //                    uint64_t *mach_absolute_time);
    //
    // as follows
    //
    //   ax, dx = gettimeofday(&ts, 0, 0);
    //
    // to support multiple calling conventions
    //
    //   1. new xnu returns *ts in memory via rdi
    //   2. old xnu returns *ts in rax:rdx regs
    //
    // we assume this system call always succeeds
    asm volatile("syscall"
                 : "=a"(ax), "=d"(dx)
                 : "0"(0x2000000 | 116), "D"(ts), "S"(0), "1"(0)
                 : "rcx", "r8", "r9", "r10", "r11", "memory");
    if (ax) {
      ts->tv_sec = ax;
      ts->tv_nsec = dx;
    }
    ts->tv_nsec *= 1000;
    return 0;
  } else if (clock == CLOCK_BOOTTIME ||   //
             clock == CLOCK_MONOTONIC ||  //
             clock == CLOCK_MONOTONIC_COARSE) {
    return sys_clock_gettime_mono(ts);
  } else {
    return -EINVAL;
  }
}

So unless the memory for clock or CLOCK_MONOTONIC is being corrupted somehow, I fail to see how it could EINVAL.

jart commented 1 month ago

The only way I can imagine this happening would be if you've edited the zstd source code to redefine CLOCK_MONOTONIC to be CLOCK_MONOTONIC_RAW or something like that. The normal monotonic clock is guaranteed to work. Certifiably across fleet. The raw monotonic clock isn't supported by some systems like yours.

mattyclarkson commented 1 month ago

> if you've edited the zstd source code to redefine

We use the prebuilt binaries from https://cosmo.zip/pub/cosmos/v/ in https://gitlab.arm.com/bazel/ape.

What is the correlation between superconfigure releases and the versioned cosmo.zip releases?

> This works fine on MacOS x86-64 for me.

This may be something in the Bazel Central Registry MacOS x86_64 runner.

> @mattyclarkson can you try with the zstd from the latest release https://github.com/ahgamut/superconfigure/releases/tag/z0.0.55

Yep, I can roll out a new Bazel toolchain using that binary.

This is not critical; we can work around it by building the toolchain from source using the BCR zstd package.


Leave it all with me, I'll do some debugging. Thanks for your prompt and detailed responses. 🙇

ahgamut commented 1 month ago

> What is the correlation between superconfigure releases and the versioned cosmo.zip releases?

cosmo.zip has nightly binaries built from the latest commit of Cosmopolitan Libc. They are more for our internal testing.

superconfigure releases are more "stable" because I use GitHub Actions and a specific commit of Cosmopolitan, but the primary purpose is just to show how different third-party codebases can be built.

mattyclarkson commented 1 month ago

> What is the correlation between superconfigure releases and the versioned cosmo.zip releases?
>
> cosmo.zip has nightly binaries built from the latest commit of Cosmopolitan Libc. They are more for our internal testing.

I was wondering about the versioned cosmo.zip under https://cosmo.zip/pub/cosmos/v/

mattyclarkson commented 1 month ago

I added v0.0.55 to the rules_zstd Bazel module.

It passed all tests on all BCR platforms and was added to the BCR in bazelbuild/bazel-central-registry#2792

It seems a change somewhere between cosmo.zip@3.7.1 and superconfigure@v0.0.55 solved this, thanks.