chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

A possibility to avoid leaking of the build paths (i.e. CHPL_HOME) into the generated binaries #22333

Open twesterhout opened 1 year ago

twesterhout commented 1 year ago

The Chapel compiler currently saves a lot of information about the build environment in the executables. This information includes CHPL_HOME as well as other environment variables. I believe there are two reasons why having a possibility to "hide" this information is desirable:

  1. Security considerations. I'll let security experts chime in here, but a simple example is that I might not want the users of my Chapel code to see what my username on my private computer is. This information will be visible if it appears in CHPL_HOME, for instance.
  2. The Nix package manager automatically determines runtime dependencies by grep-ing through all generated files (conceptually, at least; technically it doesn't really invoke grep). That means that if we're using, say, the hello6-taskpar-dist.chpl example and have set CHPL_HOME as well as CHPL_TARGET_CXX, then Nix will think that both Chapel and LLVM/Clang are runtime dependencies of the produced executable. This is, of course, not necessarily the case, and it leads to Nix closures of Chapel apps being much bigger than they have to be. For instance, if I'm generating a Singularity container for the Chapel app, I don't really want LLVM in there, as it'd increase the size of the container considerably.
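To make the second point concrete, here is a minimal sketch of the kind of scan described above. It is not Nix's actual implementation; it only illustrates why any store path embedded in a binary ends up counted as a runtime dependency. The function name `find_store_references` is hypothetical, and the regular expression assumes that store hashes are 32 characters drawn from Nix's base-32 alphabet (digits and lowercase letters without e, o, t, u).

```python
import re

# Assumption: a store path looks like /nix/store/<32-char hash>-<name>,
# where the hash uses Nix's base-32 alphabet (no e, o, t, u).
STORE_REF = re.compile(
    rb"/nix/store/[0-9a-df-np-sv-z]{32}-[A-Za-z0-9+._?=-]+"
)


def find_store_references(blob: bytes) -> set[str]:
    """Return every distinct store path that appears anywhere in blob."""
    return {m.group(0).decode("ascii") for m in STORE_REF.finditer(blob)}


if __name__ == "__main__":
    # A fake "binary": an embedded CHPL_HOME-style string is enough to
    # make the compiler package show up as a reference.
    fake_binary = (
        b"\x7fELF\x00CHPL_HOME\x00"
        b"/nix/store/wjd7l8bislnfz5vdm5gjlccsz9a0kv8f-chapel-1.30.0\x00"
    )
    for ref in sorted(find_store_references(fake_binary)):
        print(ref)
```

A string such as the saved CHPL_HOME value is indistinguishable, from the scanner's point of view, from a genuine runtime path such as the dynamic loader, which is why the whole compiler gets pulled into the closure.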

This issue is to discuss whether the core Chapel developers would agree that the above two arguments are convincing enough to add support for a --no-about or a similar flag that would partially hide the build system configuration such that it doesn't appear in the executable.

Tagging @bradcray as we've started discussing this on Gitter.

twesterhout commented 1 year ago

I've started experimenting with what it'd take to hide the paths, and my first try was:

substituteInPlace compiler/codegen/codegen.cpp \
  --replace 'genGlobalString("chpl_compileCommand", compileCommand);' \
            'genGlobalString("chpl_compileCommand", "<unknown>");' \
  --replace 'genGlobalString("chpl_compileDirectory", getCwd());' \
            'genGlobalString("chpl_compileDirectory", "<unknown>");' \
  --replace 'genGlobalString("CHPL_HOME", CHPL_HOME);' \
            'genGlobalString("CHPL_HOME", "<unknown>");' \
  --replace 'genGlobalString(env->first.c_str(), env->second);' \
            'genGlobalString(env->first.c_str(), "<unknown>");' \
  --replace 'codegenCallPrintf(astr("Compilation command: ", compileCommand, "\\n"));' \
            'codegenCallPrintf(astr("Compilation command: ", "<unknown>", "\\n"));' \
  --replace 'codegenCallPrintf(astr("  CHPL_HOME: ", CHPL_HOME, "\\n"));' \
            'codegenCallPrintf(astr("  CHPL_HOME: ", "<unknown>", "\\n"));' \
  --replace 'codegenCallPrintf(astr("  ", env->first.c_str(), ": ", env->second, "\\n"));' \
            'codegenCallPrintf(astr("  ", env->first.c_str(), ": ", "<unknown>", "\\n"));'

(substituteInPlace is a Nixpkgs shell helper that does exactly what the name suggests: it replaces the given strings in the file in place.)

This gets rid of the direct Clang/LLVM dependency, but CHPL_HOME still appears in the executable.

twesterhout commented 1 year ago

Hm... I'm still getting these:

$ strings ./result/bin/hello6-taskpar-dist | grep /nix
/nix/store/8bmp6r3a0xfha3wj36phlc47clh9w81l-glibc-2.35-224/lib/ld-linux-x86-64.so.2
/nix/store/c6a2k5zcyag4qvbwm69rbqjhr739fj2k-libunwind-1.6.2/lib:/nix/store/68y9c6pmlncm4zh62avmnaa1cd7k25fa-gmp-with-cxx-6.2.1/lib:/nix/store/inzbkps4dv01p0c5rzdiqp1lzci73kj6-xz-5.4.1/lib:/nix/store/8bmp6r3a0xfha3wj36phlc47clh9w81l-glibc-2.35-224/lib
/nix/store/wjd7l8bislnfz5vdm5gjlccsz9a0kv8f-chapel-1.30.0/runtime/include/tasks/qthreads/chpl-tasks-impl-fns.h
/nix/store/wjd7l8bislnfz5vdm5gjlccsz9a0kv8f-chapel-1.30.0/runtime/include/chpl-mem.h
/nix/store/wjd7l8bislnfz5vdm5gjlccsz9a0kv8f-chapel-1.30.0/runtime/include/qio/qio.h

It all looks good except for the last three strings... I cannot figure out how they're getting into the executable. My initial thought was that there's an assert somewhere that relies on __FILE__, but I don't see where...

On second thought, if I compile with --print-commands --devel, I see that <internal clang cc> is invoked without -DNDEBUG, so asserts could indeed be the reason. Is there a way to customize the arguments passed to <internal clang cc>?

twesterhout commented 1 year ago

My solution so far consists of two parts:

  1. I create a copy of $CHPL_HOME/third-party in an independent directory. Nix treats this directory as a separate package. The idea here is to let executables access various GASNet launchers without depending on the Chapel compiler.
  2. Using sed -i I patch the generated binary to replace some (not all!) strings of the form /nix/store/<hash>-<package>-<version> with /nix/store/XXXXXXXX...XXXX-<package>-<version>.
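The second step can be sketched as follows. This is an illustration in Python rather than the sed -i invocation actually used, and the names `mask_store_hashes` and `keep` are hypothetical. The key property is that each hash is replaced by the same number of X characters, so the file size and all offsets inside the binary stay unchanged. The hash pattern again assumes 32 characters from Nix's base-32 alphabet.

```python
import re

# Assumption: store hashes are 32 characters from Nix's base-32 alphabet.
STORE_HASH = re.compile(rb"/nix/store/([0-9a-df-np-sv-z]{32})-")


def mask_store_hashes(blob: bytes, keep: set[bytes]) -> bytes:
    """Replace store hashes with X's, except those listed in keep.

    keep holds the hashes of packages that really are needed at runtime
    (for example glibc or gmp); everything else is masked so that a
    reference scan no longer finds it.
    """

    def mask(match: re.Match) -> bytes:
        digest = match.group(1)
        if digest in keep:
            return match.group(0)  # genuine runtime dependency: leave as is
        # Same length as the original, so the binary layout is preserved.
        return b"/nix/store/" + b"X" * len(digest) + b"-"

    return STORE_HASH.sub(mask, blob)


if __name__ == "__main__":
    runtime_deps = {b"68y9c6pmlncm4zh62avmnaa1cd7k25fa"}  # gmp, kept
    data = (
        b"/nix/store/wjd7l8bislnfz5vdm5gjlccsz9a0kv8f-chapel-1.30.0\x00"
        b"/nix/store/68y9c6pmlncm4zh62avmnaa1cd7k25fa-gmp-with-cxx-6.2.1/lib\x00"
    )
    print(mask_store_hashes(data, runtime_deps))
```

Using an explicit allowlist rather than masking everything matches the "some (not all!)" caveat: paths in the RPATH or the interpreter field must stay intact or the executable will no longer load.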

If the two steps above are done very carefully, I can achieve the following:

CHPL_HOME=/nix/store/9gqp4k5cmjw750rfhgjkqnz45kn7yxr2-chapel-1.31.0/bin/chpl, but ./result/bin/hello6-taskpar-dist -a returns:

Compilation command: chpl -I /nix/store/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-glibc-2.35-224-dev/include -I /nix/store/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-clang-14.0.6-lib/lib/clang/14.0.6/include -L /nix/store/68y9c6pmlncm4zh62avmnaa1cd7k25fa-gmp-with-cxx-6.2.1/lib -L /nix/store/inzbkps4dv01p0c5rzdiqp1lzci73kj6-xz-5.4.1/lib --print-commands --devel hello6-taskpar-dist.chpl 
Chapel compiler version: 1.31.0 pre-release (xxxxxxxxxx)
Chapel environment:
  CHPL_HOME: /nix/store/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-chapel-1.31.0
  CHPL_ATOMICS: cstdlib
  CHPL_AUX_FILESYS: none
  CHPL_COMM: gasnet
  CHPL_COMM_SUBSTRATE: mpi
  CHPL_COMPILER_SUBDIR: linux64/llvm/x86_64/hostmem-jemalloc/llvm-system/14/san-none
  CHPL_CUDA_LIBDEVICE_PATH: 
  CHPL_CUDA_PATH: 
  CHPL_GASNET_SEGMENT: everything
  CHPL_GASNET_UNIQ_CFG_PATH: linux64-x86_64-none-llvm-none/substrate-mpi/seg-everything
  CHPL_GMP: system
  CHPL_GMP_IS_OVERRIDDEN: True
  CHPL_GMP_UNIQ_CFG_PATH: linux64-x86_64-none-llvm-none
  CHPL_GPU_ARCH: 
  CHPL_GPU_CODEGEN: none
  CHPL_GPU_MEM_STRATEGY: unified_memory
  CHPL_GPU_RUNTIME: none
  CHPL_HOST_ARCH: x86_64
  CHPL_HOST_BIN_SUBDIR: linux64-x86_64
  CHPL_HOST_BUNDLED_COMPILE_ARGS: 
  CHPL_HOST_BUNDLED_LINK_ARGS: 
  CHPL_HOST_CC: /nix/store/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-clang-wrapper-14.0.6/bin/clang
  CHPL_HOST_COMPILER: llvm
  CHPL_HOST_CPU: none
  CHPL_HOST_CXX: /nix/store/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-clang-wrapper-14.0.6/bin/clang++
  CHPL_HOST_JEMALLOC: bundled
  CHPL_HOST_JEMALLOC_UNIQ_CFG_PATH: host/linux64-x86_64-llvm
  CHPL_HOST_MEM: jemalloc
  CHPL_HOST_PLATFORM: linux64
  CHPL_HOST_SYSTEM_COMPILE_ARGS: 
  CHPL_HOST_SYSTEM_LINK_ARGS: 
  CHPL_HWLOC: bundled
  CHPL_HWLOC_UNIQ_CFG_PATH: linux64-x86_64-none-llvm-none-flat
  CHPL_LAUNCHER: gasnetrun_mpi
  CHPL_LAUNCHER_SUBDIR: linux64/llvm/x86_64/loc-flat/comm-gasnet/mpi/everything/tasks-qthreads/launch-gasnetrun_mpi/tmr-generic/unwind-system/mem-jemalloc/atomics-cstdlib/lib_pic-none/san-none
  CHPL_LIBFABRIC: none
  CHPL_LIBFABRIC_UNIQ_CFG_PATH: linux64-x86_64-none-llvm-none
  CHPL_LIBUNWIND_UNIQ_CFG_PATH: linux64-x86_64-none-llvm-none
  CHPL_LIB_PIC: none
  CHPL_LLVM: system
  CHPL_LLVM_CLANG_C: 
  CHPL_LLVM_CLANG_CXX: 
  CHPL_LLVM_CONFIG: /nix/store/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-llvm-14.0.6-dev/bin/llvm-config
  CHPL_LLVM_STATIC_DYNAMIC: dynamic
  CHPL_LLVM_SUPPORT: system
  CHPL_LLVM_TARGET_CPU: none
...

As you can see, XXX...X is inserted in place of some package hashes, but only for packages that we don't need at runtime, such as Clang or the glibc headers.

Querying the runtime references of the result gives:

$ nix-store --query --references ./result
/nix/store/8bmp6r3a0xfha3wj36phlc47clh9w81l-glibc-2.35-224
/nix/store/4pijpgzbhhwdkj742577jg06bb75zn0g-openmpi-4.1.4
/nix/store/wvm2hvqdbbsp1f11463mrw8nyv678ipm-gcc-12.2.0-lib
/nix/store/68y9c6pmlncm4zh62avmnaa1cd7k25fa-gmp-with-cxx-6.2.1
/nix/store/inzbkps4dv01p0c5rzdiqp1lzci73kj6-xz-5.4.1
/nix/store/c6a2k5zcyag4qvbwm69rbqjhr739fj2k-libunwind-1.6.2
/nix/store/n6jdj6b2s24pckl9jpszzkkibfcbnk32-chapel-1.31.0-third_party

i.e. Nix confirms that our package no longer depends on the Chapel compiler. However, there is now a "chapel-1.31.0-third_party" reference: this is the special package that contains the GASNet launchers. For instance,

$ ./result/bin/hello6-taskpar-dist -nl1 -v
/nix/store/n6jdj6b2s24pckl9jpszzkkibfcbnk32-chapel-1.31.0-third_party/gasnet/install/linux64-x86_64-none-llvm-none/substrate-mpi/seg-everything/bin/gasnetrun_mpi -n 1 -N 1 -c 0 -E SHELL,SESSION_MANAGER,QT_ACCESSIBILITY,COLORTERM,__HM_SESS_VARS_SOURCED,SSH_AGENT_LAUNCHER,XDG_MENU_PREFIX,GNOME_DESKTOP_SESSION_ID,GNOME_KEYRING_CONTROL,__EGL_VENDOR_LIBRARY_FILENAMES,LC_ADDRESS,LC_NAME,SSH_AUTH_SOCK,XDG_DATA_HOME,XDG_CONFIG_HOME,XCURSOR_PATH,LOCALE_ARCHIVE_2_27,XMODIFIERS,DESKTOP_SESSION,LC_MONETARY,KITTY_PID,EDITOR,GTK_MODULES,PWD,NIX_PROFILES,XDG_SESSION_DESKTOP,LOGNAME,XDG_SESSION_TYPE,NIX_PATH,SYSTEMD_EXEC_PID,XAUTHORITY,KITTY_PUBLIC_KEY,GDM_LANG,HOME,USERNAME,IM_CONFIG_PHASE,LC_PAPER,LANG,TMUX_TMPDIR,XDG_CURRENT_DESKTOP,WAYLAND_DISPLAY,NIX_SSL_CERT_FILE,KITTY_WINDOW_ID,INVOCATION_ID,QTWEBENGINE_DICTIONARIES_PATH,MANAGERPID,XDG_CACHE_HOME,GNOME_SETUP_DISPLAY,XDG_SESSION_CLASS,TERMINFO,TERM,LC_IDENTIFICATION,USER,DISPLAY,SHLVL,LC_TELEPHONE,QT_IM_MODULE,LC_MEASUREMENT,LIBGL_DRIVERS_PATH,TERMINFO_DIRS,XDG_STATE_HOME,LD_LIBRARY_PATH,XDG_RUNTIME_DIR,LIBVA_DRIVERS_PATH,LC_TIME,JOURNAL_STREAM,XDG_DATA_DIRS,PATH,GDMSESSION,DBUS_SESSION_BUS_ADDRESS,KITTY_INSTALLATION_DIR,GIO_LAUNCHED_DESKTOP_FILE_PID,LC_NUMERIC,OLDPWD,_ /nix/store/3asg1h28b5in4rnfisi858w83n8a5xvh-hello-chapel/bin/hello6-taskpar-dist_real -nl1 -v
oversubscribed = False
QTHREADS: Using 4 Shepherds
QTHREADS: Using 1 Workers per Shepherd
QTHREADS: Guard Pages Enabled
QTHREADS: Using 8376320 byte stack size.
executing on node 0 of 1 node(s): wh2
Hello, world! (from locale 0 of 1 named wh2)

which confirms that the gasnetrun_mpi executable comes from the "chapel-1.31.0-third_party" package.
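
The manual inspection of the nix-store --query --references output could be automated along these lines. This is a sketch under stated assumptions: the function names `parse_references` and `leaked_packages` are hypothetical, and the text is expected to be one store path per line, as in the output shown earlier.

```python
def parse_references(output: str) -> dict[str, str]:
    """Map '<package>-<version>' to its full store path."""
    refs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("/nix/store/"):
            continue  # skip prompts, blank lines, anything unexpected
        base = line[len("/nix/store/"):].split("/", 1)[0]
        _digest, _, name = base.partition("-")
        refs[name] = line
    return refs


def leaked_packages(output, forbidden, allowed=frozenset()):
    """Return referenced packages whose name starts with a forbidden prefix.

    forbidden is a tuple of name prefixes (build-time-only packages);
    allowed lists exact names that are expected despite matching a prefix.
    """
    names = parse_references(output)
    return sorted(
        n for n in names if n.startswith(forbidden) and n not in allowed
    )


if __name__ == "__main__":
    sample = """\
/nix/store/8bmp6r3a0xfha3wj36phlc47clh9w81l-glibc-2.35-224
/nix/store/n6jdj6b2s24pckl9jpszzkkibfcbnk32-chapel-1.31.0-third_party
"""
    print(leaked_packages(
        sample,
        forbidden=("chapel-", "clang-", "llvm-"),
        allowed=frozenset({"chapel-1.31.0-third_party"}),
    ))
```

An empty result means that no build-time-only package remains in the closure, apart from the explicitly allowed launcher package.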