cisco / ChezScheme

Chez Scheme
Apache License 2.0
6.91k stars 982 forks source link

Support portable boot files on more platforms #732

Closed ur4t closed 8 months ago

ur4t commented 9 months ago

Support portable boot files (%x) on more platforms

mflatt commented 9 months ago

I like the idea of more support for executable-relative boot files. There are a many pitfalls to getting the current executable's path on Unix variants, though. For example, a fixed BOOT_MAX_PATH array size worries me.

Here's the code that Racket uses:

https://github.com/racket/racket/blob/master/racket/src/start/self_exe.inc

I wonder whether it would make sense to put (a cleaner version of) this code in the Chez Scheme tree for use both by Chez Scheme and clients like Racket. This code has has been tested and refined for a while, at least. Or maybe there's another existing implementation that would be better to start with?

ur4t commented 9 months ago

I wonder whether it would make sense to put (a cleaner version of) this code in the Chez Scheme tree for use both by Chez Scheme and clients like Racket. This code has has been tested and refined for a while, at least. Or maybe there's another existing implementation that would be better to start with?

It would be excellent to reuse battle-tested Racket version. And there are some problems we need to solve:

  1. The "generic" method requires argv[0], how should we pass it to next_path?
  2. How we should expose this functionality to clients?
samth commented 9 months ago

This Stack Overflow answer mentions a few OS interfaces that Racket isn't using: https://stackoverflow.com/questions/1023306/finding-current-executables-path-without-proc-self-exe (eg for Solaris) but mostly aligns with the current Racket code.

https://github.com/gpakosz/whereami seems to be the only other project that I've found that attempts to be as comprehensive. It's hard for me to tell the differences at a glance.

mflatt commented 9 months ago

Maybe Chez Scheme should provide a Sregister_exe_args or Sregister_argv0 function alongside functions like Sregister_boot_file, and programs can call that to communicate arguments before using other functions? The default could be "/usr/local/bin/scheme", or something like that, to avoid having to introduce new "not known" handling if none is register.

ur4t commented 9 months ago

Maybe Chez Scheme should provide a Sregister_exe_args or Sregister_argv0 function alongside functions like Sregister_boot_file, and programs can call that to communicate arguments before using other functions? The default could be "/usr/local/bin/scheme", or something like that, to avoid having to introduce new "not known" handling if none is register.

I found that ./configure --start=custom-bootfile-name will set alwaysUseBootFile=custom-bootfile-name in Mf-config and Sbuild_heap("custom-bootfile-name", ...) will set the name as well.

Current executable file path detection method in 6f87bf7 is based on Racket version and LLVM version, with proper simplification for better readability.

mflatt commented 8 months ago

@ur4t What do you think of the approach at https://github.com/mflatt/ChezScheme/tree/selfexe, where "self_exe.c" is kept as a separate file? That way, Racket variants can use the same implementation, and the amount of code to maintain stays the same from my perspective (although it's more code for Chez Scheme).

I saw that you had added getexecname guarded by defined(__sun__) && defined(__svr4__), but my impression is that it just returns argv[0], so it doesn't seem like the right thing to use. I might be confused, though.

ur4t commented 8 months ago

What do you think of the approach at https://github.com/mflatt/ChezScheme/tree/selfexe, where "self_exe.c" is kept as a separate file? That way, Racket variants can use the same implementation, and the amount of code to maintain stays the same from my perspective (although it's more code for Chez Scheme).

A separate file is much clearer and actually reduces the maintenance burden, but maybe we should name it "self-exe.c" to keep consistent with other c files.

I skimmed the patch, and found that there's no SELF_EXE_WINDOWS_AS_UTF8 defined, which means self_exe_t is defined as wchar_t and char *tstart = get_process_executable_path(execpath) is buggy on Windows.

I saw that you had added getexecname guarded by defined(__sun__) && defined(__svr4__), but my impression is that it just returns argv[0], so it doesn't seem like the right thing to use. I might be confused, though.

Manpage of getexecname says "Normally this is an absolute pathname, as the majority of commands are executed by the shells that append the command name to the user's PATH components. If this is not an absolute path, the output of getcwd(3C) can be prepended to it to create an absolute path".

I chose to use realpath to resolve all symlinks and dots before return on platforms other than Windows, so it should work as expected if the actual behaviour is the same. Or maybe we should add a safe guard to check whether it is an absolute pathname. However I have no access to a real Solaris machine thus cannot test the validity of this implementation.


Personally I think the simplified version is good for maintenance, but it is better to seperate the simplification into another PR.

Modifications in this PR follows c99 standard, and I am considering utilizing features proposed by new standards such as c11 and c17 to reduce cross-platform efforts. Is there a compiler support guideline?

mflatt commented 8 months ago

I agree about "self-exe.c". Also, probably the function name get_process_executable_path should have a leading S to be consistent with other non-static names.

Good point about SELF_EXE_WINDOWS_AS_UTF8. It's probably better to have SELF_EXE_WINDOWS_AS_UTF16 to opt into a difference instead of making different interfaces the default.

I think that man page of getexecname is over-optimistic about the value of argv[0]. After all, "self-exe.c" wouldn't need to exist if it were straightforward to get the executable path from argv[0]. I doubt that Solaris is special, and it will just let argv[0] through as provided to exec, so the same issues will happen there.

Using realpath in lookup_exe_via_path is a good improvement and gets rid of the need for get_current_directory.

I don't know whether the Chez Scheme implementors have already declared a particular C variant to use, but I think we should stick to C99.

ur4t commented 8 months ago

e45abfa adds a proxy function S_get_process_executable_path to be consistent with other non-static functions in externs.h and uses char * on Windows unless SELF_EXE_WINDOWS_AS_UTF16 is defined.

Currently no other improvement is implemented, to minimize difference between self-exe.c and self_exe.inc in Racket, which might be good for synchronizing modifications in the future.

I do not know where to statement this PR in the release note, maybe a more precise workflow description is necessary.

Is this PR ready to be merged?

mflatt commented 8 months ago

Thanks for this improvement!

How about making S_get_process_executable_path the only non-static definition in the file? That would be not only to be consistent with other parts of Chez Scheme, but also avoid potential conflicts when linking into a larger host program. (Nothing guarantees that S is a distinguishing prefix, of course, but it at least limits the space of potential conflicts.) I originally had in mind making Sget_process_executable_path a documented public function so that a host program could use it, but I'm not sure it's worthwhile.

No need to hold back on improvements to self-exe.c, unless you think those improvements are better handled later. The Racket version of this file (or most of it) can just go away with the version here as the single maintained source.

For the release notes, the note would be a new subsection at the top of "Functionality Changes".

ur4t commented 8 months ago

How about making S_get_process_executable_path the only non-static definition in the file?

Now S_get_process_executable_path is the only non-static definition in the file.

I originally had in mind making Sget_process_executable_path a documented public function so that a host program could use it, but I'm not sure it's worthwhile.

Currently self-exe.c is designed to be "independent" (a duplicated wide_to_utf8 is provided), which means it can be included directly.

Feature macros SELF_EXE_WINDOWS_AS_UTF(8|16) make it difficult to distinguish type of the return value, thus now get_process_executable_path always return char * and self_exe_t is removed.

On Racket side, CS part can remove SELF_EXE_WINDOWS_AS_UTF8 and self_exe_t easily, but BC part will need more efforts (or a tiny wrapper might be enough?).

mflatt commented 8 months ago

Some last things I notice:

ur4t commented 8 months ago
  • On FreeBSD, "self-exe.c" doesn't build because errno isn't declared.
  • On Solaris, "self-exe.c" compiles with a warning, because getexecname returns a const pointer.
  • get_self_path_generic can return NULL, but get_process_executable_path doesn't account for that.
  • The top of the file still says that get_process_executable_path (without the S_) is defined.
  • S_get_process_executable_path can return NULL, where originally I think it would always return an allocated string — unless something like strdup returned NULL, which was treated as just catastrophic. Maybe just document this at the top?

Those issues are fixed in 012b2957f5ada0eea0ed43d4cb5f9e88060f53fd.

  • Symbolic links are resolve more than before via realpath. That worries me, but I can't think of a way that it will actually create trouble.

There is a rare circumstance under which resolving all symbolic links will create trouble: users expect to obtain a symlink to the path that was used to start the program, and not the eventual binary file. I think it is a good idea to keep a consistent behaviour with LLVM, which means to resolve all symbolic links via realpath.

mflatt commented 8 months ago

This now looks good to me. I've tried it out via #755 and also confirmed (as much as I can) that Racket can take advantage of "self-exe.c" for its own search.