Closed ur4t closed 8 months ago
I like the idea of more support for executable-relative boot files. There are a many pitfalls to getting the current executable's path on Unix variants, though. For example, a fixed BOOT_MAX_PATH
array size worries me.
Here's the code that Racket uses:
https://github.com/racket/racket/blob/master/racket/src/start/self_exe.inc
I wonder whether it would make sense to put (a cleaner version of) this code in the Chez Scheme tree for use both by Chez Scheme and clients like Racket. This code has has been tested and refined for a while, at least. Or maybe there's another existing implementation that would be better to start with?
I wonder whether it would make sense to put (a cleaner version of) this code in the Chez Scheme tree for use both by Chez Scheme and clients like Racket. This code has has been tested and refined for a while, at least. Or maybe there's another existing implementation that would be better to start with?
It would be excellent to reuse battle-tested Racket version. And there are some problems we need to solve:
argv[0]
, how should we pass it to next_path
?This Stack Overflow answer mentions a few OS interfaces that Racket isn't using: https://stackoverflow.com/questions/1023306/finding-current-executables-path-without-proc-self-exe (eg for Solaris) but mostly aligns with the current Racket code.
https://github.com/gpakosz/whereami seems to be the only other project that I've found that attempts to be as comprehensive. It's hard for me to tell the differences at a glance.
Maybe Chez Scheme should provide a Sregister_exe_args
or Sregister_argv0
function alongside functions like Sregister_boot_file
, and programs can call that to communicate arguments before using other functions? The default could be "/usr/local/bin/scheme", or something like that, to avoid having to introduce new "not known" handling if none is register.
Maybe Chez Scheme should provide a
Sregister_exe_args
orSregister_argv0
function alongside functions likeSregister_boot_file
, and programs can call that to communicate arguments before using other functions? The default could be "/usr/local/bin/scheme", or something like that, to avoid having to introduce new "not known" handling if none is register.
I found that ./configure --start=custom-bootfile-name
will set alwaysUseBootFile=custom-bootfile-name
in Mf-config
and Sbuild_heap("custom-bootfile-name", ...)
will set the name as well.
Current executable file path detection method in 6f87bf7 is based on Racket version and LLVM version, with proper simplification for better readability.
@ur4t What do you think of the approach at https://github.com/mflatt/ChezScheme/tree/selfexe, where "self_exe.c" is kept as a separate file? That way, Racket variants can use the same implementation, and the amount of code to maintain stays the same from my perspective (although it's more code for Chez Scheme).
I saw that you had added getexecname
guarded by defined(__sun__) && defined(__svr4__)
, but my impression is that it just returns argv[0]
, so it doesn't seem like the right thing to use. I might be confused, though.
What do you think of the approach at https://github.com/mflatt/ChezScheme/tree/selfexe, where "self_exe.c" is kept as a separate file? That way, Racket variants can use the same implementation, and the amount of code to maintain stays the same from my perspective (although it's more code for Chez Scheme).
A separate file is much clearer and actually reduces the maintenance burden, but maybe we should name it "self-exe.c" to keep consistent with other c files.
I skimmed the patch, and found that there's no SELF_EXE_WINDOWS_AS_UTF8
defined, which means self_exe_t
is defined as wchar_t
and char *tstart = get_process_executable_path(execpath)
is buggy on Windows.
I saw that you had added
getexecname
guarded bydefined(__sun__) && defined(__svr4__)
, but my impression is that it just returnsargv[0]
, so it doesn't seem like the right thing to use. I might be confused, though.
Manpage of getexecname
says "Normally this is an absolute pathname, as the majority of commands are executed by the shells that append the command name to the user's PATH components. If this is not an absolute path, the output of getcwd(3C) can be prepended to it to create an absolute path".
I chose to use realpath
to resolve all symlinks and dots before return on platforms other than Windows, so it should work as expected if the actual behaviour is the same. Or maybe we should add a safe guard to check whether it is an absolute pathname. However I have no access to a real Solaris machine thus cannot test the validity of this implementation.
Personally I think the simplified version is good for maintenance, but it is better to seperate the simplification into another PR.
Modifications in this PR follows c99 standard, and I am considering utilizing features proposed by new standards such as c11 and c17 to reduce cross-platform efforts. Is there a compiler support guideline?
I agree about "self-exe.c". Also, probably the function name get_process_executable_path
should have a leading S
to be consistent with other non-static
names.
Good point about SELF_EXE_WINDOWS_AS_UTF8
. It's probably better to have SELF_EXE_WINDOWS_AS_UTF16
to opt into a difference instead of making different interfaces the default.
I think that man page of getexecname
is over-optimistic about the value of argv[0]
. After all, "self-exe.c" wouldn't need to exist if it were straightforward to get the executable path from argv[0]
. I doubt that Solaris is special, and it will just let argv[0]
through as provided to exec
, so the same issues will happen there.
Using realpath
in lookup_exe_via_path
is a good improvement and gets rid of the need for get_current_directory
.
I don't know whether the Chez Scheme implementors have already declared a particular C variant to use, but I think we should stick to C99.
e45abfa adds a proxy function S_get_process_executable_path
to be consistent with other non-static
functions in externs.h
and uses char *
on Windows unless SELF_EXE_WINDOWS_AS_UTF16
is defined.
Currently no other improvement is implemented, to minimize difference between self-exe.c
and self_exe.inc
in Racket, which might be good for synchronizing modifications in the future.
I do not know where to statement this PR in the release note, maybe a more precise workflow description is necessary.
Is this PR ready to be merged?
Thanks for this improvement!
How about making S_get_process_executable_path
the only non-static
definition in the file? That would be not only to be consistent with other parts of Chez Scheme, but also avoid potential conflicts when linking into a larger host program. (Nothing guarantees that S
is a distinguishing prefix, of course, but it at least limits the space of potential conflicts.) I originally had in mind making Sget_process_executable_path
a documented public function so that a host program could use it, but I'm not sure it's worthwhile.
No need to hold back on improvements to self-exe.c
, unless you think those improvements are better handled later. The Racket version of this file (or most of it) can just go away with the version here as the single maintained source.
For the release notes, the note would be a new subsection at the top of "Functionality Changes".
How about making
S_get_process_executable_path
the only non-static
definition in the file?
Now S_get_process_executable_path
is the only non-static
definition in the file.
I originally had in mind making
Sget_process_executable_path
a documented public function so that a host program could use it, but I'm not sure it's worthwhile.
Currently self-exe.c
is designed to be "independent" (a duplicated wide_to_utf8
is provided), which means it can be included directly.
Feature macros SELF_EXE_WINDOWS_AS_UTF(8|16)
make it difficult to distinguish type of the return value, thus now get_process_executable_path
always return char *
and self_exe_t
is removed.
On Racket side, CS part can remove SELF_EXE_WINDOWS_AS_UTF8
and self_exe_t
easily, but BC part will need more efforts (or a tiny wrapper might be enough?).
Some last things I notice:
The top of the file still says that get_process_executable_path
(without the S_
) is defined.
On FreeBSD, "self-exe.c" doesn't build because errno
isn't declared.
On Solaris, "self-exe.c" compiles with a warning, because getexecname
returns a const
pointer.
get_self_path_generic
can return NULL, but get_process_executable_path
doesn't account for that.
S_get_process_executable_path
can return NULL, where originally I think it would always return an allocated string — unless something like strdup
returned NULL, which was treated as just catastrophic. Maybe just document this at the top?
Symbolic links are resolve more than before via realpath
. That worries me, but I can't think of a way that it will actually create trouble.
- On FreeBSD, "self-exe.c" doesn't build because
errno
isn't declared.- On Solaris, "self-exe.c" compiles with a warning, because
getexecname
returns aconst
pointer.get_self_path_generic
can return NULL, butget_process_executable_path
doesn't account for that.- The top of the file still says that
get_process_executable_path
(without theS_
) is defined.S_get_process_executable_path
can return NULL, where originally I think it would always return an allocated string — unless something likestrdup
returned NULL, which was treated as just catastrophic. Maybe just document this at the top?
Those issues are fixed in 012b2957f5ada0eea0ed43d4cb5f9e88060f53fd.
- Symbolic links are resolve more than before via
realpath
. That worries me, but I can't think of a way that it will actually create trouble.
There is a rare circumstance under which resolving all symbolic links will create trouble: users expect to obtain a symlink to the path that was used to start the program, and not the eventual binary file. I think it is a good idea to keep a consistent behaviour with LLVM, which means to resolve all symbolic links via realpath
.
This now looks good to me. I've tried it out via #755 and also confirmed (as much as I can) that Racket can take advantage of "self-exe.c" for its own search.
Support portable boot files (
%x
) on more platforms