chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.76k stars 414 forks source link

Launcher does not find _real #15489

Closed sprock closed 4 years ago

sprock commented 4 years ago

When a launcher cannot find the _real binary in the same directory the current behaviour is to give up and issue an error message. It would be an improvement for the launcher to search the users PATH when this happens and report an error only if the _real binary remains unfound.

Chapel 1.20.0, FreeBSD 11.3, amd64.

Thanks, Roger

bradcray commented 4 years ago

I agree that this seems attractive in that it would permit (pairs of) Chapel binaries to be moved around and run in more traditional ways than we currently permit.

gbtitus commented 4 years ago

Just some implementation notes, while this is fresh in my mind ...

Currently the common launcher code checks for the existence of the _real executable in the same directory as was specified for the launcher, and errors out if it's not found. This is done right after the system launcher command line is printed if the user throws -v, so it looks like the system launcher issues the message, but really it's the Chapel launcher that does so.

Some of the system launchers for which we have Chapel wrappers, such as aprun on the Cray X* systems, do search the PATH environment variable for the _real executable. The Chapel launchers for those would need to take PATH into account in that search, when the launcher itself was run without a path. (No PATH searching is needed when the Chapel launcher is run with an explicit path, obviously.) Other system launchers, such as GASNet's amudprun and Slurm's srun, do not search PATH, and the corresponding Chapel launchers could skip that step as well.

Edit: The last sentence above (struck through) is wrong, even when taken as applying just to the current existence check for the _real. Bottom line: on systems where PATH applies and the readlink() technique described below isn't available we'll have to do a PATH search no matter what, either to make sure the system launcher will find some executable, if it's the kind that searches PATH itself, or to find the _real executable on behalf of the system launcher, if it's the kind that doesn't search PATH. Mainly, the _real existence check is just one more thing we do to try to help hide the details of system launch from the user.

Also note that under Linux at least, we don't need to search and indeed, shouldn't do so. The Chapel launcher should instead just readlink("/proc/<pid>/exe") to get its own actual pathname, append _real, and be done with it. No search necessary. This would dodge a subtle race condition in which the Chapel launcher is found in some component of PATH and executed by the shell, and then before it searches for the _real an executable file with that same name is created in some other, earlier, component of PATH. The launcher's search will then find the newer _real rather than the one that goes with itself.

bradcray commented 4 years ago

Greg: Would there be any downside to doing that readlink() step in the common launcher code for all launchers? I.e., for those system launchers that would normally search the path, would they still work fine if the common launcher just gave them an explicit path, forgoing their normal search? (this seems like it could also help with the multiple _real's search that you mention).

gbtitus commented 4 years ago

@bradcray Yes, the readlink() technique is preferable either way. Ultimately, to resolve the Issue we're commenting upon some launcher (system or Chapel) has to search PATH, and the race I described is independent of which one does that.

daviditen commented 4 years ago

The readlink() idea works for Linux, but doesn't work on other platforms, e.g. OSX. I think it would be good to use a highly portable implementation of that sort of operation. In particular, https://github.com/gpakosz/whereami appears to portably do exactly what we want here and has open licenses with a duel MIT and WTFPL license.

bradcray commented 4 years ago

Does the readlink() approach not work on other platforms, or is it simply that readlink() isn't always available?

A potential challenge with a third-party package is that I'm not certain what process is required at HPE to integrate new open-source packages (which is not to say we couldn't find out, just that it's new territory).

gbtitus commented 4 years ago

Not to short-circuit @daviditen's response, but (1) the file I proposed readlink()ing is Linux-specific and doesn't exist on MacOS, and (2) though readlink(2) is POSIX it seems not to exist on MacOS either.

daviditen commented 4 years ago

Right, /proc/\<pid>/exe is Linux specific and doesn't exist on OSX. I haven't checked other platforms, but I wouldn't expect it on non-Linux. We'll have to do distinct solutions on each platform, which is what the whereami library does.

bradcray commented 4 years ago

the file I proposed readlink()ing is Linux-specific and doesn't exist on MacOS

I think I was the one who took it one step further and asked whether we could move this logic to launcher-general code so that it would apply to all launchers and not just certain ones(?).

gbtitus commented 4 years ago

I think I was the one who took it one step further ...

I would say you're off the hook, because I think the original problem had to do with the amudprun launcher which can end up being used on both Linux and MacOS anyway.

bradcray commented 4 years ago

Roger / @sprock — Thanks for filing this feature request. If you're able to try a copy of Chapel from the master branch to see whether this addresses your needs as expected, that would be great.

sprock commented 4 years ago

Hello,

Brad Chamberlain notifications@github.com writes:

Roger / @sprock — Thanks for filing this feature request. If you're able to try a copy of Chapel from the master branch to see whether this addresses your needs as expected, that would be great.

This is what happens when I try to build the git version:

  1. unset CHPL_COMM CHPL_COMM_SUBSTRATE CHPL_LAUNCHER CHPL_HOME
  2. source util/quickstart/setchplenv.bash ( QUICKSTART!!! )

Setting CHPL_HOME to /home/rmason/Software/Chapel/git-chapel_a9a888_20200619 Updating PATH to include /home/rmason/Software/Chapel/git-chapel_a9a888_20200619/bin/freebsd-amd64 and /home/rmason/Software/Chapel/git-chapel_a9a888_20200619/util Updating MANPATH to include /home/rmason/Software/Chapel/git-chapel_a9a888_20200619/man Setting CHPL_COMM to none Setting CHPL_TASKS to fifo Setting CHPL_MEM to cstdlib Setting CHPL_GMP to none Setting CHPL_REGEXP to none Setting CHPL_LLVM to none

  1. gmake
  2. gmake check [Info] Running minimal test script: $CHPL_HOME/util/test/checkChplInstall [Info] Found executable chpl in /home/rmason/Software/Chapel/git-chapel_a9a888_20200619/bin/freebsd-amd64/chpl. [Info] Found $CHPL_HOME directory: /usr/home/rmason/Software/Chapel/git-chapel_a9a888_20200619 [Info] /home/rmason/.chpl does not exist. Creating it. [Info] Temporary test job directory: /home/rmason/.chpl/chapel-test-EHVAJ [Info] Compiling $CHPL_HOME/test/release/examples/hello6-taskpar-dist.chpl [Info] Compiling with C backend [Fail] Test job failed to compile - Chapel is not installed correctly [Fail] Compilation output: In file included from /tmp//chpl-rmason-7131.deleteme/_main.c:2: In file included from /tmp//chpl-rmason-7131.deleteme/chpl__header.h:9: In file included from /usr/home/rmason/Software/Chapel/git-chapel_a9a888_20200619/runtime/include/stdchpl.h:54: In file included from /usr/home/rmason/Software/Chapel/git-chapel_a9a888_20200619/runtime/include/chplio.h:31: In file included from /usr/home/rmason/Software/Chapel/git-chapel_a9a888_20200619/runtime/include/qio/qio-all.h:27: In file included from /usr/home/rmason/Software/Chapel/git-chapel_a9a888_20200619/runtime/include/qio/qio_formatted.h:28: In file included from /usr/local/include/error.h:10: /usr/local/include/macdecls.h:69:30: error: this function declaration is not a prototype [-Werror,-Wstrict-prototypes] extern Boolean MA_initialized(); ^ void /usr/local/include/macdecls.h:100:42: error: this function declaration is not a prototype [-Werror,-Wstrict-prototypes] extern void MA_summarize_allocated_blocks(); ^ void /usr/local/include/macdecls.h:102:41: error: this function declaration is not a prototype [-Werror,-Wstrict-prototypes] extern Boolean MA_verify_allocator_stuff(); ^ void /usr/local/include/macdecls.h:103:46: error: this function declaration is not a prototype [-Werror,-Wstrict-prototypes] extern void MA_set_error_callback(void(*func)()); ^ void /usr/local/include/macdecls.h:105:34: error: this function declaration is not a prototype [-Werror,-Wstrict-prototypes] extern void ma_set_error_callback(); ^ void 5 errors generated. gmake[1]: [/usr/home/rmason/Software/Chapel/git-chapel_a9a888_20200619/runtime/etc/Makefile.exe:29: /tmp//chpl-rmason-7131.deleteme/hello6-taskpar-dist.tmp] Error 1 error: compiling generated source gmake: [Makefile:204: check] Error 1

Is there some other step I need to perform for the git version?

Cheers, Roger

bradcray commented 4 years ago

Hi Roger —

Thanks for trying this out. That error is not familiar to me so thanks for reporting it. I'm trying to reproduce and failing so far, but in the meantime, could you let me know what back-end C compiler and version you're using (e.g., gcc x.y or clang y.z).

Also, I think there's a good chance that even though the make check is failing, the installation you have will probably still work for the original issue if you want to check into that. Specifically, if I'm interpreting the make check steps correctly, I believe it's throwing stronger checking on the C compilation step than we use by default (e.g., the -Werror mentioned above)

Thanks, -Brad

sprock commented 4 years ago

Hello Brad,

Brad Chamberlain notifications@github.com writes:

Thanks for trying this out.

Glad to help.

That error is not familiar to me so thanks for reporting it. I'm trying to reproduce and failing so far, but in the meantime, could you let me know what back-end C compiler and version you're using (e.g., gcc x.y or clang y.z).

cc --version FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0) Target: x86_64-unknown-freebsd11.3 Thread model: posix InstalledDir: /usr/bin

Also, I think there's a good chance that even though the make check is failing, the installation you have will probably still work for the original issue if you want to check into that. Specifically, if I'm interpreting the make check steps correctly, I believe it's throwing stronger checking on the C compilation step than we use by default (e.g., the -Werror mentioned above)

OK, I'll give it a whirl.

Cheers, Roger

bradcray commented 4 years ago

Hi Roger —

Thanks for the additional information. I haven't been able to reproduce the failure mode using my linux-based copy of clang 8.0.0, but given that the error it's pointing to is in your system header files (rather than Chapel code), I'm guessing you'd see it for any compilation that #included these headers and used -Wstrict-prototypes (though there may be some #defines we set that send it down certain paths?).

As mentioned above, I'm now feeling more certain that our make check step is being too stringent and have opened up PR #15893 to relax it, which I think should fix the behavior you saw above. But it also makes me feel more confident that compilations outside of make check should work for you.

-Brad

bradcray commented 4 years ago

(And I've now merged that PR if you wanted to grab a new copy of master to verify that it fixes make check for you. But there's no real need to do so unless you're curious... if your by-hand chpl compiles pass, I'm confident that this will as well).

sprock commented 4 years ago

Hello Brad,

Brad Chamberlain notifications@github.com writes:

(And I've now merged that PR if you wanted to grab a new copy of master to verify that it fixes make check for you. But there's no real need to do so unless you're curious... if your by-hand chpl compiles pass, I'm confident that this will as well).

Yes, that issue is fixed.

Thank you.

Roger

sprock commented 4 years ago

Hi Brad,

Brad Chamberlain notifications@github.com writes:

Thanks for filing this feature request. If you're able to try a copy of Chapel from the master branch to see whether this addresses your needs as expected, that would be great.

I can confirm that issue #15489 is fixed.

Many thanks.

Best wishes, Roger

bradcray commented 4 years ago

Thank you, both for filing the issue(s) and for verifying the fix(es)!