flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

testsuite: ensure tests can survive user environment changes #5581

Open chu11 opened 1 year ago

chu11 commented 1 year ago

In PR #5576, a fix is proposed for an unusual scenario: LC_ALL is set in the primary flux instance used for testing (via sharness.sh), but it disappears when flux proxy issues a command to a subinstance. This is presumed to be caused by login shell setup in the user's environment.

Generally speaking, tests can be fragile if login shell setup can muck with the environment a test expects.

Currently the exposure appears limited, so workarounds are OK for the time being. Longer term, a real solution could be needed if this keeps coming up.

A naive fix would be a script, say fluxtest.sh, that wraps the real test command whenever a login shell has a chance to be launched, i.e.

flux proxy $id fluxtest.sh <the command you actually want to run>

and fluxtest.sh could set up the environment as needed.
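
A minimal sketch of what such a wrapper might look like. fluxtest.sh does not exist in flux-core; the function names below are hypothetical, and restoring LC_ALL=C is only an assumption about what the tests would need.

```shell
#!/bin/sh
# fluxtest.sh: hypothetical wrapper, not an existing flux-core script.

# Re-establish the variables the test suite depends on, in case shell
# startup files changed or cleared them.
# Assumption: LC_ALL=C is the value the tests expect.
fluxtest_setup_env() {
    LC_ALL=C
    export LC_ALL
}

# Run the given command with the test environment restored and
# return that command's exit status.
fluxtest_run() {
    fluxtest_setup_env
    "$@"
}

# When invoked as a script with arguments, run them as the command.
if [ "$#" -gt 0 ]; then
    fluxtest_run "$@"
fi
```

The wrapper only helps if it runs after the user's shell startup files, which is the case here because flux proxy would launch the shell first and the shell would then run fluxtest.sh.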

chu11 commented 12 months ago

I had a mini-epiphany about #5576.

I've set LC_ALL=C in my current environment:

>env | grep LC_ALL
LC_ALL=C

The following loosely emulates what happens in a test if the user's default shell is bash (the test runs under bash, and flux proxy runs "bash -c"):

>bash -c "bash -c env" | grep LC_ALL
LC_ALL=C

LC_ALL=C still. But what if the user's shell is tcsh?

>bash -c "tcsh -c env" | grep LC_ALL
LC_ALL=

Aha ... since my shell is tcsh, something specific in my environment is clearing LC_ALL when tcsh runs.

I tried other shells and tried hacking my SHELL environment variable, but couldn't reproduce this under other shells. That may not mean much, since my default shell is tcsh and that could affect things.
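
For anyone who wants to repeat the comparison across several shells, a rough sketch follows. The helper name lc_all_through and the list of shells are made up for illustration; shells that are not installed are skipped.

```shell
# Print the LC_ALL line seen by "<shell> -c env", or a note if the
# variable is missing from that shell's environment entirely.
lc_all_through() {
    LC_ALL=C "$1" -c env 2>/dev/null | grep '^LC_ALL=' || echo "LC_ALL not set"
}

# Try each candidate shell that is actually installed.
for candidate in sh bash tcsh zsh; do
    if command -v "$candidate" >/dev/null 2>&1; then
        printf '%s: %s\n' "$candidate" "$(lc_all_through "$candidate")"
    fi
done
```

A line reading "LC_ALL=" (set but empty) matches the tcsh result above, while "LC_ALL not set" would mean the variable was removed rather than emptied.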

Using tcsh's "shlvl" variable, it does not appear that we're entering a login shell:

>bash -c 'tcsh -c "echo \$shlvl"'
3

(i.e. if we were entering a login shell, shlvl should be 1, I think)

I still don't have an answer for why this is happening, though ... I can't find anything in my .cshrc files or the system /etc/profile files that might cause this.
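
One possible way to narrow this down, offered as a sketch and not a diagnosis: tcsh normally reads /etc/csh.cshrc and ~/.tcshrc (or ~/.cshrc) on every startup, login shell or not, and its -f option is documented to skip startup files. Exactly which files -f skips may depend on the tcsh version, so treat that as an assumption. Comparing the two runs would show whether a startup file is responsible. The helper name show_lc_all is made up for illustration.

```shell
# Print the LC_ALL line seen by the given shell invocation, e.g.
# "tcsh" or "tcsh -f", or a note if the variable is missing.
show_lc_all() {
    LC_ALL=C "$@" -c env 2>/dev/null | grep '^LC_ALL=' || echo "LC_ALL not set"
}

# Assumption: -f makes tcsh skip its startup files; which files are
# skipped may vary between tcsh versions.
if command -v tcsh >/dev/null 2>&1; then
    echo "with startup files:    $(show_lc_all tcsh)"
    echo "without startup files: $(show_lc_all tcsh -f)"
fi
```

If LC_ALL=C survives only in the second run, the culprit is likely something sourced from one of those startup files, possibly a system-wide one.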