flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

Missing test_must_fail function in Flux testing environment #5862

Closed mattf4171 closed 7 months ago

mattf4171 commented 7 months ago

In the flux-core/t/t3203-instance-recovery.t file, the flux start --recovery=$(pwd)/test2 command succeeds even though the content.sqlite file is set to read-only (chmod 400).

test_expect_success 'recovery mode aborts early if content unwritable' '
    mkdir -p test2 && pwd touch test2/content.sqlite &&
        chmod 400 test2/content.sqlite &&
        test_must_fail flux start --recovery=$(pwd)/test2 2>nowrite.err &&
        grep "no write permission" nowrite.err
'

Why is recovery mode not attempting to write to content.sqlite file?

I ran the shell command outside of the test case for debugging purposes:

Singularity> mkdir -p test2 && pwd touch test2/content.sqlite && chmod 400 test2/content.sqlite && test_must_fail flux start --recovery=$(pwd)/test2 2>nowrite.err && grep "no write permission" nowrite.err
/root/Flux/flux-core/t
Singularity> cat nowrite.err
bash: test_must_fail: command not found

Environment:

Steps to Reproduce:

  1. Build a Singularity container using the provided .def file.
  2. Enter the container and navigate to the flux-core/t directory.
  3. Run the test script using prove -v t3203-instance-recovery.t.

Expected Behavior: Tests should run successfully, with the test_must_fail function correctly failing commands that are expected to fail.

Actual Behavior: Tests that use test_must_fail fail with the error bash: test_must_fail: command not found.

Additional Information:

The .def file used for building the Singularity container includes the necessary dependencies and sets up the environment for Flux. The issue persists even after ensuring proper permissions and environment variable settings.

grondo commented 7 months ago

Can you post the output of ./t3203-instance-recovery.t -d -v in your environment?

> Why is recovery mode not attempting to write to content.sqlite file?

It does. Pointing the flux instance at the sqlite file means that Flux will try to write to it.

> I ran the shell command outside of the test case for debugging purposes:

You cannot run sharness test scripts outside of the sharness environment. test_must_fail is defined in sharness.sh which all test files source at the top before running any tests:

. `dirname $0`/sharness.sh
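
For debugging a single command outside sharness, a rough stand-in can be defined by hand. The sketch below is an approximation and not the sharness implementation: the name `my_must_fail` is hypothetical, and treating exit code 127 and codes above 128 as the "wrong kind" of failure is an assumption about how such a helper would usually behave.

```shell
#!/bin/sh
# Hypothetical stand-in for a "must fail" helper; NOT the sharness version.
# Returns 0 only when the wrapped command fails in an ordinary way.
my_must_fail() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "my_must_fail: command succeeded: $*" >&2
        return 1
    elif [ "$rc" -eq 127 ]; then
        # Assumption: "command not found" means a broken test, not an expected failure
        echo "my_must_fail: command not found: $*" >&2
        return 1
    elif [ "$rc" -gt 128 ]; then
        # Assumption: exit codes above 128 mean the command died from a signal
        echo "my_must_fail: died by signal: $*" >&2
        return 1
    fi
    return 0
}

# Example: "false" exits with status 1, so the helper reports success.
my_must_fail false && echo "false failed as expected"
```

Running the real test script remains the reliable route, since the sharness environment also sets up the trash directory and the flux-specific extensions.
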
mattf4171 commented 7 months ago

@grondo Here is the output of ./t3203-instance-recovery.t -d -v :

Singularity> ./t3203-instance-recovery.t -d -v
sharness: loading extensions from /root/Flux/flux-core/t/sharness.d/01-setup.sh
sharness: loading extensions from /root/Flux/flux-core/t/sharness.d/flux-sharness.sh
expecting success:
    mkdir -p test1 &&
    flux start --test-size=4 --test-exit-timeout=500s \
        -o,-Sstatedir=$(pwd)/test1 /bin/true

ok 1 - start a persistent instance of size 4

expecting success:
    run_timeout --env=SHELL=/bin/sh 120 \
        $runpty -i none flux start \
            -o,-Sbroker.rc1_path= \
            -o,-Sbroker.rc3_path= \
            --recovery=$(pwd)/test1 >banner.out &&
    grep "Entering Flux recovery mode" banner.out

| Entering Flux recovery mode.
ok 2 - banner message is printed in interactive recovery mode

expecting success:
    flux start --recovery=$(pwd)/test1 \
        -o,-Sbroker.rc1_path=/bin/false \
        -o,-Sbroker.rc3_path= \
        echo "hello world" >hello.out &&
    grep hello hello.out

Apr 05 23:15:50.055124 broker.err[0]: rc1.0: /bin/false Exited (rc=1) 0.0s
hello world
ok 3 - rc1 failure is ignored in recovery mode

expecting success:
    echo 4 >down.exp &&
    flux start --recovery=$(pwd)/test1 \
        flux resource list -s down -no {nnodes} >down.out &&
    test_cmp down.exp down.out

ok 4 - resources are offline in recovery mode

expecting success:
    flux start --recovery=$(pwd)/test1 \
        flux dump --checkpoint test1.tar

flux-dump: archived 4 keys
ok 5 - dump test instance in recovery mode

expecting success:
    flux start --recovery=$(pwd)/test1.tar \
        flux resource list -s down -no {nnodes} >down_dump.out &&
    test_cmp down.exp down_dump.out

ok 6 - recovery mode also works with dump file

expecting success:
    run_timeout --env=SHELL=/bin/sh 120 \
        $runpty -i none flux start \
            -o,-Sbroker.rc1_path= \
            -o,-Sbroker.rc3_path= \
            --recovery=$(pwd)/test1.tar >banner2.out &&
    grep "changes will not be preserved" banner2.out

| statedir           changes will not be preserved
ok 7 - banner message warns changes are not persistent

expecting success:
    test_must_fail flux start --recovery=$(pwd)/test2 2>dirmissing.err &&
    grep "No such file or directory" dirmissing.err

flux-start: /root/Flux/flux-core/t/trash-directory.t3203-instance-recovery/test2: No such file or directory
ok 8 - recovery mode aborts early if statedir is missing

expecting success:
    mkdir -p test2 &&
    chmod 600 test2 &&
    test_must_fail flux start --recovery=$(pwd)/test2 2>norwx.err &&
    grep "no access" norwx.err

flux-start: /root/Flux/flux-core/t/trash-directory.t3203-instance-recovery/test2: no access
ok 9 - recovery mode aborts early if statedir lacks rwx

expecting success:
    chmod 700 test2 &&
    test_must_fail flux start --recovery=$(pwd)/test2 2>empty.err &&
    grep "No such file or directory" empty.err

flux-start: /root/Flux/flux-core/t/trash-directory.t3203-instance-recovery/test2/content.sqlite: No such file or directory
ok 10 - recovery mode aborts early if content is missing

expecting success:
    mkdir -p test2 && pwd touch test2/content.sqlite &&
        chmod 400 test2/content.sqlite &&
        test_must_fail flux start --recovery=$(pwd)/test2 2>nowrite.err &&
        grep "no write permission" nowrite.err

/root/Flux/flux-core/t/trash-directory.t3203-instance-recovery
chmod: cannot access 'test2/content.sqlite': No such file or directory
not ok 11 - recovery mode aborts early if content unwritable
#
#           mkdir -p test2 && pwd touch test2/content.sqlite &&
#               chmod 400 test2/content.sqlite &&
#               test_must_fail flux start --recovery=$(pwd)/test2 2>nowrite.err &&
#               grep "no write permission" nowrite.err
#

expecting success:
    chmod 200 test2/content.sqlite &&
    test_must_fail flux start --recovery=$(pwd)/test2 2>noread.err &&
    grep "no read permission" noread.err

chmod: cannot access 'test2/content.sqlite': No such file or directory
not ok 12 - recovery mode aborts early if content unreadable
#
#           chmod 200 test2/content.sqlite &&
#           test_must_fail flux start --recovery=$(pwd)/test2 2>noread.err &&
#           grep "no read permission" noread.err
#

# failed 2 among 12 test(s)
1..12

perhaps the touch command to create the file for test 11 is failing?

grondo commented 7 months ago

Can you wrap the output in triple-backticks? e.g.

paste output here

It is difficult to read the output with all the markdown formatting.

grondo commented 7 months ago

Are you running the tests as root? I don't think that is tested (or advised)

Edit: though I'm not sure that's the issue here.

mattf4171 commented 7 months ago

> Are you running the tests as root? I don't think that is tested (or advised)
>
> Edit: though I'm not sure that's the issue here.

Within my Singularity container I am running with --fakeroot.

grondo commented 7 months ago

> perhaps the touch command to create the file for test 11 is failing?

Good thought, though I'd think we'd see some sort of error. You can edit the test itself and add some debugging (like ls -l test2 after touch, etc).

mattf4171 commented 7 months ago

> > perhaps the touch command to create the file for test 11 is failing?
>
> Good thought, though I'd think we'd see some sort of error. You can edit the test itself and add some debugging (like ls -l test2 after touch, etc).

expecting success:
    touch test2/content.sqlite && ls -l test2 &&
        chmod 400 test2/content.sqlite &&
        test_must_fail flux start --recovery=$(pwd)/test2 2>nowrite.err &&
        grep "no write permission" nowrite.err

total 0
-rw-------. 1 root root 0 Apr  5 23:24 content.sqlite
not ok 11 - recovery mode aborts early if content unwritable
#
#           touch test2/content.sqlite && ls -l test2 &&
#               chmod 400 test2/content.sqlite &&
#               test_must_fail flux start --recovery=$(pwd)/test2 2>nowrite.err &&
#               grep "no write permission" nowrite.err
#

The touch command is working as expected. The cause of the failure is very strange.

chu11 commented 7 months ago

Would it be worthwhile to do the ls -l test2 after the chmod, to make sure that is actually working?

grondo commented 7 months ago
chmod: cannot access 'test2/content.sqlite': No such file or directory
not ok 11 - recovery mode aborts early if content unwritable
#
#           mkdir -p test2 && pwd touch test2/content.sqlite &&
#               chmod 400 test2/content.sqlite &&
#               test_must_fail flux start --recovery=$(pwd)/test2 2>nowrite.err &&
#               grep "no write permission" nowrite.err
#

Well this one failed because you have pwd touch instead of pwd && touch.
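
The difference is easy to reproduce in a scratch directory. In the sketch below the file names are made up for illustration. In the first command, `touch` and the file name are only arguments to `pwd`, so `touch` never runs. That `pwd` accepts the extra arguments without complaint is an assumption that holds for bash and matches the output above, where the directory was printed and no error appeared until `chmod`.

```shell
#!/bin/sh
# Work in a scratch directory so the demonstration does not touch the test tree.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Missing "&&": "touch" and the file name become arguments to pwd,
# so touch never runs and the file is not created.
# "|| true" guards against shells whose pwd rejects extra arguments.
pwd touch missing.sqlite >/dev/null 2>&1 || true
[ -e missing.sqlite ] || echo "missing.sqlite was not created"

# With "&&": pwd runs first, then touch runs and creates the file.
pwd >/dev/null && touch present.sqlite
[ -e present.sqlite ] && echo "present.sqlite was created"
```
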

I don't see any error from your previous output. Was that all the output?

Maybe make the following change to that script so we can see the output of flux-start:

diff --git a/t/t3203-instance-recovery.t b/t/t3203-instance-recovery.t
index 3ac63c61e..c739a06ec 100755
--- a/t/t3203-instance-recovery.t
+++ b/t/t3203-instance-recovery.t
@@ -87,6 +87,8 @@ test_expect_success 'recovery mode aborts early if content is missing' '
 test_expect_success 'recovery mode aborts early if content unwritable' '
        touch test2/content.sqlite &&
        chmod 400 test2/content.sqlite &&
+       ls -l test2 &&
+       test_must_fail flux start --recovery=$(pwd)/test2 &&
        test_must_fail flux start --recovery=$(pwd)/test2 2>nowrite.err &&
        grep "no write permission" nowrite.err
 '

Edited diff to include @chu11's suggestion.

mattf4171 commented 7 months ago

@grondo @chu11 I've implemented the changes above, and here is the new output from ./t3203-instance-recovery.t -d -v:

Singularity> ./t3203-instance-recovery.t -d -v
sharness: loading extensions from /root/Flux/flux-core/t/sharness.d/01-setup.sh
sharness: loading extensions from /root/Flux/flux-core/t/sharness.d/flux-sharness.sh
expecting success:
    mkdir -p test1 &&
    flux start --test-size=4 --test-exit-timeout=500s \
        -o,-Sstatedir=$(pwd)/test1 /bin/true

ok 1 - start a persistent instance of size 4

expecting success:
    run_timeout --env=SHELL=/bin/sh 120 \
        $runpty -i none flux start \
            -o,-Sbroker.rc1_path= \
            -o,-Sbroker.rc3_path= \
            --recovery=$(pwd)/test1 >banner.out &&
    grep "Entering Flux recovery mode" banner.out

| Entering Flux recovery mode.
ok 2 - banner message is printed in interactive recovery mode

expecting success:
    flux start --recovery=$(pwd)/test1 \
        -o,-Sbroker.rc1_path=/bin/false \
        -o,-Sbroker.rc3_path= \
        echo "hello world" >hello.out &&
    grep hello hello.out

Apr 08 20:49:03.155516 broker.err[0]: rc1.0: /bin/false Exited (rc=1) 0.0s
hello world
ok 3 - rc1 failure is ignored in recovery mode

expecting success:
    echo 4 >down.exp &&
    flux start --recovery=$(pwd)/test1 \
        flux resource list -s down -no {nnodes} >down.out &&
    test_cmp down.exp down.out

ok 4 - resources are offline in recovery mode

expecting success:
    flux start --recovery=$(pwd)/test1 \
        flux dump --checkpoint test1.tar

flux-dump: archived 4 keys
ok 5 - dump test instance in recovery mode

expecting success:
    flux start --recovery=$(pwd)/test1.tar \
        flux resource list -s down -no {nnodes} >down_dump.out &&
    test_cmp down.exp down_dump.out

ok 6 - recovery mode also works with dump file

expecting success:
    run_timeout --env=SHELL=/bin/sh 120 \
        $runpty -i none flux start \
            -o,-Sbroker.rc1_path= \
            -o,-Sbroker.rc3_path= \
            --recovery=$(pwd)/test1.tar >banner2.out &&
    grep "changes will not be preserved" banner2.out

| statedir           changes will not be preserved
ok 7 - banner message warns changes are not persistent

expecting success:
    test_must_fail flux start --recovery=$(pwd)/test2 2>dirmissing.err &&
    grep "No such file or directory" dirmissing.err

flux-start: /root/Flux/flux-core/t/trash-directory.t3203-instance-recovery/test2: No such file or directory
ok 8 - recovery mode aborts early if statedir is missing

expecting success:
    mkdir -p test2 &&
    chmod 600 test2 &&
    test_must_fail flux start --recovery=$(pwd)/test2 2>norwx.err &&
    grep "no access" norwx.err

flux-start: /root/Flux/flux-core/t/trash-directory.t3203-instance-recovery/test2: no access
ok 9 - recovery mode aborts early if statedir lacks rwx

expecting success:
    chmod 700 test2 &&
    test_must_fail flux start --recovery=$(pwd)/test2 2>empty.err &&
    grep "No such file or directory" empty.err

flux-start: /root/Flux/flux-core/t/trash-directory.t3203-instance-recovery/test2/content.sqlite: No such file or directory
ok 10 - recovery mode aborts early if content is missing

expecting success:
    touch test2/content.sqlite &&
        chmod 400 test2/content.sqlite &&
    ls -l test2 &&
    test_must_fail flux start --recovery=$(pwd)/test2 &&
        test_must_fail flux start --recovery=$(pwd)/test2 2>nowrite.err &&
        grep "no write permission" nowrite.err

total 0
-r--------. 1 root root 0 Apr  8 20:49 content.sqlite
flux-broker: stdin is not a tty - can't run interactive shell
not ok 11 - recovery mode aborts early if content unwritable
#
#           touch test2/content.sqlite &&
#               chmod 400 test2/content.sqlite &&
#           ls -l test2 &&
#           test_must_fail flux start --recovery=$(pwd)/test2 &&
#               test_must_fail flux start --recovery=$(pwd)/test2 2>nowrite.err &&
#               grep "no write permission" nowrite.err
#

expecting success:
    chmod 200 test2/content.sqlite &&
    test_must_fail flux start --recovery=$(pwd)/test2 2>noread.err &&
    grep "no read permission" noread.err

not ok 12 - recovery mode aborts early if content unreadable
#
#           chmod 200 test2/content.sqlite &&
#           test_must_fail flux start --recovery=$(pwd)/test2 2>noread.err &&
#           grep "no read permission" noread.err
#

# failed 2 among 12 test(s)
1..12

The new output shows that ls works as expected, but I get an error: flux-broker: stdin is not a tty - can't run interactive shell

grondo commented 7 months ago
> flux-broker: stdin is not a tty - can't run interactive shell

Ah, this error is because flux start doesn't have an argument, so it tries to start an interactive shell. This just means that if the test does fail, it could be for the wrong reason. We should probably add true as an argument to all these flux start calls.
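
A check of this kind is commonly done by testing whether standard input is a terminal. The sketch below shows the general idea in shell. It is not how flux-broker implements the check, and the function name `stdin_is_tty` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when stdin is attached to a terminal.
stdin_is_tty() {
    [ -t 0 ]
}

# Inside a test harness stdin is usually not a terminal; /dev/null
# stands in for that situation here.
if stdin_is_tty </dev/null; then
    echo "stdin is a tty: an interactive shell could start"
else
    echo "stdin is not a tty: pass an explicit command such as true"
fi
```
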

I'm beginning to think that read and write-only permissions are not working in your environment, perhaps because of fakeroot.

Can you try the following patch? Or just run make check without fakeroot, it should not be necessary.

diff --git a/t/t3203-instance-recovery.t b/t/t3203-instance-recovery.t
index 3ac63c61e..962de50f0 100755
--- a/t/t3203-instance-recovery.t
+++ b/t/t3203-instance-recovery.t
@@ -70,29 +70,32 @@ test_expect_success 'banner message warns changes are not persistent' '
        grep "changes will not be preserved" banner2.out
 '
 test_expect_success 'recovery mode aborts early if statedir is missing' '
-       test_must_fail flux start --recovery=$(pwd)/test2 2>dirmissing.err &&
+       test_must_fail flux start --recovery=$(pwd)/test2 true 2>dirmissing.err &&
        grep "No such file or directory" dirmissing.err
 '
 test_expect_success 'recovery mode aborts early if statedir lacks rwx' '
        mkdir -p test2 &&
        chmod 600 test2 &&
-       test_must_fail flux start --recovery=$(pwd)/test2 2>norwx.err &&
+       test_must_fail flux start --recovery=$(pwd)/test2 true 2>norwx.err &&
        grep "no access" norwx.err
 '
 test_expect_success 'recovery mode aborts early if content is missing' '
        chmod 700 test2 &&
-       test_must_fail flux start --recovery=$(pwd)/test2 2>empty.err &&
+       test_must_fail flux start --recovery=$(pwd)/test2 true 2>empty.err &&
        grep "No such file or directory" empty.err
 '
-test_expect_success 'recovery mode aborts early if content unwritable' '
+#  Check if read-only permissions work in this environment
+touch readonly && chmod 400 readonly
+printf test > readonly || test_set_prereq WORKING_PERMS
+test_expect_success WORKING_PERMS 'recovery mode aborts early if content unwritable' '
        touch test2/content.sqlite &&
        chmod 400 test2/content.sqlite &&
-       test_must_fail flux start --recovery=$(pwd)/test2 2>nowrite.err &&
+       test_must_fail flux start --recovery=$(pwd)/test2 true 2>nowrite.err &&
        grep "no write permission" nowrite.err
 '
-test_expect_success 'recovery mode aborts early if content unreadable' '
+test_expect_success WORKING_PERMS 'recovery mode aborts early if content unreadable' '
        chmod 200 test2/content.sqlite &&
-       test_must_fail flux start --recovery=$(pwd)/test2 2>noread.err &&
+       test_must_fail flux start --recovery=$(pwd)/test2 true 2>noread.err &&
        grep "no read permission" noread.err
 '
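
The prerequisite check in the patch can also be run on its own to see how a given environment behaves. The following is a minimal sketch of the same idea, assuming `mktemp` is available. The function name `perms_enforced` is hypothetical and is not part of sharness or flux-core.

```shell
#!/bin/sh
# Hypothetical probe: returns 0 if a write to a read-only (0400) file is
# refused, 1 if the write goes through, and 2 if no probe file could be made.
perms_enforced() {
    probe=$(mktemp) || return 2
    chmod 400 "$probe"
    # The subshell keeps the shell's own redirection error quiet and contained.
    if ( printf test >"$probe" ) 2>/dev/null; then
        result=1    # write succeeded: permissions are not enforced
    else
        result=0    # write refused: permissions work as the tests assume
    fi
    rm -f "$probe"
    return "$result"
}

if perms_enforced; then
    echo "read-only permissions are enforced"
else
    echo "read-only permissions are NOT enforced (running as root or fakeroot?)"
fi
```

Unlike the one-liner in the patch, this version discards the error message from the failed write, so it does not add noise to the test output.
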
mattf4171 commented 7 months ago

@grondo I'm still using fakeroot in my shell and made the above changes but see that the two tests that were failing are now being skipped.

Singularity> ./t3203-instance-recovery.t -d -v
sharness: loading extensions from /root/Flux/flux-core/t/sharness.d/01-setup.sh
sharness: loading extensions from /root/Flux/flux-core/t/sharness.d/flux-sharness.sh
expecting success:
    mkdir -p test1 &&
    flux start --test-size=4 --test-exit-timeout=500s \
        -o,-Sstatedir=$(pwd)/test1 /bin/true

ok 1 - start a persistent instance of size 4

expecting success:
        cat >recov_attrs.exp <<-EOT &&
        1
        1
        5
        EOT
        flux start --recovery=$(pwd)/test1 \
            -o,-Sbroker.rc1_path= \
            -o,-Sbroker.rc3_path= \
            bash -c " \
                flux getattr broker.recovery-mode && \
                flux getattr broker.quorum && \
                flux getattr log-stderr-level" >recov_attrs.out &&
        test_cmp recov_attrs.exp recov_attrs.exp

ok 2 - expected broker attributes are set in recovery mode

expecting success:
    run_timeout --env=SHELL=/bin/sh 120 \
        $runpty -i none flux start \
            -o,-Sbroker.rc1_path= \
            -o,-Sbroker.rc3_path= \
            --recovery=$(pwd)/test1 >banner.out &&
    grep "Entering Flux recovery mode" banner.out

| Entering Flux recovery mode.
ok 3 - banner message is printed in interactive recovery mode

expecting success:
    flux start --recovery=$(pwd)/test1 \
        -o,-Sbroker.rc1_path=/bin/false \
        -o,-Sbroker.rc3_path= \
        echo "hello world" >hello.out &&
    grep hello hello.out

Apr 08 21:36:54.145244 broker.err[0]: rc1.0: /bin/false Exited (rc=1) 0.0s
hello world
ok 4 - rc1 failure is ignored in recovery mode

expecting success:
    echo 4 >down.exp &&
    flux start --recovery=$(pwd)/test1 \
        flux resource list -s down -no {nnodes} >down.out &&
    test_cmp down.exp down.out

ok 5 - resources are offline in recovery mode

expecting success:
    flux start --recovery=$(pwd)/test1 \
        flux dump --checkpoint test1.tar

flux-dump: archived 4 keys
ok 6 - dump test instance in recovery mode

expecting success:
    flux start --recovery=$(pwd)/test1.tar \
        flux resource list -s down -no {nnodes} >down_dump.out &&
    test_cmp down.exp down_dump.out

ok 7 - recovery mode also works with dump file

expecting success:
    run_timeout --env=SHELL=/bin/sh 120 \
        $runpty -i none flux start \
            -o,-Sbroker.rc1_path= \
            -o,-Sbroker.rc3_path= \
            --recovery=$(pwd)/test1.tar >banner2.out &&
    grep "changes will not be preserved" banner2.out

| statedir           changes will not be preserved
ok 8 - banner message warns changes are not persistent

expecting success:
    test_must_fail flux start --recovery=$(pwd)/test2 true 2>dirmissing.err &&
    grep "No such file or directory" dirmissing.err

flux-start: /root/Flux/flux-core/t/trash-directory.t3203-instance-recovery/test2: No such file or directory
ok 9 - recovery mode aborts early if statedir is missing

expecting success:
    mkdir -p test2 &&
    chmod 600 test2 &&
    test_must_fail flux start --recovery=$(pwd)/test2 true 2>norwx.err &&
    grep "no access" norwx.err

flux-start: /root/Flux/flux-core/t/trash-directory.t3203-instance-recovery/test2: no access
ok 10 - recovery mode aborts early if statedir lacks rwx

expecting success:
    chmod 700 test2 &&
    test_must_fail flux start --recovery=$(pwd)/test2 true 2>empty.err &&
    grep "No such file or directory" empty.err

flux-start: /root/Flux/flux-core/t/trash-directory.t3203-instance-recovery/test2/content.sqlite: No such file or directory
ok 11 - recovery mode aborts early if content is missing

skipping test: recovery mode aborts early if content unwritable
    touch test2/content.sqlite &&
    chmod 400 test2/content.sqlite &&
        test_must_fail flux start --recovery=$(pwd)/test2 true 2>nowrite.err &&
        grep "no write permission" nowrite.err

ok 12 # skip recovery mode aborts early if content unwritable (missing WORKING_PERMS)

skipping test: recovery mode aborts early if content unreadable
    chmod 200 test2/content.sqlite &&
    test_must_fail flux start --recovery=$(pwd)/test2 true 2>noread.err &&
    grep "no read permission" noread.err

ok 13 # skip recovery mode aborts early if content unreadable (missing WORKING_PERMS)

# passed all 13 test(s)
1..13
grondo commented 7 months ago

Yes, that means that fakeroot makes it so that the process can write to a file with read-only permissions, and read from a file with write-only permissions. This breaks the tests' assumption that the filesystem enforces permissions.

This test is fixable via the kludge posted above, but who knows what else would break.

I'd suggest not building or running make check under fakeroot. I believe fakeroot is meant for the make install step, though I'm not an expert.
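
One way to catch this early is to check the effective user ID before running the tests. The sketch below is an illustration only. The function name `running_as_root` is hypothetical, and it assumes that fakeroot presents the process as UID 0, which is consistent with the `root root` ownership in the `ls -l` output earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the effective UID is 0 (root or fakeroot).
running_as_root() {
    [ "$(id -u)" -eq 0 ]
}

if running_as_root; then
    echo "warning: running as UID 0; permission-based tests may not be meaningful" >&2
else
    echo "running as unprivileged UID $(id -u)"
fi
```
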

mattf4171 commented 7 months ago

@grondo I understand that fakeroot might not be suitable for building or running make check for Flux due to its impact on filesystem permissions.

Unfortunately, I am working in an environment where I do not have sudo privileges, and fakeroot is the only elevated privilege available to me. This restriction makes it challenging to follow the standard build and test process for Flux.

Given this limitation, do you have any recommendations or alternative approaches that I could consider to build and test Flux within a Singularity container without sudo privileges? Any guidance or workarounds would be greatly appreciated.

grondo commented 7 months ago

Yes. My suggestion is to use best practice for building software:

Of course, it is also possible to side install packages to a non-standard prefix. Many people use flux without having any kind of sudo privileges.

Perhaps singularity has requirements of which I'm not aware that are forcing you to use fakeroot. In that case, I can propose the changes posted above as a PR and that will get you at least past this particular issue.

mattf4171 commented 7 months ago

@grondo Thank you for the suggestions. I understand the best practices for building software and agree with the approach of using unprivileged accounts for building and testing, with elevated privileges only required for installation.

In my current environment, using Singularity, I am constrained by the system policies that limit my access to sudo privileges. As a result, I am exploring ways to work within these restrictions while still being able to build and test Flux.

I appreciate your offer to propose changes that could help bypass this particular issue. If it's not too much trouble, I would be grateful for a PR that addresses this, as it would allow me to progress with my work.

Additionally, if there are any other tips or workarounds for building and testing Flux within a Singularity container without sudo privileges, I'd like to hear them.

grondo commented 7 months ago

Unfortunately, I've not used Singularity so I won't be helpful here. Calling @vsoch the expert!

vsoch commented 7 months ago

@mattf4171 Do you want to run Flux via a Singularity container? That might be possible, but it would be challenging. I would check out singularity compose, and you will need to bind every single location that requires write access, as a Singularity SIF image is read-only. You will also need to expose ports between the "nodes" (your workers).

To step back, why do you want to do this? You are likely better off installing flux with conda in an environment.

vsoch commented 7 months ago

> Additionally, if there are any other tips or workarounds for building and testing Flux within a Singularity container without sudo privileges, I'd like to hear them.

If you want to do a custom build of Flux, you need to use external CI (e.g., GitHub Actions), push to a registry, and pull it down to your system. Also ensure that nothing is installed as root.