E4S-Project / e4s-cl

Container manager for E4S
https://e4s-cl.readthedocs.io
MIT License

Duplicate mount destination error when launching container #107

Closed egreen77 closed 1 year ago

egreen77 commented 1 year ago

Hi,

I've been running into an issue where e4s-cl fails to launch a container because it tries to bind multiple host libraries to the same path inside the container.

e4s-cl launch --backend=podman --image docker-archive:mpibench.tar --profile=t4 srun -N 1 /mpiBench
[+] Using selected profile t4
srun: job 2351 queued and waiting for resources
srun: job 2351 has been allocated resources
Error: /.e4s-cl/hostlibs/libibverbs.so: duplicate mount destination
Process 961819 failed with code 125
Error: /.e4s-cl/hostlibs/libibverbs.so: duplicate mount destination
Container command failed with error code 125

In the above, the container runtime complains that /usr/lib64/libibverbs.so and /lib64/libibverbs.so can't both be bound to the same destination. On the system I'm working on, /lib64 is a symlink to /usr/lib64 and we have a few libraries that use RPATH with those two paths interchangeably. It seems e4s-cl is somehow picking up the same library at both locations.

https://github.com/E4S-Project/e4s-cl/blob/843cbe9a949109795586f1f876fc8141b7e27e76/e4s_cl/cf/containers/__init__.py#L153

The workaround we came up with is to patch the function in e4s-cl that checks for duplicate binds to instead only check if bound_rhs.destination == bound_lhs.destination.

Currently, the duplicate detection only triggers when both the source and destination paths are identical, so it misses the case where two different source paths would end up at the same location inside the container.
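To illustrate the difference between the two checks, here is a minimal sketch. The `Bind` class and both helper functions are hypothetical stand-ins written for this comment, not e4s-cl's actual code; only the two host paths and the destination come from the error above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Bind:
    # Hypothetical stand-in for a bound-file record (not the e4s-cl class)
    source: PurePosixPath
    destination: PurePosixPath


def same_source_and_destination(lhs: Bind, rhs: Bind) -> bool:
    # Strict check: both the host path and the container path must match
    return lhs.source == rhs.source and lhs.destination == rhs.destination


def same_destination(lhs: Bind, rhs: Bind) -> bool:
    # Relaxed check: only the path inside the container has to match
    return lhs.destination == rhs.destination


dest = PurePosixPath("/.e4s-cl/hostlibs/libibverbs.so")
bound_lhs = Bind(PurePosixPath("/usr/lib64/libibverbs.so"), dest)
bound_rhs = Bind(PurePosixPath("/lib64/libibverbs.so"), dest)

# The strict check misses the collision, the relaxed check catches it
strict_hit = same_source_and_destination(bound_lhs, bound_rhs)
relaxed_hit = same_destination(bound_lhs, bound_rhs)
```

With the two binds from the log, the strict check reports no duplicate while the relaxed one does, which matches the behaviour we saw.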

spoutn1k commented 1 year ago

Hey, thanks for the detailed error report!

That patch sounds like a good starting point. Returning destination matches as contained files from this closure would drop the new bind and trigger the permission widening for the old one.
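Roughly, the behaviour I have in mind looks like the sketch below. This is a simplified illustration and not the real implementation: the `Bind` class, the `READ_ONLY`/`READ_WRITE` constants and `add_bind` are names made up for this example, and it assumes permissions can be ordered so that the wider one is the larger value.

```python
from dataclasses import dataclass

# Hypothetical permission levels; a larger value means wider access
READ_ONLY = 0
READ_WRITE = 1


@dataclass
class Bind:
    # Hypothetical bind record used only for this illustration
    source: str
    destination: str
    option: int = READ_ONLY


def add_bind(binds: list, new: Bind) -> list:
    """Add `new` unless an existing bind already targets its destination."""
    for old in binds:
        if old.destination == new.destination:
            # Drop the new bind, but keep the wider of the two permissions
            old.option = max(old.option, new.option)
            return binds
    binds.append(new)
    return binds


binds = []
dest = "/.e4s-cl/hostlibs/libibverbs.so"
add_bind(binds, Bind("/usr/lib64/libibverbs.so", dest, READ_ONLY))
add_bind(binds, Bind("/lib64/libibverbs.so", dest, READ_WRITE))
```

After both calls only the first bind remains, and it carries the read-write permission requested by the second one.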

My main concern is how to deal with identical destinations for different files.
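One possible way to tell the two cases apart is to compare the resolved source paths: if both sources resolve to the same file, dropping one bind is harmless, otherwise it is a genuine conflict. The sketch below recreates a symlinked `lib64` layout in a temporary directory; the `classify` helper and the `other` directory are hypothetical and only serve the illustration.

```python
import os
import tempfile
from pathlib import Path


def classify(src_a: Path, src_b: Path) -> str:
    """Return 'same-file' if both sources resolve to one file, else 'conflict'."""
    if Path(src_a).resolve() == Path(src_b).resolve():
        return "same-file"
    return "conflict"


with tempfile.TemporaryDirectory() as root:
    # Recreate the layout from the report: lib64 is a symlink to usr/lib64
    usr_lib64 = Path(root, "usr", "lib64")
    usr_lib64.mkdir(parents=True)
    (usr_lib64 / "libibverbs.so").write_bytes(b"")
    os.symlink(usr_lib64, Path(root, "lib64"), target_is_directory=True)

    # A different file that happens to share the same name
    other = Path(root, "other")
    other.mkdir()
    (other / "libibverbs.so").write_bytes(b"")

    aliased = classify(Path(root, "lib64", "libibverbs.so"),
                       usr_lib64 / "libibverbs.so")
    unrelated = classify(other / "libibverbs.so",
                         usr_lib64 / "libibverbs.so")
```

Here `aliased` covers the symlink situation from the report, while `unrelated` is the case that worries me: two distinct files competing for one destination.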

I imagine that on your system the following does not throw an error?

from pathlib import Path
assert Path("/lib64/libibverbs.so").resolve() == Path("/usr/lib64/libibverbs.so").resolve()

spoutn1k commented 1 year ago

Please let me know if this helps!

egreen77 commented 1 year ago

The assert statement in the code snippet you provided above indeed doesn't throw an error for me. I tested the linked PR and it's working for me. Thanks for the fast turnaround on this!