POETSII / Orchestrator

The Orchestrator is the configuration and run-time management system for POETS platforms.

Race condition in creating symbolic links while making Orchestrator #249

Closed: m8pple closed this 3 years ago

m8pple commented 3 years ago

This is on current 1.0.0-alpha (43ba3be2a8f31fe7b03d2425b4f6b519d8b8d9d0), or any modern branch I check out. This particular compilation is on Ayres.

Not too sure where this happens, but quite a few times, if I do:

$ (cd Build/gcc && make -j8 -B)

Then it fails at the end with problems creating symbolic links:

<snip>
/local/orchestrator-common/orchestrator_dependencies_7/mpich/bin/mpicxx -MT "Objects/Source/Mothership/ThreadLogic.o" -MMD -MP -MF ./Dependency_lists/Source/Mothership/ThreadLogic.temp.d -I../../Generics -I../../Source/Common -I../../Source/OrchBase -I../../Source/OrchBase/AppStructures -I../../Source/OrchBase/Handlers -I../../Source/OrchBase/HardwareFileReader -I../../Source/OrchBase/HardwareConfigurationDeployment -I../../Source/OrchBase/Placement -I../../Source/OrchBase/Placement/Algorithms -I../../Source/OrchBase/Placement/Constraints -I../../Source/OrchBase/Placement/Exceptions -I../../Source/OrchBase/HardwareModel -I../../Source/Launcher -I../../Source/OrchBase/XMLProcessing -I../../Source/Injector -I../../Source/Parser -I../../Source/Root -I../../Source/NameServer -I../../Source/NameServer/AddressBook -I../../Source/Softswitch/inc -I../../Source/Supervisor -I/local/orchestrator-common/orchestrator_dependencies_7/mpich/include -I/local/tinsel/include -I/local/tinsel/hostlink -std=c++98 -Wall -fPIC -pthread -pedantic -O3 -DTRIVIAL_LOG_HANDLER -c -o Objects/Source/Mothership/ThreadLogic.o ../../Source/Mothership/ThreadLogic.cpp
mkdir --parents "../../Output/Composer/Orchestrator/"
ln --force --symbolic "/home/dt10/Orchestrator/Generics" "../../Output/Composer/Orchestrator/Generics"
mkdir --parents "../../Output/Composer/Orchestrator/Source/"
ln --force --symbolic "/home/dt10/Orchestrator/Source/Common" "../../Output/Composer/Orchestrator/Source/Common"
mkdir --parents "../../Output/Composer/"
ln --force --symbolic "/home/dt10/Orchestrator/Source/Softswitch" "../../Output/Composer/Softswitch"
mkdir --parents "../../Output/Composer/"
ln --force --symbolic "/local/tinsel" "../../Output/Composer/Tinsel"
ln: mkdir --parents "../../Output/Composer/"
failed to create symbolic link '../../Output/Composer/Tinsel/tinsel': Permission denied
Makefile:238: recipe for target '../../Output/Composer/Tinsel' failed
make: *** [../../Output/Composer/Tinsel] Error 1
make: *** Waiting for unfinished jobs....
ln --force --symbolic "/home/dt10/Orchestrator/Source/Softswitch/Makefile" "../../Output/Composer/Makefile"

Looks like a race condition to me, as if I then do:

$ (cd Build/gcc && make )

it completes fine.

A straight non-parallel build:

$ (cd Build/gcc && make -B )

also works fine.... wait, no, it doesn't:

/local/orchestrator-common/orchestrator_dependencies_7/mpich/bin/mpicxx -pthread -Wl,-export-dynamic -L/local/orchestrator-common/orchestrator_dependencies_7/mpich/lib -L/usr/lib \
        -o ../../bin/mothership Objects/Source/Mothership/MothershipMain.o Objects/Source/Mothership/Mothership.o Objects/Source/Mothership/AppDB.o Objects/Source/Mothership/AppInfo.o Objects/Source/Mothership/AppTransitions.o Objects/Source/Mothership/InstrumentationWriter.o Objects/Source/Mothership/LogPacketManager.o Objects/Source/Mothership/MessageUtils.o Objects/Source/Mothership/MPIHandlers.o Objects/Source/Mothership/PacketHandlers.o Objects/Source/Mothership/SuperDB.o Objects/Source/Mothership/SuperHolder.o Objects/Source/Mothership/SupervisorApiProvisioning.o Objects/Source/Mothership/ThreadComms.o Objects/Source/Mothership/ThreadLogic.o Objects/Source/Common/CommonBase.o Objects/Source/Common/Debug.o Objects/Source/Common/Environment.o Objects/Source/Common/Pglobals.o Objects/Source/Common/PMsg_p.o Objects/Source/Common/ProcMap.o Objects/Source/Common/Unrec_t.o Objects/Generics/dfprintf.o Objects/Generics/flat.o Objects/Generics/Msg_p.o Objects/Generics/NameBase.o /local/tinsel/hostlink/*.o \
        -lmpi -lpthread -ldl
mkdir --parents "../../Output/Composer/Orchestrator/"
ln --force --symbolic "/home/dt10/Orchestrator/Generics" "../../Output/Composer/Orchestrator/Generics"
mkdir --parents "../../Output/Composer/Orchestrator/Source/"
ln --force --symbolic "/home/dt10/Orchestrator/Source/Common" "../../Output/Composer/Orchestrator/Source/Common"
mkdir --parents "../../Output/Composer/"
ln --force --symbolic "/home/dt10/Orchestrator/Source/Softswitch" "../../Output/Composer/Softswitch"
mkdir --parents "../../Output/Composer/"
ln --force --symbolic "/local/tinsel" "../../Output/Composer/Tinsel"
ln: failed to create symbolic link '../../Output/Composer/Tinsel/tinsel': Permission denied
Makefile:238: recipe for target '../../Output/Composer/Tinsel' failed
make: *** [../../Output/Composer/Tinsel] Error 1

m8pple commented 3 years ago

The problem seems to be creating a link ../../Output/Composer/Tinsel/tinsel, which I think relates to Output/Composer/Tinsel/tinsel from the Orchestrator root.

I don't really know why it wants to do that, as Output/Composer/Tinsel appears to be a directory.

Anyway, it doesn't seem to cause other problems, it is more just an issue if you are scripting the make process.
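
One possible explanation for the nested Tinsel/tinsel path, not confirmed by anyone in this thread: when the link name already exists and resolves to a directory, GNU ln creates the new link inside that directory instead of replacing it. If Output/Composer/Tinsel survives from an earlier build as a link to /local/tinsel, a forced rebuild would then try to write /local/tinsel/tinsel. The sketch below reproduces that behaviour in a scratch directory, assuming GNU coreutils. The paths and variable names are hypothetical and nothing in it touches the Orchestrator tree.

```shell
#!/bin/sh
# Scratch-directory demonstration; assumes GNU coreutils ln.
# All paths here are hypothetical stand-ins, not Orchestrator paths.
set -eu
work=$(mktemp -d)
mkdir "$work/target"

# First pass: the link name does not exist yet, so it is created as expected.
ln --force --symbolic "$work/target" "$work/link"

# Second pass: "$work/link" now resolves to a directory, so ln places the
# new link inside it, at "$work/target/target".
ln --force --symbolic "$work/target" "$work/link"
if [ -L "$work/target/target" ]; then
    nested_created=yes
else
    nested_created=no
fi
echo "nested link created on second pass: $nested_created"

# With --no-dereference, ln treats the existing link as a plain file and
# --force replaces it in place.
rm --force "$work/target/target"
ln --force --symbolic --no-dereference "$work/target" "$work/link"
```

If this is what is happening here, adding --no-dereference to the ln call in the Makefile recipe might avoid the failure, but that is untested against the Orchestrator build.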

m8pple commented 3 years ago

Running rm -rf Output solves it once, but the same problem occurs the next time you do a full parallel rebuild.

(Though that also seems to delete checked-in files.)

mvousden commented 3 years ago

I can't reproduce this, so it might be a timing/race issue. Can you please send me an ls -lR of Output/Composer in the case when it fails? (along with the stdout/err, as you have done so far). Can you also tell me if you have a /local/tinsel/ on your machine? If not, can you send me an ls -lR of the Tinsel directory in the Orchestrator?

heliosfa commented 3 years ago

I am also unable to reproduce this on my local install (Ubuntu 20.04 under WSL on an 8-core/16-thread CPU with Orchestrator Dependencies 7 in the right place) or on a POETS box.

m8pple commented 3 years ago

The time spent reproducing this will probably outweigh the benefit, given there is a hacky fix for when it happens:

make -B -j8 ; make

So I'll just close this and apply the fix on machines as and when it happens.
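
For a scripted build, the workaround above can be wrapped in a small helper so that the script's exit status reflects the final pass. This is a minimal sketch, not something from the Orchestrator repository. The function name build_with_fallback and the MAKE override are assumptions made for illustration. Unlike the one-liner, it runs the second make only when the forced parallel pass fails.

```shell
#!/bin/sh
# Hypothetical helper: forced parallel rebuild first, then a plain
# incremental make only if that first pass fails.
# MAKE may be set to substitute another command; it defaults to make.
build_with_fallback() {
    build_dir=$1
    # First pass: rebuild everything in parallel.
    ( cd "$build_dir" && "${MAKE:-make}" -B -j8 ) && return 0
    echo "parallel rebuild failed; retrying with a plain make" >&2
    # Second pass: its exit status becomes the function's exit status.
    ( cd "$build_dir" && "${MAKE:-make}" )
}
```

From the Orchestrator root this would be called as build_with_fallback Build/gcc. A genuine build error still surfaces, because the second pass fails too and the helper returns its non-zero status.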

mvousden commented 3 years ago

If it does happen again, feel free to punt the aforementioned particulars at me and I'll take a look.