BioLockJ-Dev-Team / sheepdog_testing_suite

Test suite for BioLockJ development team.

direct module worker script fails even when it succeeds #301

Closed IvoryC closed 3 years ago

IvoryC commented 3 years ago

Here is a worker script. The MAIN script reports a failure even after the worker script reports success (the outputs look good, the logs look good; it HAS been successful).

The issue is remedied by adding "exit 0" as the final line of the script. Without it, this worker script ALWAYS has a non-zero exit status.

#!/bin/bash

# BioLockJ.v1.3.14-dev: ${scriptDir}/02.0_RdpHierParser.sh

export BLJ=/Users/ieclabau/git/BioLockJ
export SHEP=/Users/ieclabau/git/sheepdog_testing_suite

pipeDir="/Users/ieclabau/git/sheepdog_testing_suite/MockMain/pipelines/rdp_noJarPath_2_2020Nov20"
modDir="${pipeDir}/02_RdpHierParser"
scriptDir="${modDir}/script"
tempDir="${modDir}/temp"
logDir="${modDir}/log"
outputDir="${modDir}/output"

touch "${scriptDir}/02.0_RdpHierParser.sh_Started"

exec 1>${logDir}/02.0_RdpHierParser.log
exec 2>&1

cd ${scriptDir}

function scriptFailed() {
    echo "Line #${2} failure status code [ ${3} ]:  ${1}" >> "${scriptDir}/02.0_RdpHierParser.sh_Failures"
    exit ${3}
}

function executeLine() {
    ${1}
    statusCode=$?
    [ ${statusCode} -ne 0 ] && scriptFailed "${1}" ${2} ${statusCode}
}

executeLine "java -cp /Users/ieclabau/git/BioLockJ/dist/BioLockJ.jar biolockj.BioLockJ -projectDir /Users/ieclabau/git/sheepdog_testing_suite/MockMain/pipelines -direct rdp_noJarPath_2_2020Nov20:02_RdpHierParser" ${LINENO}
IvoryC commented 3 years ago

The cause of the issue is the executeLine function.

function executeLine() {
    ${1}
    statusCode=$?
    [ ${statusCode} -ne 0 ] && scriptFailed "${1}" ${2} ${statusCode}
}

If the command in $1 fails, then statusCode is non-0 and the scriptFailed function is called. If the command in $1 succeeds, then statusCode is 0, so the test [ ${statusCode} -ne 0 ] itself returns non-0 and the && short-circuits without calling scriptFailed. THAT non-0 status from the test is the last status in the function, so executeLine returns non-0. The call to executeLine is the last line in the script, so the SCRIPT ends with a non-0 status even when the command inside is successful.
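To make the mechanism concrete, here is a minimal sketch that reproduces it outside of BioLockJ. The function names execute_line_short_circuit and execute_line_if are made up for this illustration, and `true` stands in for the java command. The `if` variant is shown only for contrast; it is not the fix adopted in this thread.

```shell
#!/bin/bash
# Hypothetical stand-ins for executeLine; names are illustrative only.

# Same shape as the original: a bare `[ ... ] && ...` is the last command,
# so when the test is false the function returns the test's status (1).
execute_line_short_circuit() {
    ${1}
    statusCode=$?
    [ ${statusCode} -ne 0 ] && echo "would call scriptFailed"
}

# Contrast: an `if` whose condition is false and has no else branch returns 0.
execute_line_if() {
    ${1}
    statusCode=$?
    if [ ${statusCode} -ne 0 ]; then
        echo "would call scriptFailed"
        return ${statusCode}
    fi
}

short_status=0
execute_line_short_circuit "true" || short_status=$?

if_status=0
execute_line_if "true" || if_status=$?

echo "short-circuit version returned ${short_status}"
echo "if version returned ${if_status}"
```

In both calls the wrapped command succeeds; only the short-circuit version reports a non-0 status afterwards.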

This same executeLine() function is used in all of our scripts, BUT in most worker scripts, after all of the calls to executeLine(), there is a final line to touch /this/script.sh_Success. That runs fine and has a 0 exit status, so the script has a 0 exit status.
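A small sketch of why the trailing touch hides the problem: a script's exit status is the status of the last command it runs. The two temporary worker scripts below are hypothetical, and "${0}_Success" is only a stand-in for the real flag path.

```shell
#!/bin/bash
# Sketch: the same executeLine call, with and without a trailing touch.
work=$(mktemp -d)

# Hypothetical worker whose last command is the executeLine call.
cat > "${work}/no_touch.sh" <<'EOF'
#!/bin/bash
function executeLine() {
    ${1}
    statusCode=$?
    [ ${statusCode} -ne 0 ] && exit ${statusCode}
}
executeLine "true"
EOF

# Same worker, followed by a touch of a success flag next to the script.
cp "${work}/no_touch.sh" "${work}/with_touch.sh"
echo 'touch "${0}_Success"' >> "${work}/with_touch.sh"

no_touch_status=0
bash "${work}/no_touch.sh" || no_touch_status=$?

with_touch_status=0
bash "${work}/with_touch.sh" || with_touch_status=$?

echo "without touch: ${no_touch_status}"
echo "with touch:    ${with_touch_status}"

rm -rf "${work}"
```

The wrapped command succeeds in both scripts; the only difference in exit status comes from which command happens to run last.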

IvoryC commented 3 years ago

On the cluster, the worker scripts don't actually run and return a status; they are submitted through a qsub command, which just returns a job-id and runs the scripts sometime later. Docker does sort of the same thing: the MAIN script doesn't get a status code from the worker script, it gets a status code from docker indicating that it did create a container (no status on the process inside).
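As a rough analogy only (neither qsub nor docker is called here), a background job in bash shows the same split between "the launch worked" and "the work succeeded". The worker function is hypothetical and deliberately fails with status 3.

```shell
#!/bin/bash
# Analogy: a background job stands in for a submitted or detached worker.
worker() { return 3; }   # hypothetical worker that fails with status 3

worker &
launch_status=$?         # status of starting the job, not of the worker itself
worker_pid=$!

# Only an explicit wait retrieves the worker's own exit status.
worker_status=0
wait "${worker_pid}" || worker_status=$?

echo "launch status: ${launch_status}"
echo "worker status: ${worker_status}"
```

A caller that only looks at the launch status never sees the worker's failure, which is consistent with this bug going unnoticed on the cluster and in docker.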

Locally, it frustrates me that even when I have detach java modules set to Y, java modules don't make scripts. There is a bit of logic that says ONLY make scripts if (detach java modules) AND (on cluster or in docker). This error may have presented before and led to this logic. In my new RdpHierParser, the module steps around that logic so it makes a script even locally; but it always fails, even when it succeeds!

IvoryC commented 3 years ago

Resolution: at the end of every worker script, exit 0.

This worker script does not touch the Success flag; it looks like that is left up to the biolockj instance that is running in direct mode. That's reasonable, I suppose. Using exit 0 at the end of all worker scripts is sure not to conflict with any other flag stuff. We've already been assuming that if the script makes it to the end, it was "successful".
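A sketch of the resolution under stated assumptions: the temporary worker below copies the shape of scriptFailed and executeLine from the script above, but takes the command to run as its first argument (a hypothetical simplification) and writes the failure message to stderr instead of a _Failures file.

```shell
#!/bin/bash
# Sketch: a worker that ends with "exit 0" still reports real failures,
# because scriptFailed exits before the final line is reached.
work=$(mktemp -d)

cat > "${work}/worker.sh" <<'EOF'
#!/bin/bash
# Hypothetical worker: the command to run is passed as the first argument.
function scriptFailed() {
    echo "Line #${2} failure status code [ ${3} ]:  ${1}" >&2
    exit ${3}
}
function executeLine() {
    ${1}
    statusCode=$?
    [ ${statusCode} -ne 0 ] && scriptFailed "${1}" ${2} ${statusCode}
}
executeLine "${1}" ${LINENO}
exit 0
EOF

ok_status=0
bash "${work}/worker.sh" "true" || ok_status=$?

fail_status=0
bash "${work}/worker.sh" "false" 2>/dev/null || fail_status=$?

echo "successful command: ${ok_status}"
echo "failing command:    ${fail_status}"

rm -rf "${work}"
```

This matches the assumption stated above: reaching the end of the script means success, and any earlier failure exits with the failing command's status.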

IvoryC commented 3 years ago

From JavaModuleImpl:

@Override
public void executeTask() throws Exception {
    final boolean detached = Config.getBoolean( this, Constants.DETACH_JAVA_MODULES );
    final boolean buildDockerScript = DockerUtil.inDockerEnv() && !BioLockJUtil.isDirectMode();
    if( detached && ( buildDockerScript || Config.isOnCluster() ) ) super.executeTask();
    else runModule();
}

I wondered why this reads: if( detached && ( buildDockerScript || Config.isOnCluster() ) ) super.executeTask(); and not simply if( detached ) super.executeTask();

IvoryC commented 3 years ago

This fix was brought into master with pull request: "Java mod #170

All worker scripts end with an echo statement (whether it is a java module or not). Java modules make scripts if detach java modules is true, regardless of environment.