galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org
Other
1.42k stars 1.01k forks source link

Bug: Galaxy sets the GALAXY_SLOTS variable IF NOT SET to 1 #5369

Open mblue9 opened 6 years ago

mblue9 commented 6 years ago

Hello,

I'm trying to submit a tool (EGSEA) to IUC but it's failing Travis (here: https://travis-ci.org/galaxyproject/tools-iuc/jobs/332650210#L1606) because of a current bug with GALAXY_SLOTS in Galaxy.

For this parameter --threads ${GALAXY_SLOTS:-2}

It should be using 2 threads as GALAXY_SLOTS is not set but it's only using 1.

If this could be fixed that would be great as otherwise the tool can't pass Travis and get into the toolshed.

ping @bgruening

nsoranzo commented 6 years ago

@mblue9 I don't think that's a Galaxy bug, the local job runner by default sets GALAXY_SLOTS to 1. If you need at least 2 slots, I'd use something like the following in the <command> section of the wrapper:

# EGSEA requires at least 2 threads
SLOTS=${GALAXY_SLOTS:-2};
[ "$SLOTS" -eq 1 ] && SLOTS=2;
...
--threads $SLOTS
bgruening commented 6 years ago

I see it from a different angle. If ${GALAXY_SLOTS:-2} is not respected and the default is not used this is unexpected devolopers. Its the second time people asking why this is happening.

Is there any reason why the local job runner defaults to 1 instead of using the default from the tool dev?

nsoranzo commented 6 years ago

If ${GALAXY_SLOTS:-2} is not respected and the default is not used this is unexpected

@bgruening GALAXY_SLOTS environment variable is set by Galaxy inside the job script to the number of cores assigned by the admin for the job destination. For the local job runner that is 1 by default, unless you change it inside job_conf.xml with something like:

<destination id="local" runner="local">
    <param id="local_slots">4</param>
</destination>

The ${GALAXY_SLOTS:-2} syntax is shell syntax which works as expected, I just think that is not very useful any more since Galaxy always defines the variable (if I read the galaxy code correctly).

mvdbeek commented 6 years ago

Is there any reason why the local job runner defaults to 1 instead of using the default from the tool dev?

IMO that defeats the purpose of specifying the available cores in the job_conf.xml file.

bgruening commented 6 years ago

The ${GALAXY_SLOTS:-2} syntax is shell syntax which works as expected, I just think that is not very useful any more since Galaxy always defines the variable (if I read the galaxy code correctly).

That is what I mean, and I think this is a bug. There are tools that does not work with 1 core and we should be able to configure this easily, not with the hack above and not with job_conf.xml (this does not work for testing with planemo easily).

IMO that defeats the purpose of specifying the available cores in the job_conf.xml file.

? ${GALAXY_SLOTS:-2} means if job_conf does not set GALAXY_SLOTS take 2, that's all. This is how we teach it. How does this defeats the purpose of job_conf.xml?

Lets reformulate the issue. I think forcing people to use

# EGSEA requires at least 2 threads
SLOTS=${GALAXY_SLOTS:-2};
[ "$SLOTS" -eq 1 ] && SLOTS=2;
...
--threads $SLOTS

is not very usable. Is there any reason why GALAXY_SLOTS is actively set to 1 instead of using the value that is set in the tools.

mvdbeek commented 6 years ago

? ${GALAXY_SLOTS:-2} means if job_conf does not set GALAXY_SLOTS take 2, that's all.

yeah, that's exactly what happens now, isn't it ? What you're suggesting is a convenient syntax to bypass what the admin has set in the job_conf.xml file. To me that defeats the purpose of being able to specify GALAXY_SLOTS in the first place.

Why don't we remove GALAXY_SLOTS from the default local job runner ? I guess that'd work for you, right ?

nsoranzo commented 6 years ago

The purpose of GALAXY_SLOTS is having a way to communicate to the tool how many cores Galaxy has reserved for the job. The tool author can then decide for example to use 2 threads per available core (but obviously most of the time is the same number).

So what has been thought is a simplification, for the few times we need the above workaround that's not very difficult to implement.

jmchilton commented 6 years ago

Why don't we remove GALAXY_SLOTS from the default local job runner ? I guess that'd work for you, right ?

This would break anyone who is using GALAXY_SLOTS without a non-Galaxy default specified. I'd suspect there are many such tools out there that do that.

@nsoranzo stated this above - but to restate my opinion I think there is a confusion if people think this is a bug - the default isn't declaring a minimum number of threads in Galaxy it is declaring a default to be used if the runtime environment doesn't specify GALAXY_SLOTS. To put another way - deployers who don't specify a slot count can expect a well behaved tool to use one thread - if we wanted to re-interpret the tool syntax to be respect it as a "tool author" default we would be ignoring the "deployer" default.

That said - if the deployer doesn't specify a number of slots explicitly GALAXY_SLOTS_CONFIGURED is unset and it is always set if something has explicitly set a number of threads. It could be used to specify a minimum in this case and have the local runner use two threads is local slots isn't set. (I'm not sure I think it is the right choice to use - but I exposed it because I guess I saw this debate coming?)

If we want to specify a syntax for minimum number of cores that is great - we should definitely do that - implementing CWL properly will require the runtime support this operation anyway. Then the local runner could default to setting the minimum for a give tool - that would be great.

mblue9 commented 6 years ago

Thanks all for all the info on this!

What shouId I do now? Do I wait for this PR https://github.com/galaxyproject/galaxy/pull/5378

bgruening commented 6 years ago

? ${GALAXY_SLOTS:-2} means if job_conf does not set GALAXY_SLOTS take 2, that's all. yeah, that's exactly what happens now, isn't it ?

No, unfortunately not. We can not test tools that need two cores as a minimum.

So what has been thought is a simplification, for the few times we need the above workaround that's not very difficult to implement.

Its not a simplification ;) simple would be to tell people use GALXAXY_SLOTS and not ${GALAXY_SLOTS:-2}. It has no effect at all.

This would break anyone who is using GALAXY_SLOTS without a non-Galaxy default specified. I'd suspect there are many such tools out there that do that.

Why, wouldn't this using the bash default that is given (tool-devs-choice). That's why we use the more complicated syntax and tought it.

@nsoranzo stated this above - but to restate my opinion I think there is a confusion if people think this is a bug - the default isn't declaring a minimum number of threads in Galaxy it is declaring a default to be used if the runtime environment doesn't specify GALAXY_SLOTS.

Until here I can follow. I just think Galaxy should not specify a run time environment default, at least not hard-coded, mabye in the job_conf.

To put another way - deployers who don't specify a slot count can expect a well behaved tool to use one thread

Why "one" and not the tools authors default? @nsoranzo hack will work around the Galaxy deployers default its just more ugly ;)

if we wanted to re-interpret the tool syntax to be respect it as a "tool author" default we would be ignoring the "deployer" default.

No because the "tool authors" default will only be used if the deployer does not specify any default, aka GALAXY_SLOT is unset. As soon as the deployer actually sets anything it will overwrite the tool authors choice.

bgruening commented 6 years ago

@mblue9 it seems everyone seem to be happy if you just hack around this problem :) So hack around it! :)

mblue9 commented 6 years ago

Ok I added that hack and now it's green https://github.com/galaxyproject/tools-iuc/pull/1686 Yay!! 🎉 😄 Thanks all!!!