ga4gh / task-execution-schemas

Apache License 2.0
80 stars 28 forks source link

Request for clarification: Ports field #63

Closed mbookman closed 6 years ago

mbookman commented 7 years ago

The task_execution.proto contains the ability to map ports from the docker container to the host. It is unclear the purpose of this feature. Can that be expanded on in the proto?

Thanks.

geoffjentry commented 7 years ago

If this is what I'm thinking of, @kellrott was using it for jupyter notebooks

buchanae commented 6 years ago

@mbookman Sorry, not sure you got a clear answer here. Basically, this allows services to communicate with the task over the network. The original experiment was that a jupyter notebook server could use a TES service to spawn new notebooks, taking advantage of the nice compute request API we've developed here.

There's possibly a case where this breaks down: in a hypothetical TES service backed by a Kubernetes cluster, I don't think you're allowed to request specific host ports, but this spec allows the task to request specific ports and doesn't clearly define what happens when those exact can't be allocated.

I think the motivation to support basic networking seems good, but maybe the details need some refinement.

delagoya commented 6 years ago

As I've mentioned on previous calls, I am strongly opposed to ports being returned or specified in TES. IMHO this convenience is feature creep from a "execute this unit of work" service to a generalized "launch a containerized service for me please" service.

buchanae commented 6 years ago

96 is a PR to remove ports.

buchanae commented 6 years ago

@kellrott @adamstruck What do you think? Turns out the implementation details are not trivial, and being online for communication conflicts with the batch processing scope here.

geoffjentry commented 6 years ago

Chiming in to say that I agree with @delagoya

adamstruck commented 6 years ago

We have not found any practical use for it beyond the jupyter notebook spawning example. I am fine with removing it.

buchanae commented 6 years ago

Sounds pretty clear. We'll merge the PR next week.

kellrott commented 6 years ago

I'm fine with removing it as well

buchanae commented 6 years ago

In #96 I also removed ExecutorLog.host_ip, which I forgot to do last week. I think this is expected, but please give me a thumbs up/down so I know everyone agrees. Thanks.

Rationale is the same, it's not trivial to give a network accessible host address, it was added when ports were, and this information can be included in system logs if desired.

uniqueg commented 2 years ago

This was now asked for by the Galaxy community now as a critical requirement, see https://docs.google.com/spreadsheets/d/1vBFhBQ-nFqhSL5dLjQfOWO6x9BzmV9x6l18p9GYRZdQ/edit#gid=0

I will try to find out who exactly asked this and ask them to chime into this (old) discussion with some more details. But perhaps you would consider re-opening the issue, @kellrott?

kellrott commented 2 years ago

The biggest argument for not including was that it would be too much effort for teams implementing it. Now that there are three different implementations being built, it would be worth asking those teams. @MattMcL4475 do you have any thoughts?

MattMcL4475 commented 2 years ago

It's certainly technically possible to implement this feature in TES within Cromwell on Azure, likely by using this: https://docs.microsoft.com/en-us/rest/api/batchservice/pool/add#inboundnatpool

That said, I would generally agree with @delagoya . I'd be curious to better understand the Galaxy use case and also see the workflow definition that would necessitate this feature.

kellrott commented 2 years ago

The Galaxy use case centers on their Interactive Tools : https://training.galaxyproject.org/training-material/topics/admin/tutorials/interactive-tools/tutorial.html

The other Galaxy job runners can support deploying interactive tools, but a TES based one wouldn't.