Open okofish opened 3 years ago
Indeed, it would be nice to be able to specify --platform-capabilities EXTERNAL, e.g.:
aws batch register-job-definition --job-definition-name sleep30 --type container --container-properties '{ "image": "busybox", "vcpus": 1, "memory": 128, "command": [ "sleep", "30"]}' --platform-capabilities EXTERNAL
--platform-capabilities (list)
The platform capabilities required by the job definition. If no
value is specified, it defaults to EC2 . To run the job on Fargate
resources, specify FARGATE .
(string)
Syntax:
"string" "string" ...
Where valid values are:
EC2
FARGATE
tried to sneak it past using --cli-input-json .. no luck =(
rrizun@rrizuns-MacBook-Air farspot % cat newjob.json
{
    "jobDefinitionName": "sleep30",
    "type": "container",
    "containerProperties": {
        "image": "busybox",
        "vcpus": 1,
        "memory": 1024,
        "command": [
            "sleep",
            "30"
        ]
    },
    "platformCapabilities": [
        "EXTERNAL"
    ]
}
rrizun@rrizuns-MacBook-Air farspot % aws batch register-job-definition --cli-input-json file://newjob.json
An error occurred (ClientException) when calling the RegisterJobDefinition operation: Error executing request, Exception : Capability EXTERNAL is not valid. Valid capabilities: [FARGATE, EC2], RequestId: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
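For anyone scripting around this, here is a minimal Python sketch of a local pre-check that flags the capabilities Batch would reject before the request is sent. The accepted set is copied from the error message above and is an assumption about the service's current behavior, not a documented constant. The helper name `unsupported_capabilities` is made up for illustration.

```python
import json

# Assumption: these values are copied from the ClientException text above
# ("Valid capabilities: [FARGATE, EC2]"); the service may change them.
ACCEPTED_CAPABILITIES = {"EC2", "FARGATE"}


def unsupported_capabilities(job_definition: dict) -> list:
    """Return the platformCapabilities entries that are not in the accepted set."""
    requested = job_definition.get("platformCapabilities", [])
    return [cap for cap in requested if cap not in ACCEPTED_CAPABILITIES]


# Same shape as newjob.json from the transcript above.
job_definition = json.loads("""
{
    "jobDefinitionName": "sleep30",
    "type": "container",
    "containerProperties": {
        "image": "busybox",
        "vcpus": 1,
        "memory": 1024,
        "command": ["sleep", "30"]
    },
    "platformCapabilities": ["EXTERNAL"]
}
""")

rejected = unsupported_capabilities(job_definition)
print(rejected)  # prints ['EXTERNAL']
```

This only mirrors the validation seen in the error. It does not make EXTERNAL work, it just fails earlier and locally.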
I'm fairly deep into a project where I assumed this would be possible. I think the existence of this thread means we're officially at a dead end. We were hoping to run the Metaflow ML model training framework on our own GPU machines.
Is there anything that can be done to help prioritize this?
With GPUs in AWS being in such high demand, having the option to use our on-prem GPU clusters in AWS Batch would be incredibly helpful. If Batch job definitions supported the EXTERNAL option, we could easily switch some of our jobs to on-prem GPUs with minimal adjustments. We have tried several workarounds, and none have been successful so far. Notably, EXTERNAL works seamlessly with ECS task definitions but unfortunately not with Batch, even after three years.
Tell us about your request What do you want us to build? I'd like to be able to use ECS Anywhere clusters in unmanaged Batch compute environments
Which service(s) is this request for? AWS Batch and ECS
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? Coupling the AWS Batch control plane with on-premises ECS Anywhere instances is a very intriguing model for hybrid cloud and dev/test batch processing workloads. It is currently possible to create an unmanaged compute environment linked to an ECS cluster with external instances, but there's no way to tell the Batch control plane to run tasks on the external instances. It can be seen from CloudTrail logs that Batch invokes the RunTask operation using the setting "launchType": "EC2". My understanding is that this needs to be "launchType": "EXTERNAL" in order for the task to run on ECS Anywhere instances. It would be desirable to be able to configure Batch compute environments to use the EXTERNAL launch type.
Are you currently working around this issue? I do not currently have a workaround for this issue.
Additional context N/A
Attachments N/A
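To make the launch type difference described in the request concrete, here is a rough Python sketch of the RunTask parameters a caller would pass to ECS directly to target ECS Anywhere instances. It assumes boto3 is installed and credentials are configured. The cluster name `onprem-cluster` and the task definition name `sleep30` are placeholders, and both helper functions are hypothetical. Calling ECS directly bypasses Batch queues and scheduling, so this illustrates the setting rather than working around the missing Batch feature.

```python
# Launch types accepted by the ECS RunTask API.
ECS_LAUNCH_TYPES = {"EC2", "FARGATE", "EXTERNAL"}


def build_run_task_request(cluster: str, task_definition: str,
                           launch_type: str = "EXTERNAL") -> dict:
    """Build keyword arguments for an ECS RunTask call (hypothetical helper)."""
    if launch_type not in ECS_LAUNCH_TYPES:
        raise ValueError(f"unknown launch type: {launch_type}")
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": launch_type,  # Batch currently sends "EC2" here
        "count": 1,
    }


def run_on_external_instances(cluster: str, task_definition: str) -> dict:
    """Submit the task straight to ECS, without Batch (hypothetical helper)."""
    import boto3  # third-party; imported lazily so the builder works without it

    ecs = boto3.client("ecs")
    return ecs.run_task(**build_run_task_request(cluster, task_definition))


# Placeholder names; substitute your own cluster and task definition.
request = build_run_task_request("onprem-cluster", "sleep30")
print(request["launchType"])  # prints EXTERNAL
```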