DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

Support for extended memory on custom machine type #167

Closed vincent-octo closed 5 years ago

vincent-octo commented 5 years ago

Hi,

I have a workload that requires VMs of 1 CPU and 20 GB of RAM.

When I run dsub with --provider google-v2 --min-cores 1 --min-ram 20, it spawns a custom machine with 4 vCPUs and 20 GB of RAM. Ideally, I would like it to spawn a custom machine with 1 vCPU and 20 GB of RAM using extended memory, as that reduces the cost a bit.

Is there support for extended memory in dsub? If so, how do I set it on the command line?
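One plausible explanation for the 4 vCPUs, offered as a sketch rather than a description of what dsub or the Pipelines API actually does: without extended memory, an N1 custom machine is capped at 6656 MiB of RAM per vCPU (the figure quoted in the error message later in this thread for a 1 vCPU instance), and Compute Engine custom machine types require the vCPU count to be 1 or an even number. The function name and constant below are hypothetical and only illustrate the arithmetic.

```python
import math

# Assumption: per-vCPU memory cap for N1 custom machines without
# extended memory, taken from the error message quoted in this thread.
MAX_MIB_PER_VCPU = 6656


def min_vcpus_for_memory(memory_mib: int) -> int:
    """Smallest vCPU count that fits memory_mib without extended memory.

    Hypothetical helper, not part of dsub. Applies the per-vCPU cap,
    then rounds odd counts above 1 up to the next even number.
    """
    vcpus = max(1, math.ceil(memory_mib / MAX_MIB_PER_VCPU))
    if vcpus > 1 and vcpus % 2 == 1:
        vcpus += 1
    return vcpus


# 20 GB expressed as 20 * 1024 = 20480 MiB needs ceil(20480 / 6656) = 4 vCPUs
print(min_vcpus_for_memory(20 * 1024))  # 4
```

Under these assumptions, a request for 1 core and 20 GB cannot be satisfied by a regular custom machine with fewer than 4 vCPUs, which matches the behavior described above.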

mbookman commented 5 years ago

Hi @vincent-octo !

I'm not sure if this is supported by the Pipelines API. If it is, this would be done by specifying a custom machine type as documented here:

https://cloud.google.com/genomics/reference/rest/Shared.Types/Metadata#VirtualMachine

The machine type of the virtual machine to create. Must be the short name of a standard machine type (such as "n1-standard-1") or a custom machine type (such as "custom-1-4096", where "1" indicates the number of vCPUs and "4096" indicates the memory in MB).

Please try using the --machine-type flag with a custom value. If this does not work, then we will need to pass a feature request on to the Cloud Health team.

vincent-octo commented 5 years ago

Hi @mbookman and thanks for your reply.

I didn't find a mention of extended memory in the documentation you provided. However, the GCP pricing calculator names this instance type n1-custom-1-20480-extended.

I tried it with dsub but it results in the following error:

dsub --provider google-v2 ... --machine-type n1-custom-1-20480-extended --command 'echo "Hello "' --wait

Job: dsub-xxx--xxx--190830-135045-31 ... googleapiclient.errors.HttpError: <HttpError 400 when requesting https://genomics.googleapis.com/v2alpha1/pipelines:run?alt=json returned "Error: validating pipeline: invalid machine type: unknown machine type">

As you said, the API doesn't seem to support extended memory, so I will use custom VMs without extended memory (the price difference is not that big). Feel free to close this issue.

mbookman commented 5 years ago

Thank you for testing it out. I have filed a feature request for extended memory support in the Pipelines API. I will leave this issue open for tracking.

mbookman commented 5 years ago

Hi @vincent-octo .

The Pipelines API team has added initial support for custom machine types with extended memory. For N1 machine types, you can use the notation:

--machine-type custom-CORES-MEMORY-ext

Support for the N2 machine types has not been added yet.
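As a rough illustration of the notation above, the following Python sketch assembles the value passed to --machine-type for an N1 custom machine with extended memory. The helper name extended_machine_type is hypothetical and not part of dsub; it only formats the string, and it assumes the memory is given in MB as in the custom machine type description quoted earlier in this thread.

```python
def extended_machine_type(cores: int, memory_mb: int) -> str:
    """Format an N1 custom machine type with extended memory.

    Hypothetical helper, not part of dsub: it only builds the string
    for --machine-type, following the custom-CORES-MEMORY-ext notation.
    """
    if cores < 1 or memory_mb < 1:
        raise ValueError("cores and memory_mb must be positive integers")
    return f"custom-{cores}-{memory_mb}-ext"


# 1 vCPU with 20 GB of RAM, expressed as 20 * 1024 = 20480 MB
print(extended_machine_type(1, 20 * 1024))  # custom-1-20480-ext
```

The resulting string would then be supplied on the command line as --machine-type custom-1-20480-ext.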

vincent-octo commented 5 years ago

Hi @mbookman ,

The custom VM with extended memory seems to be working from my initial test.

Just a note: the memory must be given in MiB and must be a multiple of 256 MiB. Otherwise, the job fails with the following error:

FAILURE   Execution failed: creating instance: inserting instance: Invalid value for field 'resource.machineType': 'zones/europe-west1-c/machineTypes/custom-1-20-ext'.
Memory should be a multiple of 256MiB, while 20MiB is requested. Memory size for 1 vCPU instance should be between 1024MiB and 6656MiB, while 20MiB is requested

Thanks for your help.
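The constraint reported in the error above can be checked before submitting a job. The sketch below converts a memory size to MiB and rounds it up to the next multiple of 256 MiB. The function name is hypothetical, and it assumes that a size such as "20 GB" is meant as 20 GiB (20 * 1024 MiB); it does not check any other limits the backend may enforce.

```python
import math

MIB_PER_GIB = 1024
MEMORY_STEP_MIB = 256  # from the error message: "multiple of 256MiB"


def gib_to_valid_mib(memory_gib: float) -> int:
    """Convert GiB to MiB, rounded up to a multiple of 256 MiB.

    Hypothetical helper, not part of dsub.
    """
    if memory_gib <= 0:
        raise ValueError("memory_gib must be positive")
    raw_mib = math.ceil(memory_gib * MIB_PER_GIB)
    # Integer ceiling division, then scale back up to MiB.
    steps = -(-raw_mib // MEMORY_STEP_MIB)
    return steps * MEMORY_STEP_MIB


print(gib_to_valid_mib(20))  # 20480
```

Since 20480 is already a multiple of 256, a value of 20480 in the machine type avoids the error, whereas the value 20 used in custom-1-20-ext was read as 20 MiB.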