Open rivershah opened 11 months ago
Thanks for the input @rivershah.
A cursory look indicates that Batch supports this via the LocationPolicy:
https://cloud.google.com/batch/docs/reference/rest/v1alpha/projects.locations.jobs#locationpolicy
We'll test it out and look to wire it up if it works as expected.
@mbookman Thanks for looking. My understanding of the docs is that the batch API will raise an error if multiple regions are specified:
Only one region or multiple zones in one region is supported now
It appears that multi-region support may not have been ported over to batch. I await the results of your experimentation with this, as I can't seem to get my jobs to schedule when I specify multiple regions: the VMs enter an error state.
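To make the constraint quoted above concrete, here is a minimal sketch of a client-side check. It assumes locations are written as `regions/<region>` or `zones/<zone>` strings, as in the LocationPolicy `allowedLocations` field, and that a zone name is its region name plus a one-part suffix. The helper names `region_of` and `check_single_region` are hypothetical and are not part of dsub or any Batch client library.

```python
def region_of(location: str) -> str:
    """Return the region for a 'regions/<region>' or 'zones/<zone>' string."""
    kind, _, name = location.partition("/")
    if kind == "regions" and name:
        return name
    if kind == "zones" and name:
        # Assumption: a zone is its region plus a suffix, e.g. us-central1-a.
        return name.rsplit("-", 1)[0]
    raise ValueError(f"unrecognized location: {location!r}")


def check_single_region(allowed_locations: list[str]) -> str:
    """Return the single region covered, or raise ValueError otherwise."""
    regions = {region_of(loc) for loc in allowed_locations}
    if len(regions) != 1:
        raise ValueError(f"expected exactly one region, got {sorted(regions)}")
    return regions.pop()
```

Under these assumptions, `["zones/us-central1-a", "zones/us-central1-c"]` passes the check, while `["regions/us-central1", "regions/us-east1"]` is rejected before any job is submitted.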
For context, multi-region support is so useful because it greatly simplifies job submission for hard-to-find resources such as high-memory nodes and GPU accelerators. A large parallel GPU dsub TSV submission will typically find resources across geographically widely separated regions.
Does dsub work with google batch?
@mbookman Happy new year! I am still pretty sure that batch, as implemented on Google's side, does not support submitting a job to US-wide regions, which google-cls-v2 does allow. This would be a major feature regression. I don't think this is a dsub limitation.
Can you please verify whether what I am saying is correct? If so, we will need help determining whether this feature can be implemented in batch.
@mbookman @wnojopra As google-cls-v2 is headed for removal soon enough, I am requesting that we look at this feature regression. Thank you.
Hey @rivershah !
Sorry about the delay in following up. We did check in with the Batch team regarding this. The lack of the multi-region support is presently intentional in the sense that it was not considered to have high utility. It would be great if we could get more input from you on your use case and where you see it giving value.
One of the key drivers of this feature not being added to Batch is the change in Cloud pricing in 2022, where reading data in a multi-region bucket from a service located in a region on the same continent became something that incurs Data Transfer Out charges (fka egress charges):
https://cloud.google.com/storage/pricing-announce#network
Reading data in a Cloud Storage bucket located in a multi-region from a Google Cloud service located in a region on the same continent will no longer be free; instead, such moves will be priced the same as general data moves between different locations on the same continent.
From Northern America to Northern America: $0.02/GB
Prior to those pricing changes, access from any of the US regions to data in a US multi-region bucket was free. So the Cloud view on this is that people will generally want to use regional buckets and regional VMs.
So can you share your use case where this pricing change has not impacted you and where you'd get high value from multiple regions?
Thanks.
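As a rough illustration of the pricing change described above, the following sketch multiplies a data volume by the quoted $0.02/GB rate. The workload figures (1,000 tasks reading 50 GB each) are hypothetical, and the function name `transfer_cost_usd` is made up for this example; actual charges depend on the current Cloud Storage pricing page.

```python
from decimal import Decimal

# Rate quoted above for Northern America to Northern America transfers.
RATE_USD_PER_GB = Decimal("0.02")


def transfer_cost_usd(gigabytes, rate=RATE_USD_PER_GB):
    """Return the Data Transfer Out charge in USD for reading `gigabytes`."""
    return Decimal(str(gigabytes)) * rate


# Hypothetical workload: 1,000 tasks that each read 50 GB of input.
print(transfer_cost_usd(1000 * 50))  # prints 1000.00
```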
Hi @mbookman,
Apologies for the delay. The multi-region feature is crucial for several reasons:
google-cls-v2 has powerful multi-region support through wildcard matching or providing lists of regions. It appears that google-batch is lacking this feature.
Multi-region support is a much loved and used feature with google-cls-v2. Can you please verify that google-batch indeed does not have this? If it does not, would it be possible to work with the Google Batch developers to introduce this feature by the time Cloud Life Sciences gets removed? Thank you.
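For readers unfamiliar with the wildcard style mentioned above, here is a minimal sketch of how a pattern such as `us-*` can select regions from a list. It uses Python's standard `fnmatch` module for illustration only and does not reproduce how google-cls-v2 or the underlying API resolves wildcards. The `KNOWN_REGIONS` list and the `expand_regions` helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical candidate regions; a real list would come from the project.
KNOWN_REGIONS = ["us-central1", "us-east1", "us-west1", "europe-west4", "asia-east1"]


def expand_regions(patterns, known=KNOWN_REGIONS):
    """Return the known regions matching any pattern, in `known` order."""
    return [r for r in known if any(fnmatchcase(r, p) for p in patterns)]
```

With the list above, `expand_regions(["us-*"])` selects the three `us-` regions, and an explicit list such as `["europe-west4", "us-east*"]` can mix exact names with patterns.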