Open wlandau opened 4 years ago
I'm fully supportive of this. AWS Lambda and AWS Batch have been on my radar for a while. My hope was that there would be a low-level R API that could be leveraged for this. There have been different efforts on AWS Lambda, but I don't think they've taken off.
Should we have another call on this? It'll help me clarify a few things related to the future roadmap.
Awesome! I would love to chat about this, and I can definitely make time after R/Pharma (Oct 13-15).
paws is ostensibly capable of setting up the web API calls to submit jobs to Batch (https://github.com/paws-r/paws/blob/main/examples/batch.R). However, I am not sure how to communicate with Batch workers. I could easily see that as enough motivation for a new R API.
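As a rough illustration of the submission side only, here is a minimal sketch assuming the paws package and its Batch client's submit_job() operation. The helper names batch_job_request() and submit_batch_job(), and the queue, job definition, and script names, are hypothetical placeholders. It does not address how to communicate with the workers or retrieve results.

```r
# Build the argument list for a Batch SubmitJob call.
# Pure function: it does not contact AWS, so it can be inspected offline.
batch_job_request <- function(name, queue, definition, command) {
  list(
    jobName = name,
    jobQueue = queue,
    jobDefinition = definition,
    # Override the container command with the R call to run on the worker.
    containerOverrides = list(command = as.list(command))
  )
}

# Submit the request through paws. Requires the paws package and valid
# AWS credentials; the default client is only created when this is called.
submit_batch_job <- function(request, client = paws::batch()) {
  do.call(client$submit_job, request)
}

# Hypothetical names: replace with a real job queue and job definition.
request <- batch_job_request(
  name = "r-future-demo",
  queue = "my-job-queue",
  definition = "my-r-job-definition",
  command = c("Rscript", "-e", "saveRDS(1 + 1, 'result.rds')")
)

# Not run here, because it needs AWS access:
# job <- submit_batch_job(request)
# job$jobId
```

Keeping the request construction separate from the API call makes it possible to check the payload without AWS access, and to swap paws for another HTTP client later.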
From https://github.com/mschubert/clustermq/issues/208#issuecomment-725444690, it seems possible for clustermq to support an AWS backend (Batch or similar), and then future could interact with it through future.clustermq.
Should we have another call on this? It'll help me clarify a few things related to the future roadmap.
I would be happy to arrange something on Google Meet for us and @mschubert. Does that still sound good?
Should I open a separate issue for Lambda? I think we agreed this may be easier to start with, especially with @davidkretch's nice demo.
I've created https://github.com/HenrikBengtsson/future.lambda with the goal of implementing support for plan(future.lambda::lambda).
Fantastic! Eager to try when it is ready. (Currently working on getting access to my company's AWS resources.)
I propose AWS Batch as a new clustermq scheduler. Batch has become extremely popular, especially as traditional HPC is waning. I have a strong personal interest in making Batch integrate nicely with R (ref: https://github.com/wlandau/targets/issues/152, https://github.com/wlandau/tarchetypes/issues/8, https://wlandau.github.io/targets-manual/cloud.html). Batch is super easy to set up through the AWS web console, and I think it would fit nicely into future's ecosystem: maybe with something like future::plan(future.aws.batch::future_aws_batch, template = "batch.tmpl"), where batch.tmpl contains an AWS API call with the compute environment, job queue, job definition, and key pair. I think we could use curl directly instead of the much larger and rapidly developing paws package. The tricky part is how we retrieve the data back from an AWS Batch job. I'm not sure how to do that yet.