Open · harryprince opened this issue 5 years ago
Hi, this is an interesting idea.

It's been a while since I dove into the inner parts of sparklyr, but if it's now possible to "launch" a single expression on Spark, check whether it's done (in a non-blocking way), and then collect the results, "all that is needed" is to implement future(), resolved(), and value() on top of sparklyr and we're home. Then, with the future.tests validator, we can make sure it conforms to the core Future API. After that, it'll work everywhere.
EDIT 2020-01-07: Added an important but missing "if" above.
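For concreteness, here is a rough sketch of the shape such a backend could take. The SparkFuture class and the spark_submit_expr()/spark_job_done()/spark_collect_result() helpers are hypothetical placeholders for a non-blocking submit/poll/collect API that sparklyr would need to expose; only the future-side S3 structure follows the usual backend pattern.

```r
library(future)

## Hypothetical helpers: sparklyr does not provide these today; they only
## mark where a non-blocking Spark job API would plug in.
spark_submit_expr    <- function(expr, envir) stop("not implemented")
spark_job_done       <- function(job)         stop("not implemented")
spark_collect_result <- function(job)         stop("not implemented")

## Constructor: create a Future and launch its expression as a Spark job.
SparkFuture <- function(expr, envir = parent.frame(), substitute = TRUE, ...) {
  if (substitute) expr <- substitute(expr)
  f <- Future(expr, envir = envir, substitute = FALSE, ...)
  f <- structure(f, class = c("SparkFuture", class(f)))
  f$job <- spark_submit_expr(f$expr, f$envir)
  f
}

## Non-blocking check: has the Spark job finished?
resolved.SparkFuture <- function(x, ...) {
  spark_job_done(x$job)
}

## Collect the job's result back to the driver.
value.SparkFuture <- function(future, ...) {
  spark_collect_result(future$job)
}
```

Once something along these lines exists, running it through the future.tests validator would tell us whether it behaves like the other backends.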
I think you could cooperate with the RStudio team; I will open a related issue on the sparklyr repo.
In sparklyr 1.7.7, there is registerDoSpark to register Spark as a parallel backend for foreach. I was wondering whether, between that and doFuture, there is a path forward for using Spark as a future backend. Perhaps related: I see that sparklyr::spark_apply() appears to support barrier execution, which is mentioned in the linked sparklyr issue.
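For reference, the registerDoSpark route mentioned above looks roughly like this (a minimal sketch; the master URL is just an example):

```r
library(sparklyr)
library(foreach)

sc <- spark_connect(master = "local")  # or a yarn/cluster master
registerDoSpark(sc)                    # register Spark as the foreach backend

# Each foreach iteration is evaluated on Spark via the registered backend.
squares <- foreach(i = 1:4, .combine = c) %dopar% i^2
squares
#> [1]  1  4  9 16

spark_disconnect(sc)
```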
Hi future team, I find future to be a great framework for distributed data processing. sparklyr::spark_apply() does something similar, supporting local, yarn-client, and yarn-cluster modes. I would like to integrate Spark into the future framework.
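To illustrate the comparison, here is a minimal spark_apply() sketch (local mode shown; a yarn-client or yarn-cluster master works the same way, given the appropriate configuration):

```r
library(sparklyr)

# Connect in local mode; for a cluster use e.g. master = "yarn-client"
# or master = "yarn-cluster".
sc <- spark_connect(master = "local")

# Distribute an R function over the partitions of a Spark DataFrame.
sdf <- sdf_len(sc, 10, repartition = 2)
result <- spark_apply(sdf, function(df) {
  df$id_squared <- df$id^2
  df
})
sdf_collect(result)

spark_disconnect(sc)
```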