Closed by milesgranger 6 months ago
I don't love that to get this endpoint, you're basically turning off all the security.
Maybe that's fine for now, not sure. People do port-scan known AWS IP addresses, though I'm not sure anyone cares about exposed Spark clusters.
If 4040 is just a simple HTTP thing, it might be worth adding token auth in front of it.
> I don't love that to get this endpoint, you're basically turning off all the security.
I know, help me please. ;-)
It's in draft b/c I'm quite confident I don't want to do it this way. But to get the current number of active executors, using the REST API appears to be the most reliable approach. This is a draft to show that it can functionally be done, but it's probably not fully correct as it stands. :)
ref: Spark REST API
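For context, counting active executors via Spark's monitoring REST API (the `/api/v1/applications/{app-id}/executors` endpoint) might look roughly like this sketch; the base URL and the exact filtering are illustrative, not what this PR ships:

```python
import json
from urllib.request import urlopen


def count_active_executors(executors):
    """Count active, non-driver executors from Spark's REST API payload.

    `executors` is the parsed JSON list returned by
    GET /api/v1/applications/{app-id}/executors; each entry carries an
    "id" (the driver reports id "driver") and an "isActive" flag.
    """
    return sum(
        1 for e in executors if e.get("isActive") and e.get("id") != "driver"
    )


def active_executors(base_url, app_id):
    """Fetch the executor list from a Spark UI (e.g. http://host:4040)."""
    url = f"{base_url}/api/v1/applications/{app_id}/executors"
    with urlopen(url) as resp:  # assumes the endpoint is reachable unauthenticated
        return count_active_executors(json.load(resp))
```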
I'd want to understand what's on 4040, if we should/can put auth in front of it, etc.
My understanding is that it's a pretty simple HTTP thing that gives information about Spark's runtime state. Putting a token in front of it seems reasonable to me. I think we should treat it much like how we treat Spark's server running on 8080.
Giving it a name, like `/spark-master`, might also be good (@milesgranger feel free to suggest a better name if you have one).
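The token check itself could be as small as a header comparison in front of the proxied endpoint. A minimal sketch, assuming a `Authorization: Bearer …` scheme (the header format is an assumption here, not the platform's actual auth layer):

```python
import hmac


def is_authorized(headers, expected_token):
    """Return True iff the request carries the expected bearer token.

    `headers` is a mapping of request headers (matched case-insensitively).
    hmac.compare_digest avoids leaking the token via timing differences.
    """
    auth = {k.lower(): v for k, v in headers.items()}.get("authorization", "")
    prefix = "Bearer "
    if not auth.startswith(prefix):
        return False
    return hmac.compare_digest(auth[len(prefix):], expected_token)
```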
Changes in commit https://github.com/coiled/benchmarks/pull/1450/commits/feabcc30756b901864eadd71e007012efb728688 depend on https://github.com/coiled/platform/pull/5001 (or something equivalent).
4040 is already exposed at `/spark`, so it appears we just need to store the `spark_dashboard` variable after the `get_spark` call. We could of course repeat the generation logic here, but it seems nice to store the link anyway, since we already store the Dask dashboard link.
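Purely illustrative: if the Spark UI is proxied at `/spark` on the same host that serves the Dask dashboard, the link can be derived from the dashboard URL like this (the URL layout is an assumption about this deployment, and `spark_dashboard_link` is a hypothetical helper, not a Coiled API):

```python
from urllib.parse import urlsplit, urlunsplit


def spark_dashboard_link(dask_dashboard_url, path="/spark"):
    """Derive the proxied Spark UI link from the Dask dashboard URL.

    Assumes (per the discussion above) that port 4040 is exposed at
    `/spark` on the same scheme/host as the Dask dashboard.
    """
    parts = urlsplit(dask_dashboard_url)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```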
I assume you have tested this manually?
Yes, I can attest to this having been done, thanks! :+1:
Part of #1449