Open universalmind303 opened 1 year ago
Quick table on where we should be doing certain actions:
Table/Function | Resolve | Dispatch | Execute |
---|---|---|---|
read_* (read databases) |
remote | remote | remote |
*_scan (s3/gcs/http) |
remote | remote | remote |
*_scan (local) |
local | local | local |
Native tables | local | remote | remote |
External tables | local | remote | remote |
Tables using an external database | remote | remote | remote |
Temp tables | local | local | local |
CREATE EXTERNAL TABLE ...
CREATE EXTERNAL DATABASE ...
Notes:
Readable explains for user + debugging.
We can move it to next
to improve upon the intial RPC impl.
So i think this is blocked as long as we are using the references for our remote execs. Without (de)serializing the inner plan, the remote exec nodes are just black-boxes.
Explain Execs
TL;DR, we need to think about the plans in a way that they are easy to explain when running
EXPLAIN
. They should be self describing & contain all relevant information about remote/local execution.I initially was working on Explain for RPC & it brought up some questions on how we can properly display this information when running an explain such as
I think the logicalplan explain makes sense (for the most part). We could maybe clarify by wrapping this in something like
RemoteTableScan
.I think we need further clarity on the physical plan execs. such as
which could be flattened to:
Hybrid execution Execs
In order to properly
EXPLAIN
all of these paths, we need a very clear understanding of the hybrid execution model & the resulting execs.I know we have the following
SendRecvJoinExec
ClientExchangeSendExec
ClientExchangeRecvExec
RemoteExecutionExec
RemoteScanExec
but not sure I fully understand how all of those are being used.
My understanding of the ideal execution for various execs
🏠 = local 📡 = remote
Single source execs
binary join execs (we can skip this section & just rely on the multi join logic for now)
we could be slightly more sophisticated when doing a non nested join.
Some assumptions
Multi join execs
this is where things get really tricky & probably best to keep this portion rather simple for now & create some optimization rules later on