This is a PR for parallel query execution in codegen.
Supported parallel operators:
[x] Sequential scans
[x] Sorting
[x] Hash joins
[ ] Aggregations
[ ] Inserts (purposely disabled until we have COPY)
[ ] Updates
[ ] Deletes
Parallelism is done in a fork-join manner.
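For readers unfamiliar with the term, here is a minimal sketch of fork-join over a partitioned input. It is not the code from this PR: `ForkJoinSum` and its parameters are hypothetical, and it uses plain `std::thread` rather than whatever worker mechanism the codegen pipelines actually use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (not from the PR): sums `values` by forking one thread
// per partition and joining all of them before combining partial results.
int64_t ForkJoinSum(const std::vector<int64_t> &values,
                    std::size_t num_workers) {
  assert(num_workers > 0);
  std::vector<int64_t> partial(num_workers, 0);
  std::vector<std::thread> workers;
  workers.reserve(num_workers);

  // Size of each contiguous partition, rounded up.
  const std::size_t chunk = (values.size() + num_workers - 1) / num_workers;

  // Fork: each worker writes only to its own slot in `partial`.
  for (std::size_t w = 0; w < num_workers; w++) {
    workers.emplace_back([&values, &partial, chunk, w]() {
      const std::size_t begin = std::min(w * chunk, values.size());
      const std::size_t end = std::min(begin + chunk, values.size());
      for (std::size_t i = begin; i < end; i++) partial[w] += values[i];
    });
  }

  // Join: wait for every worker before reading the partial results.
  for (auto &t : workers) t.join();

  int64_t total = 0;
  for (int64_t p : partial) total += p;
  return total;
}
```

The caller blocks until every partition has finished, which is the property the fork-join model relies on.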
Added settings to toggle parallel execution and to set the minimum table size at which parallel execution kicks in. This logic should be modified.
@chenboy Check plan_generator to see if the logic looks good.
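As a rough illustration of the kind of check the two settings imply, assuming a plain on/off flag plus a size threshold: the struct, field names, default value, and the choice of tuples as the unit below are all hypothetical and are not taken from plan_generator.

```cpp
#include <cstddef>

// Hypothetical settings; names and the default are illustrative only.
struct ParallelSettings {
  bool parallel_execution_enabled = true;
  std::size_t min_parallel_table_size = 10000;  // assumed unit: tuples
};

// Returns true only when parallelism is enabled AND the table is at least
// as large as the configured threshold.
bool ShouldRunInParallel(const ParallelSettings &settings,
                         std::size_t table_size) {
  return settings.parallel_execution_enabled &&
         table_size >= settings.min_parallel_table_size;
}
```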
I've split up RuntimeState into QueryState and PipelineState.
QueryState exists for the lifetime of the query. It is initialized in the init() function, and torn down in the tearDown() function.
PipelineState exists for only one pipeline. It is created on entry to the pipeline function, and is torn down upon exit.
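The two lifetimes can be pictured with a small RAII analogy. This is only a sketch: the real QueryState and PipelineState are structs managed by generated code, while the classes, the log, and `RunPipeline`/`RunQuery` below are hypothetical stand-ins that record when each state is set up and torn down.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in: lives for the whole query.
struct QueryState {
  std::vector<std::string> *log;
  explicit QueryState(std::vector<std::string> *l) : log(l) {
    log->push_back("init()");
  }
  ~QueryState() { log->push_back("tearDown()"); }
};

// Hypothetical stand-in: lives for a single pipeline invocation.
struct PipelineState {
  QueryState &query_state;
  int pipeline_id;
  PipelineState(QueryState &qs, int id) : query_state(qs), pipeline_id(id) {
    query_state.log->push_back("enter pipeline " + std::to_string(id));
  }
  ~PipelineState() {
    query_state.log->push_back("exit pipeline " +
                               std::to_string(pipeline_id));
  }
};

void RunPipeline(QueryState &qs, int id) {
  PipelineState ps(qs, id);  // created on entry to the pipeline function
  // ... pipeline body would run here ...
}  // torn down on exit

std::vector<std::string> RunQuery() {
  std::vector<std::string> log;
  {
    QueryState qs(&log);  // outlives every pipeline of the query
    RunPipeline(qs, 0);
    RunPipeline(qs, 1);
  }
  return log;
}
```

In this sketch each pipeline's state is gone before the next pipeline starts, while the query-level state is shared by both.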
I've cleaned up the proxies so that struct member variables can be loaded and stored by name rather than by index position.
What used to be codegen.CreateGEP(..., 0, 2) to load struct elements is now codegen.Load(HashTableProxy::directory, ...).
Storing struct elements works the same way: codegen.Store(...)
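The idea of named members can be shown without LLVM. The sketch below is an analogy, not the PR's proxy code: the struct is modelled as an array of slots, `memory` and `num_elements` are made-up member names, and the free functions `Load` and `Store` only mimic the shape of the calls quoted above. Only `directory` at index 2 mirrors the example in the description.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical proxy: gives each struct member a name so callers never
// hardcode positions. Only `directory` (index 2) comes from the PR text.
struct HashTableProxy {
  enum Member : std::size_t {
    memory = 0,        // hypothetical member
    num_elements = 1,  // hypothetical member
    directory = 2,
    kNumMembers
  };
};

// Stand-in for a struct instance: one 64-bit slot per member.
using StructSlots = std::array<std::uint64_t, HashTableProxy::kNumMembers>;

// Load a member by name instead of by raw index.
std::uint64_t Load(HashTableProxy::Member member, const StructSlots &obj) {
  return obj[member];
}

// Store a member by name instead of by raw index.
void Store(HashTableProxy::Member member, StructSlots &obj,
           std::uint64_t value) {
  obj[member] = value;
}
```

If the struct layout changes, only the proxy's name-to-index mapping needs updating; call sites that use the names stay the same.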
Removed CCHashTable and added a generic HashTable that will eventually be used for both joins and aggregations.
Review notes:
Most of the heavy lifting is done in pipeline.cpp. Pipelines can be run serially or in parallel.
This means all operators need to pass a std::function when launching pipelines.
Many of the changed files are in the proxies, making it simpler to create them.
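To make the std::function requirement concrete, here is a minimal sketch of a launcher that runs the same body either serially or in parallel. `LaunchPipeline`, its parameters, and the one-thread-per-task strategy are assumptions for illustration; the actual interface is in pipeline.cpp and may look quite different.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical launcher: `body` is invoked once per task index, either in a
// simple loop or on one thread per task.
void LaunchPipeline(std::size_t num_tasks, bool parallel,
                    const std::function<void(std::size_t)> &body) {
  if (!parallel) {
    // Serial path: run every task on the calling thread.
    for (std::size_t i = 0; i < num_tasks; i++) body(i);
    return;
  }

  // Parallel path: fork one thread per task, then join them all.
  std::vector<std::thread> workers;
  workers.reserve(num_tasks);
  for (std::size_t i = 0; i < num_tasks; i++) {
    workers.emplace_back([&body, i]() { body(i); });
  }
  for (auto &t : workers) t.join();
}
```

Because the operator hands over only a callable, the same operator code works unchanged on both paths, as long as the body does not write to shared state without synchronization.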
TODO:
- Remove std::mutex from buffer output (This will be another PR)
- Enable parallel aggregations (This will be another PR when parallel agg team is done)
- Add tests for memory leak (Done)
Coverage decreased (-0.2%) to 77.406% when pulling 1a70093a0eeccbe50edd2a92ad17c066c97a4a34 on pmenon:mt into 65915234edf5c33acda078f632012887291d22b7 on cmu-db:master.