Closed poojanilangekar closed 6 years ago
@poojanilangekar Hi, this PR looks good for me except for some minor changes involved and some questions I posted in the comment area. I mainly focus on the logic of catalog & its corresponding test cases and I also take a look at the layout.h/layout.cpp, it stands.
And thank you for fix the typo in DEFAULT_SCHEMA_NAME!
This PR decouples the logical schema from the physical layout of the data. Previously, each TileGroup contained a vector of schemas for which violates the notion of the storage layer being a physical store. We also had cases where the
LayoutTuner
modifies thecolumn_map
which ended up modifying theschemas
vector via theDataTable
. There were multiple other instances where theSchema
was used to get the layout of aTileGroup
while all we needed was the physical mapping between column_ids and a <tile_id, column_offset> pair. This PR now makes it possible to make schema changes like changing column name, adding & dropping columns without making any calls to the storage layer (i.e., It can now be done entirely in the catalog).Changes:
Layout
andLayoutCatalog
classes to store the layout in a physical object and thepg_layout
.default_layotut_
object to each table (replacingdefault_partition
) andtile_group_layout_
to eachTileGroup
.HYBRID
layouts to thepg_catalog
table. The two pre-defined layouts,ROW
andCOLUMN
stores are not persisted.Layout
on a perTileGroup
while invoking theTableScanTranslator
.TileGroup
has aLayoutType::ROW
layout.LayoutTuner
to tune theLayout
rather than thecolumn_map
. Additionally, the tuning decisions are now transactional.pg_catalog
to theSystemsCatalog
to ensure that the each layout is stored in a catalog with respect its database.Testing:
LayoutTuner
test to ensure it works as expected after the refactor.TileGroupLayoutTest
andSeqScanTest
which ensures that the old engine works as expected.TableScanTranslatorTests
to replicate the tests inTileGroupLayoutTest
andSeqScanTest
. The tests check that the LLVM engine can perform scans on tables different layouts and on tables where eachTileGroup
has a differentLayout
.CatalogTests
to ensure that thepg_layout
stores and retrieves data as expected and its state is transactional.Disclaimer:
Tile
still contains aSchema
which will be used by the old engine, theGetValue
andSetValue
functions and (currently) the LLVM engines to determine the stride in case of aHYBRID
layout. I believe we have to continue supporting theGetValue
andSetValue
functions even after the old engine is deprecated because we will be needing these interfaces in theCatalog
and possibly theOptimizer
.LogicalTile
generated by the executor engine, I had to reconstruct theSchemas
. This isn't very efficient but the idea is that it is currently only used in tests and will soon be deprecated.ColumnCatalog
,StatsCatalog
), theLayoutCatalog
directly returns aLayout
rather than aLayoutCatalogObject
since the API are designed to be read only. In essence, the layout is immutable because if you were changing the layout, you need to change the physical representation. So it felt redundant to create a read only object to be returned by the catalog.@mengranwo Can you please review the catalog changes? @pmenon Can you please review the changes to the LLVM engine? Mainly,
RuntimeFunctions
,Inserter
,Updater
,TestingCodegenUtil
andTableScanTranslatorTests
. @pervazea Can you please review the overall changes? I have modified a few APIs in the old engine. The current code doesn't break the existing tests and it seems to function as expected. But please let me know if you think there is anything I need to change. @jarulraj I know you may not have the bandwidth for this, but if possible could you please review the changes toLayoutTuner
andLayoutTunerTest
. Nobody else has modified that code and it would be great if you could take a look.