This adds a new flag, METADATA_CACHE_MAX_BYTES, which allows the catalog to store table metadata in the metastore and vend it from there when loadTable is called.
Entries are cached based on the metadata location. Currently, the entire metadata.json content is cached.
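As a rough illustration of these semantics (not the code in this PR), the sketch below keys each cached entry on its metadata location and only reuses it while the table still points at that location. The class and method names are hypothetical. Treating METADATA_CACHE_MAX_BYTES as an upper bound on the size of content worth caching is an assumption made for the example, and the path in main is made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: an in-memory stand-in for metadata cached in the metastore.
class MetadataCacheSketch {
    // Mirrors the shape seen in internalproperties: a location plus the raw metadata.json content.
    static final class Entry {
        final String metadataLocation;
        final String content;

        Entry(String metadataLocation, String content) {
            this.metadataLocation = metadataLocation;
            this.content = content;
        }
    }

    private final Map<String, Entry> byTable = new HashMap<>();
    private final long maxBytes; // assumed meaning of METADATA_CACHE_MAX_BYTES

    MetadataCacheSketch(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Returns the metadata content, reading from storage only on a miss or a stale entry.
    String load(String tableId, String currentLocation, Function<String, String> readFromStorage) {
        Entry cached = byTable.get(tableId);
        if (cached != null && cached.metadataLocation.equals(currentLocation)) {
            return cached.content; // hit: the table still points at the cached location
        }
        String content = readFromStorage.apply(currentLocation);
        if (content.getBytes(StandardCharsets.UTF_8).length <= maxBytes) {
            byTable.put(tableId, new Entry(currentLocation, content));
        } else {
            byTable.remove(tableId); // too large: drop any stale entry and skip caching
        }
        return content;
    }

    boolean isCached(String tableId) {
        return byTable.containsKey(tableId);
    }

    public static void main(String[] args) {
        MetadataCacheSketch cache = new MetadataCacheSketch(1024);
        Function<String, String> reader = loc -> "{\"format-version\":2}";
        cache.load("ns.t1", "file:/tmp/example/00000.metadata.json", reader);
        System.out.println("cached: " + cache.isCached("ns.t1"));
    }
}
```

Because the comparison is on the location string, a commit that writes a new metadata.json makes the old entry stale on the next load without any explicit invalidation step.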
Features not included in this PR:
1. Support for updating the cache when a table is updated
2. Support for invalidating cache entries in the background, rather than waiting for loadTable to be called
3. Structured storage for table metadata
There is partial support for (1) here and I want to extend it, but the goal is to structure things in a way that will allow us to implement (2) and (3) in the future as well.
Type of change
[ ] Bug fix (non-breaking change which fixes an issue)
[ ] Documentation update
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[ ] This change requires a documentation update
How Has This Been Tested?
Existing tests vend table metadata correctly when caching is enabled.
Added a small test in BasePolarisCatalogTest to cover the basic semantics of caching.
Manual testing with EclipseLink. I observed the entities being created in Postgres and saw large metadata being cached:
db=# select length(internalproperties), substring(internalproperties, 1, 1000) from entities where id = 152;
...
768691 | {"metadata_location":"file:/tmp/quickstart/ns/tn1731005976265/metadata/00000-e77a2576-5efa-4b7a-b948-121813d713f8.metadata.json","content":"{\"format ...
With MySQL, small metadata is persisted:
mysql> SELECT length(internalproperties), substring(internalproperties, 1, 1000) FROM ENTITIES WHERE id = (SELECT MAX(id) FROM ENTITIES WHERE typecode = 10);
. . .
8159 | {"metadata_location":"file:/tmp/quickstart/ns/t2/metadata/00000-64f975bd-c3a8-4069-bb56-f282003e9157.metadata.json","content":"{\"format-version\"
However, large metadata may cause internalproperties to exceed the size limit, in which case nothing is cached. Calls still return safely.
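One way to get this "nothing cached, call still succeeds" behavior is to treat the cache write as best-effort. The sketch below is only an assumption about how that could look. The MetadataStore interface and all names are hypothetical, and the actual error handling in this PR may differ.

```java
// Hypothetical sketch: a failed cache write must never fail the load itself.
class SafeCacheWriteSketch {
    // Stand-in for whatever persists the entity; assumed to throw if the payload is too large.
    interface MetadataStore {
        void persist(String metadataLocation, String content);
    }

    // Attempts to cache the content, then returns it whether or not the write succeeded.
    static String loadAndTryCache(String metadataLocation, String content, MetadataStore store) {
        try {
            store.persist(metadataLocation, content);
        } catch (RuntimeException e) {
            // Best-effort: the write was rejected (for example, by a column size limit), so skip caching.
        }
        return content;
    }

    public static void main(String[] args) {
        MetadataStore rejecting = (loc, c) -> {
            throw new IllegalStateException("payload too large");
        };
        String result = loadAndTryCache("loc", "{\"format-version\":2}", rejecting);
        System.out.println("returned " + result.length() + " characters");
    }
}
```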