apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Questions on some requirement in view spec #10410

Open ajantha-bhat opened 4 months ago

ajantha-bhat commented 4 months ago

Query engine

NA

Question

  1. Should the summary be optional? https://iceberg.apache.org/view-spec/#versions

Because the contents of the summary are not strictly defined, and the engine-name and engine-version keys mentioned there are also optional, isn't a required empty map functionally the same as a non-existent map?

  2. What is the logic behind making default-namespace required but default-catalog optional? Also, why are these fields needed at all? Shouldn't resolution happen the same way as table resolution? For example, if the namespace is not mentioned in the SELECT, the engine uses the default namespace or throws an error if it cannot resolve the reference. Similarly, if the catalog name is not mentioned, the engine should use the default catalog or throw an error if it cannot resolve it. We don't define default-namespace and default-catalog in the Iceberg table spec, so shouldn't the view spec omit them as well?

I will work on a PR to update the spec once this is confirmed here.

ajantha-bhat commented 4 months ago

cc: @jbonofre, @snazy, @dimas-b

nastra commented 3 months ago

Summary

The summary field was modeled similarly to the summary for v2 tables (required). Prior to #8678, the summary carried a required operation field. Making a required field optional now is a bit difficult, since you'd still have to support older clients.
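
To make the compatibility concern concrete, here is a minimal sketch of a strict reader that treats summary as required, as an older client might. It is not taken from the Iceberg code base: the function name `parse_view_version` and the sample values are hypothetical, and only the field names follow the view spec's version fields.

```python
# Fields the view spec lists as required for a view version.
# "default-catalog" is optional and therefore not checked here.
REQUIRED_FIELDS = (
    "version-id",
    "schema-id",
    "timestamp-ms",
    "summary",
    "representations",
    "default-namespace",
)


def parse_view_version(version: dict) -> dict:
    """Hypothetical strict reader: reject a version lacking any required field."""
    missing = [field for field in REQUIRED_FIELDS if field not in version]
    if missing:
        raise ValueError(f"Cannot parse view version, missing required fields: {missing}")
    return version


# A version written with an empty (but present) summary map. Sample values only.
version_with_empty_summary = {
    "version-id": 1,
    "schema-id": 1,
    "timestamp-ms": 1573518431292,
    "summary": {},
    "representations": [{"type": "sql", "sql": "SELECT 1 AS id", "dialect": "spark"}],
    "default-namespace": ["default"],
}

# The strict reader accepts the empty map.
parse_view_version(version_with_empty_summary)

# What a newer writer might produce if summary became optional.
version_without_summary = {
    key: value for key, value in version_with_empty_summary.items() if key != "summary"
}

try:
    parse_view_version(version_without_summary)
except ValueError as err:
    print(err)
```

Under this assumption, an empty map and a missing map are equivalent for a lenient reader, but a strict older reader would only accept the former, which is why relaxing the requirement after the fact is hard.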

Default namespace/catalog

This info is needed to adhere to Spark's View / ViewCatalog API. Trino also stores the same info about namespace/catalog. You might want to take a look at SparkView and CreateV2ViewExec to see how this info is used in Spark. For Trino, you can take a look at TrinoRestCatalog#getView and #createView.
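
As a rough illustration of how an engine might apply these persisted defaults, the sketch below resolves a table reference found in the view's SQL. It assumes the semantics described in the view spec: default-namespace applies to single-part names, default-catalog applies when the reference names no catalog, and the catalog storing the view is used when default-catalog is unset. The function `resolve_reference`, its parameters, and the `known_catalogs` check are hypothetical simplifications, not Spark's or Trino's actual logic.

```python
from typing import Optional, Sequence, Set, Tuple


def resolve_reference(
    parts: Sequence[str],
    default_namespace: Sequence[str],
    default_catalog: Optional[str],
    view_catalog: str,
    known_catalogs: Set[str],
) -> Tuple[str, Tuple[str, ...], str]:
    """Return (catalog, namespace, table) for a reference in the view's SQL.

    Simplification: a leading part is treated as a catalog only when the
    reference has at least three parts and the first one is a known catalog.
    """
    if not parts:
        raise ValueError("empty identifier")

    # Fall back to the catalog that stores the view when no default is persisted.
    catalog = default_catalog if default_catalog is not None else view_catalog

    if len(parts) == 1:
        # Single identifier: use the namespace persisted with the view version.
        return catalog, tuple(default_namespace), parts[0]

    if len(parts) > 2 and parts[0] in known_catalogs:
        # Fully qualified: catalog.namespace(.namespace...).table
        return parts[0], tuple(parts[1:-1]), parts[-1]

    # Namespace-qualified without a catalog: namespace(.namespace...).table
    return catalog, tuple(parts[:-1]), parts[-1]


known = {"prod", "dev"}

# Unqualified name: both defaults apply (view is stored in "prod").
print(resolve_reference(["events"], ["analytics"], None, "prod", known))

# Namespace given, catalog taken from the persisted default-catalog.
print(resolve_reference(["sales", "orders"], ["analytics"], "dev", "prod", known))
```

The point of persisting the defaults is that the same view SQL resolves to the same tables regardless of the current catalog and namespace of the session that later queries the view.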

ajantha-bhat commented 3 months ago

@nastra:

  1. Thanks for the info on the summary; I didn't know we had a required operation field before. I agree that making it optional now is not a good idea.

  2. Trino uses SchemaTableName for getView and createView, so the namespace will always be there. Is there any need to infer it from the persisted view? As for the catalog name, isn't it a temporary thing? The user can instantiate the same catalog config under another name. So why persist something temporary in the metadata?