Closed roeap closed 12 months ago
Thank you for your great work, Robert. I would be happy to review this PR if you would like me to.
First this PR changes the data returned for schemas / tables to be in line with the protocol. I also believe there was an inversion in the dependencies, where the schemas table contained a reference to table_id, whereas tables are defined within a schema (
<share>.<schema>.<table>
). To account for this, we drop the table_id column from the schema table, and add a schema_id column to the table table.
Regarding the table_id
column of the schema
table, I implemented this way so that each tables can belong to multiple schemas:
Accordingly, we drop the admin/tables routes, and introduce a new POST admin/shares/:share/schemas route, which handles creating schemas.
Yes, I believe this change impacts the management flow of shares, schemas, and tables. In my opinion (while I was implementing the APIs), I thought that a schema cannot exist on its own but rather functions as a pure relationship between a share record and a table. Anyway, I don't have any strong opinions on this implementation design, I would be happy if you provide me your opinion.
By the way, I apologize for not being able to commit to or assist with your project in recent days. I will certainly contribute once my schedule allows me to do so.
@ognis1205 - regarding the schema / table thing ...
In my opinion, it would be advantageous - i.e. the simplest and most consistent - if we just follow the thee layer hierarchy. While you are right, that an empty schema does not really give you much value, its mostly just layers or organization. But in the end, so are shares.
Most database systems I know, follow that approach, just for some reason the names are the root differ all the time :D. e.g. in postgres it would be <database>.<schema>.<table>
, in unitiy catalog they call it <catalog>.<schema>.<table>
, and so on :).
There is a difference that you pointed out correctly, in the other systems I mentioned, a table will always belong to exactly one schema, where here (shares being the main authorization scope), we may see the same table in multiple shares. However the table entities itself are very cheap, i.e. just a name and location. The actual expensive part, is reading the metadata. For this I belive you already mentioned a good solution in #7 - if I understand that right. essentially we should make sure that we have some form of caching or catalog to efficiently handle table metadata. One potential integration would be unity catalog itself, where we can get metadata very cheap - of course we would likely need to make this pluggable.
Let me know what you think :).
@roeap
There is a difference that you pointed out correctly, in the other systems I mentioned, a table will always belong to exactly one schema, where here (shares being the main authorization scope), we may see the same table in multiple shares. However the table entities itself are very cheap, i.e. just a name and location. The actual expensive part, is reading the metadata. For this I belive you already mentioned a good solution in https://github.com/delta-incubator/delta-sharing-rs/issues/7 - if I understand that right. essentially we should make sure that we have some form of caching or catalog to efficiently handle table metadata. One potential integration would be unity catalog itself, where we can get metadata very cheap - of course we would likely need to make this pluggable.
I agreed with you. Thank you for the clarification :) LGTM.
@ognis1205 @rtyler - i have some more follow up work planned, any chance we can get this merged? 😄
@ognis1205 @rtyler friendly ping.
@roeap Apologies for my extreme laziness, Robert. I am currently working on a project related to delta-sharing-rs and will share my idea on the Slack channel in the near future, along with a mockup. I have to prepare the OSSJ presentation, so I promise to return to the Slack channel before the event.
@ognis1205 - that's great news.
Would this conflict with the changes in this and the 2 follow up PRs? I think it is important to align the data model with unit catalog and other systems. The follow ups just fix some things in the generated openapi spec and enable testing.
@roeap
Would this conflict with the changes in this and the 2 follow up PRs?
No, it is absolutely fine with this and the subsequent two PRs.
I think it is important to align the data model with unit catalog and other systems.
Regarding the data model, thank you for the suggestion, and I wholeheartedly agree with you. The data model should align with conventional catalog systems.
@ognis1205 - do you mind approving this PR then? 😆.
@roeap LGTM. No problem at all. Thank you for asking, Robert!
Apologies in advance as this PR looks quite massive ... most of the changes are related to
cargo fmt
andcargo clippy --fix
though, so I hope its no too bad.There are some more significant changes as well which I hope make the implementation more in line with sharing protocol spec. First this PR changes the data returned for schemas / tables to be in line with the protocol. I also believe there was an inversion in the dependencies, where the schemas table contained a reference to
table_id
, whereas tables are defined within a schema (<share>.<schema>.<table>
). To account for this, we drop thetable_id
column from theschema
table, and add aschema_id
column to thetable
table.Accordingly, we drop the
admin/tables
routes, and introduce a newPOST admin/shares/:share/schemas
route, which handles creating schemas.also working on some follow ups, to enable fmt, clippy, and tests in CI ...