feat: update creating schemas and register tables as schema children

roeap commented 1 year ago

Apologies in advance as this PR looks quite massive ... most of the changes are related to cargo fmt and cargo clippy --fix though, so I hope its no too bad.

There are some more significant changes as well which I hope make the implementation more in line with sharing protocol spec. First this PR changes the data returned for schemas / tables to be in line with the protocol. I also believe there was an inversion in the dependencies, where the schemas table contained a reference to table_id, whereas tables are defined within a schema (<share>.<schema>.<table>). To account for this, we drop the table_id column from the schema table, and add a schema_id column to the table table.

Accordingly, we drop the admin/tables routes, and introduce a new POST admin/shares/:share/schemas route, which handles creating schemas.

also working on some follow ups, to enable fmt, clippy, and tests in CI ...

ognis1205 commented 1 year ago

Thank you for your great work, Robert. I would be happy to review this PR if you would like me to.

ognis1205 commented 1 year ago

First this PR changes the data returned for schemas / tables to be in line with the protocol. I also believe there was an inversion in the dependencies, where the schemas table contained a reference to table_id, whereas tables are defined within a schema (<share>.<schema>.<table>). To account for this, we drop the table_id column from the schema table, and add a schema_id column to the table table.

Regarding the table_id column of the schema table, I implemented this way so that each tables can belong to multiple schemas:

ont_to_many one_to_one

Accordingly, we drop the admin/tables routes, and introduce a new POST admin/shares/:share/schemas route, which handles creating schemas.

Yes, I believe this change impacts the management flow of shares, schemas, and tables. In my opinion (while I was implementing the APIs), I thought that a schema cannot exist on its own but rather functions as a pure relationship between a share record and a table. Anyway, I don't have any strong opinions on this implementation design, I would be happy if you provide me your opinion.

By the way, I apologize for not being able to commit to or assist with your project in recent days. I will certainly contribute once my schedule allows me to do so.

roeap commented 1 year ago

@ognis1205 - regarding the schema / table thing ...

In my opinion, it would be advantageous - i.e. the simplest and most consistent - if we just follow the thee layer hierarchy. While you are right, that an empty schema does not really give you much value, its mostly just layers or organization. But in the end, so are shares.

Most database systems I know, follow that approach, just for some reason the names are the root differ all the time :D. e.g. in postgres it would be <database>.<schema>.<table>, in unitiy catalog they call it <catalog>.<schema>.<table>, and so on :).

There is a difference that you pointed out correctly, in the other systems I mentioned, a table will always belong to exactly one schema, where here (shares being the main authorization scope), we may see the same table in multiple shares. However the table entities itself are very cheap, i.e. just a name and location. The actual expensive part, is reading the metadata. For this I belive you already mentioned a good solution in #7 - if I understand that right. essentially we should make sure that we have some form of caching or catalog to efficiently handle table metadata. One potential integration would be unity catalog itself, where we can get metadata very cheap - of course we would likely need to make this pluggable.

Let me know what you think :).

ognis1205 commented 1 year ago

@roeap

There is a difference that you pointed out correctly, in the other systems I mentioned, a table will always belong to exactly one schema, where here (shares being the main authorization scope), we may see the same table in multiple shares. However the table entities itself are very cheap, i.e. just a name and location. The actual expensive part, is reading the metadata. For this I belive you already mentioned a good solution in https://github.com/delta-incubator/delta-sharing-rs/issues/7 - if I understand that right. essentially we should make sure that we have some form of caching or catalog to efficiently handle table metadata. One potential integration would be unity catalog itself, where we can get metadata very cheap - of course we would likely need to make this pluggable.

I agreed with you. Thank you for the clarification :) LGTM.

roeap commented 1 year ago

@ognis1205 @rtyler - i have some more follow up work planned, any chance we can get this merged? 😄

roeap commented 1 year ago

@ognis1205 @rtyler friendly ping.

ognis1205 commented 1 year ago

@roeap Apologies for my extreme laziness, Robert. I am currently working on a project related to delta-sharing-rs and will share my idea on the Slack channel in the near future, along with a mockup. I have to prepare the OSSJ presentation, so I promise to return to the Slack channel before the event.

roeap commented 1 year ago

@ognis1205 - that's great news.

Would this conflict with the changes in this and the 2 follow up PRs? I think it is important to align the data model with unit catalog and other systems. The follow ups just fix some things in the generated openapi spec and enable testing.

ognis1205 commented 1 year ago

@roeap

Would this conflict with the changes in this and the 2 follow up PRs?

No, it is absolutely fine with this and the subsequent two PRs.

I think it is important to align the data model with unit catalog and other systems.

Regarding the data model, thank you for the suggestion, and I wholeheartedly agree with you. The data model should align with conventional catalog systems.

roeap commented 1 year ago

@ognis1205 - do you mind approving this PR then? 😆.

ognis1205 commented 1 year ago

@roeap LGTM. No problem at all. Thank you for asking, Robert!

delta-incubator / delta-sharing-rs

feat: update creating schemas and register tables as schema children #13