delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0
6.98k stars 1.6k forks source link

Adding support for Oracle Cloud Infrastructure (OCI) Object Store as Delta Storage #468

Closed vibhaska closed 3 years ago

vibhaska commented 3 years ago

Adding support for Oracle Cloud Infrastructure (OCI) Object Store as Delta Storage by introducing OCILogStore.

Regarding Storage configuration page in Delta Documentation, I request following changes mentioned here.

pranavanand commented 3 years ago

Thanks for submitting this PR! We are currently coming up with a process for contributing third-party support and will let you know once that's ready.

ksranga commented 3 years ago

Can this be ported to delta 0.6.x version so its available for spark 2.x as well?

vibhaska commented 3 years ago

@ksranga Yes.

I guess only one line change is required: Remove line number 268 from LogStoreSuite.scala. If you want I can raise another PR for old branch after testing and verification.

vibhaska commented 3 years ago

Thanks for submitting this PR! We are currently coming up with a process for contributing third-party support and will let you know once that's ready.

@pranavanand can you please share any further update.

ddraj commented 3 years ago

Echoing @vibhaska's question .. Hi @pranavanand, any chance the PR can be merged? Could you please let us know if anything is pending on the PR that we should address?

tdas commented 3 years ago

Hey @ddraj

Our apologies for sitting on the PR for so long. We wanted to figure out what is the best we integrate slightly less battle-tested features into our repo, and we finally figured it out. We are currently in the process of converting our repo to a multi-module SBT project so that we can add other sub-modules (mapping to other maven release artifacts beyond delta-core). Then we can add a contribs module and then accept such new and experimental features.

Are you still interested in contributing to this feature?

ddraj commented 3 years ago

Hey @ddraj

Our apologies for sitting on the PR for so long. We wanted to figure out what is the best we integrate slightly less battle-tested features into our repo, and we finally figured it out. We are currently in the process of converting our repo to a multi-module SBT project so that we can add other sub-modules (mapping to other maven release artifacts beyond delta-core). Then we can add a contribs module and then accept such new and experimental features.

Are you still interested in contributing to this feature?

Thanks @tdas. Yes we are still interested. Please let us know if we need to do anything specific.

tdas commented 3 years ago

So the refactor to multi-module has been merged. Lets' work towards getting this in. Here are the next steps.

  1. Please define a new module using the same pattern as the core module in the build.sbt file.

    • Does the IBM support need additional maven dependencies? I am guessing not. In that case, we can make the module name generic like "delta-contribs" (that is, not specific to IBM). Other such zero-dependency contributions can go in that project as well. Otherwise, lets put it in a specifically named module like "delta-contribs-ibm"
  2. Move the new log store code inside the <new-module-dir>/src/main/<package>/... and the test code in a separate file in <new-module-dir>/src/test/<package>/....

Let me know if you need help to get it working.

vibhaska commented 3 years ago

@tdas I have updated the changes as suggested. Kindly check if this is what you wanted.

Couple of points where I need some guidance:

  1. Will there be any root project, and jars from same will be published? Or will a separate jars be published from this module. I am not clear how.
  2. I am also not clear why CI failed. Is this due to multi module nature?
tdas commented 3 years ago

These are good questions. My apologies for the delay, we have been trying to line up a few more features like a new, more stable LogStore API that LogStore builders like you folks can use. However, there are a few more pieces that we still need to get in for the public API to be easily usable. I dont want you folks to block on those. So we can continue working using the existing LogStore API, which we will continue supporting for at least the 1.x releases. Later we can rewrite the OCILogStore to use the new API, we are not going to worry about the 1.0 release we are targeting in a couple of weeks.

Now back to the question. We will be releasing a separate Maven artifact for each subproject. The refactoring you have done in the sbt looks good at a high-level. I will leave more detailed comments on them soon. I will also take a look at why the tests failed.

tdas commented 3 years ago

Somehow the sbt download failed. Could be a random error. Can you run the tests once again. An easy to rerun tests is to simply push an empty commit to this branch.

tdas commented 3 years ago

The community has already created the contrib submodule! So all you have to do merge with the master and add your LogStore to the existing directories. No need to fix anything in the build.sbt!

tdas commented 3 years ago

@vibhaska @ddraj if you have time to update the PR, please do so quickly as we preparing to release 1.0 soon. Alternatively, if you are okay, I can do the slight refactoring as well (you still get commit authorship).

ddraj commented 3 years ago

@tdas coincidentally @vibhaska is on vacation this week and the next. So we are not able to update the PR :( I'll happily take your offer for the slight updates. Thanks a lot @tdas

tdas commented 3 years ago

@ddraj all right! I will do it. Do you mind if I change the name OracleCILogstore to make it more obvious which cloud it is? Similar to our naming of IBMCOSLogStore and AzureLogStore.

vibhaska commented 3 years ago

Sorry for the delayed response. I have checked-in refactoring changes. I have also updated the file name, as suggested.

@tdas Thanks for offering help with the changes. Kindly review and please suggest if any more changes are required.

@ddraj Thanks for your support.

tdas commented 3 years ago

@vibhaska thank you for the update. One minor correction I suggest is to rename OracleOCI to just OracleCI because the O in OCI is redundant with Oracle. I can do rename super easily if that's okay with you.

ddraj commented 3 years ago

@tdas any chance we can name it as OracleCloudLogStore. I think that would fully convey the purpose of this implementation.

tdas commented 3 years ago

Sure. As long as there won't be any other variation of LogStore needed on the Oracle Cloud, maybe for any other storage product in Oracle Cloud.

ddraj commented 3 years ago

Don’t think that’s in the offing anytime soon @tdas Lets rename it to OracleCloudLogStore

tdas commented 3 years ago

Alright!

vibhaska commented 3 years ago

@tdas @ddraj Updated the file name, as suggested.

vibhaska commented 3 years ago

@tdas Kindly suggest if any further changes are required.

ddraj commented 3 years ago

@tdas echoing what @vibhaska asked. Is there anything you need from us?

ddraj commented 3 years ago

Hi @tdas gentle ping :)