facebookincubator / velox

A C++ vectorized database acceleration library aimed to optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0
3.44k stars 1.12k forks source link

Delete code under velox/substrait #8895

Open mbasmanova opened 7 months ago

mbasmanova commented 7 months ago

Bug description

This code is no longer used. Let's delete it to avoid confusing and cycles spent on maintanance.

CC: @kgpai

System information

n/a

Relevant logs

No response

kgpai commented 7 months ago

cc: @rui-mo @FelixYBW

FelixYBW commented 7 months ago

I'm OK to remove them since they are moved in Gluten now. See if Voltron Data guys still wants to work on it.

EpsilonPrime commented 7 months ago

I don't think anyone else can use the version that's inside Gluten. Ideally the Substrait->Velox code would live in its own repository so it can be used by anyone needing a Substrait gateway to Velox (similar to datafusion's substrait offering).

FelixYBW commented 7 months ago

I don't think anyone else can use the version that's inside Gluten. Ideally the Substrait->Velox code would live in its own repository so it can be used by anyone needing a Substrait gateway to Velox (similar to datafusion's substrait offering).

Yes, but the problem now is it's not mature yet and no one is developing or maintain the code. I don't know if anyone still uses it. In future if someone owns the work, they can start from current code.

Gluten team currently has no resource to work on it, it's not our high priority work. If Voltron Data is going to make it mature, Gluten will switch from our internal one to Velox's.

drin commented 7 months ago

Is velox open to someone else maintaining the code in a separate repository as @EpsilonPrime mentioned?

I am researching the use of substrait to span engines such as Acero, DuckDB, etc. I would be willing to be a maintainer for a separate repository for the forseeable future; or, I can coordinate with VoltronData or @EpsilonPrime on some sort of shared maintainership.

drin commented 7 months ago

with a very brief look at the code, it seems like the code doesn't need to be compiled in-tree, but let me know if that's incorrect.

FelixYBW commented 7 months ago

Is velox open to someone else maintaining the code in a separate repository as @EpsilonPrime mentioned?

Definitely. You may contribute if you need. The code in velox is uncompleted yet. You may refer to the one Gluten currently is using: https://github.com/oap-project/gluten/tree/main/cpp/velox/substrait

The substrait we used is a pretty old version, you may upgrade to latest version. Also substrait itself is not completed yet. There should be some PR to add functions.

mbasmanova commented 7 months ago

@drin

Is velox open to someone else maintaining the code in a separate repository as @EpsilonPrime mentioned?

Sure. That works. Thanks.

drin commented 7 months ago

You may refer to the one Gluten currently is using

Thanks for the pointer. Was the substrait code donated to the Gluten codebase? the licenses attribute different organizations.

If Voltron Data is going to make it mature, Gluten will switch from our internal one to Velox's.

Are Gluten developers maintaining the substrait -> velox code? What would be the process for Gluten to switch from an internal implementation to an external implementation? I'm wondering if I can work with Gluten developers to move the code out sooner rather than later.

Probably for both of the above I should just ask Gluten devs anyways, but figured I would ask here in case you know.

FelixYBW commented 7 months ago

Thanks for the pointer. Was the substrait code donated to the Gluten codebase? the licenses attribute different organizations.

No, the substrait support in Velox is added by Gluten team initially. Gluten only holds the codes which are not upstream to velox yet. But later the hold code is more and more so we stopped the upstream work. Your contribution should be to Velox directly. Not gluten.

Are Gluten developers maintaining the substrait -> velox code? What would be the process for Gluten to switch from an internal implementation to an external implementation? I'm wondering if I can work with Gluten developers to move the code out sooner rather than later.

Probably for both of the above I should just ask Gluten devs anyways, but figured I would ask here in case you know.

Gluten developers only maintain the substrait -> Velox code in Gluten repo, not the one in Velox now. You may port the code in Gluten and upstream to Velox. Or you can start from fresh. Either way works. If the conversion in Velox is mature enough, we will remove the code in gluten and switch to Velox one.

drin commented 7 months ago

Your contribution should be to Velox directly.

If the conversion in Velox is mature enough

To clarify: it is preferred that I just help maintain the substrait conversion in the velox repo itself?

I was considering creating a velox-substrait repo and porting most of the CMakeLists.txt from gluten to place in the new repo. Then, the code can be compiled out of the velox source tree and any systems that use velox can leverage that as a submodule or something similar. However, it sounds like this is not the preferred approach?

Would a preferred approach be to somehow facilitate changes upstreamed from gluten into velox?

The reason I ask is that it seems that if gluten is willing to use substrait conversion in the velox repo, but they're maintaining their own version already, what is the barrier to them just migrating the code themselves and working in the velox repo?

I also want to clarify that I am currently a grad student that is not funded by anything gluten or velox related, so if some of the need is for someone to review changes to upstream into velox then I probably would not be able to keep up with that.

mbasmanova commented 7 months ago

@drin

I was considering creating a velox-substrait repo

A separate repo would be preferred. Thanks.

drin commented 7 months ago

thanks for the fast response!

drin commented 7 months ago

I'll try to have https://github.com/drin/velox-substrait/issues/1 finished soon so that you can resolve this issue.

mbasmanova commented 7 months ago

@drin Thanks.

FelixYBW commented 7 months ago

The reason I ask is that it seems that if gluten is willing to use substrait conversion in the velox repo, but they're maintaining their own version already, what is the barrier to them just migrating the code themselves and working in the velox repo?

Because upstreaming to velox still needs much clean up work and take much time. It's not Gluten's high priority work.

If you are going to setup a new repo, let's remove the code in velox now.

FelixYBW commented 6 months ago

@mbasmanova Any possible to put the substrait-Velox repo as a side project of Velox?

mbasmanova commented 6 months ago

@FelixYBW Binwei, what do you mean by "side project"?

FelixYBW commented 6 months ago

I mean put the repo like https://github.com/facebookincubator/substrait-velox

So we needn't maintain it in Velox. People who wants to maintain it can submit PR to the new repo. In future if It's really commonly used, then we can move back to Velox.

mbasmanova commented 6 months ago

@FelixYBW I don't think we can create a repo like that without maintaining it.

drin commented 6 months ago

@FelixYBW do you want to maintain the repo in a separate org? maybe we can ask to keep it in https://github.com/substrait-io/?

FelixYBW commented 6 months ago

@FelixYBW do you want to maintain the repo in a separate org? maybe we can ask to keep it in https://github.com/substrait-io/?

Yes, it's another choice. @EpsilonPrime

drin commented 6 months ago

I'm asking in the substrait slack as well (substrait.slack.com)

drin commented 6 months ago

The initial responses seem to be the following:

I'd like to see at least one committer and one non-committer that are invested in maintaining such a project

and

Presumably a new repo would require an SMC vote

So, I think for a short-term solution either me owning it or making an unofficial "incubation" organization is the most timely solution