Open mbasmanova opened 7 months ago
cc: @rui-mo @FelixYBW
I'm OK to remove them since they are moved in Gluten now. See if Voltron Data guys still wants to work on it.
I don't think anyone else can use the version that's inside Gluten. Ideally the Substrait->Velox code would live in its own repository so it can be used by anyone needing a Substrait gateway to Velox (similar to datafusion's substrait offering).
I don't think anyone else can use the version that's inside Gluten. Ideally the Substrait->Velox code would live in its own repository so it can be used by anyone needing a Substrait gateway to Velox (similar to datafusion's substrait offering).
Yes, but the problem now is it's not mature yet and no one is developing or maintain the code. I don't know if anyone still uses it. In future if someone owns the work, they can start from current code.
Gluten team currently has no resource to work on it, it's not our high priority work. If Voltron Data is going to make it mature, Gluten will switch from our internal one to Velox's.
Is velox open to someone else maintaining the code in a separate repository as @EpsilonPrime mentioned?
I am researching the use of substrait to span engines such as Acero, DuckDB, etc. I would be willing to be a maintainer for a separate repository for the forseeable future; or, I can coordinate with VoltronData or @EpsilonPrime on some sort of shared maintainership.
with a very brief look at the code, it seems like the code doesn't need to be compiled in-tree, but let me know if that's incorrect.
Is velox open to someone else maintaining the code in a separate repository as @EpsilonPrime mentioned?
Definitely. You may contribute if you need. The code in velox is uncompleted yet. You may refer to the one Gluten currently is using: https://github.com/oap-project/gluten/tree/main/cpp/velox/substrait
The substrait we used is a pretty old version, you may upgrade to latest version. Also substrait itself is not completed yet. There should be some PR to add functions.
@drin
Is velox open to someone else maintaining the code in a separate repository as @EpsilonPrime mentioned?
Sure. That works. Thanks.
You may refer to the one Gluten currently is using
Thanks for the pointer. Was the substrait code donated to the Gluten codebase? the licenses attribute different organizations.
If Voltron Data is going to make it mature, Gluten will switch from our internal one to Velox's.
Are Gluten developers maintaining the substrait -> velox code? What would be the process for Gluten to switch from an internal implementation to an external implementation? I'm wondering if I can work with Gluten developers to move the code out sooner rather than later.
Probably for both of the above I should just ask Gluten devs anyways, but figured I would ask here in case you know.
Thanks for the pointer. Was the substrait code donated to the Gluten codebase? the licenses attribute different organizations.
No, the substrait support in Velox is added by Gluten team initially. Gluten only holds the codes which are not upstream to velox yet. But later the hold code is more and more so we stopped the upstream work. Your contribution should be to Velox directly. Not gluten.
Are Gluten developers maintaining the substrait -> velox code? What would be the process for Gluten to switch from an internal implementation to an external implementation? I'm wondering if I can work with Gluten developers to move the code out sooner rather than later.
Probably for both of the above I should just ask Gluten devs anyways, but figured I would ask here in case you know.
Gluten developers only maintain the substrait -> Velox code in Gluten repo, not the one in Velox now. You may port the code in Gluten and upstream to Velox. Or you can start from fresh. Either way works. If the conversion in Velox is mature enough, we will remove the code in gluten and switch to Velox one.
Your contribution should be to Velox directly.
If the conversion in Velox is mature enough
To clarify: it is preferred that I just help maintain the substrait conversion in the velox repo itself?
I was considering creating a velox-substrait repo and porting most of the CMakeLists.txt from gluten to place in the new repo. Then, the code can be compiled out of the velox source tree and any systems that use velox can leverage that as a submodule or something similar. However, it sounds like this is not the preferred approach?
Would a preferred approach be to somehow facilitate changes upstreamed from gluten into velox?
The reason I ask is that it seems that if gluten is willing to use substrait conversion in the velox repo, but they're maintaining their own version already, what is the barrier to them just migrating the code themselves and working in the velox repo?
I also want to clarify that I am currently a grad student that is not funded by anything gluten or velox related, so if some of the need is for someone to review changes to upstream into velox then I probably would not be able to keep up with that.
@drin
I was considering creating a velox-substrait repo
A separate repo would be preferred. Thanks.
thanks for the fast response!
I'll try to have https://github.com/drin/velox-substrait/issues/1 finished soon so that you can resolve this issue.
@drin Thanks.
The reason I ask is that it seems that if gluten is willing to use substrait conversion in the velox repo, but they're maintaining their own version already, what is the barrier to them just migrating the code themselves and working in the velox repo?
Because upstreaming to velox still needs much clean up work and take much time. It's not Gluten's high priority work.
If you are going to setup a new repo, let's remove the code in velox now.
@mbasmanova Any possible to put the substrait-Velox repo as a side project of Velox?
@FelixYBW Binwei, what do you mean by "side project"?
I mean put the repo like https://github.com/facebookincubator/substrait-velox
So we needn't maintain it in Velox. People who wants to maintain it can submit PR to the new repo. In future if It's really commonly used, then we can move back to Velox.
@FelixYBW I don't think we can create a repo like that without maintaining it.
@FelixYBW do you want to maintain the repo in a separate org? maybe we can ask to keep it in https://github.com/substrait-io/?
@FelixYBW do you want to maintain the repo in a separate org? maybe we can ask to keep it in https://github.com/substrait-io/?
Yes, it's another choice. @EpsilonPrime
I'm asking in the substrait slack as well (substrait.slack.com)
The initial responses seem to be the following:
I'd like to see at least one committer and one non-committer that are invested in maintaining such a project
and
Presumably a new repo would require an SMC vote
So, I think for a short-term solution either me owning it or making an unofficial "incubation" organization is the most timely solution
Bug description
This code is no longer used. Let's delete it to avoid confusing and cycles spent on maintanance.
CC: @kgpai
System information
n/a
Relevant logs
No response