facebookincubator / velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0
3.3k stars 1.09k forks

Add Presto functions #2262

Open mbasmanova opened 1 year ago

mbasmanova commented 1 year ago

Velox includes many of the PrestoSQL functions, but a few are still missing. It would be great to add these.

Function coverage map: https://facebookincubator.github.io/velox/functions/coverage.html

A subset of the missing functions that would be most helpful to add:

Array functions

all_match (lambda function) https://github.com/facebookincubator/velox/pull/3356
any_match (lambda function)
array_average #2434
array_frequency #3807
array_has_duplicates #3320
array_normalize
array_remove
array_union
flatten
repeat
sequence
shuffle #3404
zip_with (lambda function) #2685

Map functions

map_from_entries #3417
map_normalize #9086
map_zip_with (lambda function) #2711

JSON functions

is_json_scalar https://github.com/facebookincubator/velox/pull/2291
json_array_contains https://github.com/facebookincubator/velox/pull/2299
json_array_get
json_array_length https://github.com/facebookincubator/velox/pull/2294
json_extract #5269
json_format #3525
json_parse #3663
json_size #3413

String functions

split_to_map
split_to_multimap
strrpos

Regular expression functions

regexp_split

Date and Time functions

timezone_hour
timezone_minute
week https://github.com/facebookincubator/velox/pull/2287
week_of_year https://github.com/facebookincubator/velox/pull/2287

Mathematical functions

truncate

mbasmanova commented 1 year ago

CC: @majetideepak @aditi-pandit

pramodsatya commented 1 year ago

Thanks for sharing the list of missing functions. The following functions have been added:

Date and Time functions
week, week_of_year: https://github.com/facebookincubator/velox/pull/2287

JSON functions
is_json_scalar: https://github.com/facebookincubator/velox/pull/2291
json_array_length: https://github.com/facebookincubator/velox/pull/2294
json_array_contains: https://github.com/facebookincubator/velox/pull/2299

Working on the following functions:

Mathematical functions
truncate (shelved until decimal-to-double cast is supported)

JSON functions
json_array_get (not implementing because use of this function is not recommended)

jwyles-ahana commented 1 year ago

I am going to start working on array_union.

aditi-pandit commented 1 year ago

> I am going to start working on array_union.

There was a prior PR for array_union https://github.com/facebookincubator/velox/pull/867 that was abandoned. Maybe you can check with @kagamiori about it.

kagamiori commented 1 year ago

> > I am going to start working on array_union.
>
> There was a prior PR for array_union #867 that was abandoned. Maybe you can check with @kagamiori about it.

@aditi-pandit Thank you for bringing this up! It was a PR I didn't finish. I originally planned to rewrite it as a simple function once that's supported. (I just confirmed that it's not supported yet.) Do you need this function soon? Please feel free to take it over, or let me know if you want me to finish #867.

aditi-pandit commented 1 year ago

There isn't any urgency for the array_union function right away. We were just picking stuff from Masha's list above.

jwyles-ahana commented 1 year ago

I will leave array_union to @kagamiori and start on array_average instead.

gosharz commented 1 year ago

> Thanks for sharing the list of missing functions. Following functions have been added: Date and Time functions week, week_of_year: #2287
>
> JSON functions is_json_scalar: #2291 json_array_length: #2294 json_array_contains: #2299
>
> Working on the following functions: Mathematical functions truncate (shelved till decimal to double cast is supported)
>
> JSON functions json_array_get (Not implementing because the usage of this function is not recommended)

Hi @pramodsatya!

Looking for functions to pick up. Wondering if you are still working on truncate?

Cheers, Gosh

pramodsatya commented 1 year ago

Hi @gosharz, I am not working on the truncate function. Thanks for checking.

gosharz commented 1 year ago

@pramodsatya mind if I pick it up?

pramodsatya commented 1 year ago

No, please go for it. Thank you!

> @pramodsatya mind if I pick it up?

gosharz commented 1 year ago

Adding truncate: https://github.com/facebookincubator/velox/pull/2862

gosharz commented 1 year ago

Will also give a try to strrpos if nobody minds :)

gosharz commented 1 year ago

Here we go: https://github.com/facebookincubator/velox/pull/2903

darrenfu commented 1 year ago

Hi @mbasmanova,

I'd like to claim this array function first: array_has_duplicates: #3397

darrenfu commented 1 year ago

Hi @mbasmanova,

I'd like to claim this array function first: array_has_duplicates: #3397

Looks like there is a duplicate WIP PR on the same udf, array_has_duplicates: #3320

I switched to shuffle: #3404 (ready for review)

czentgr commented 1 year ago

Hello @mbasmanova,

I'm claiming these functions: timezone_hour, timezone_minute, current_date, current_time

duanmeng commented 1 year ago

Hi @mbasmanova, I'm claiming any_match (lambda function). I am working on all_match (lambda function) https://github.com/facebookincubator/velox/pull/3356, and will continue with any_match #4327 once #3356 is merged.

mbasmanova commented 1 year ago

@duanmeng

> I'm claiming any_match (lambda function).

Sounds great.

svm1 commented 1 year ago

Hi @mbasmanova, I would like to claim the json_parse function.

mbasmanova commented 1 year ago

@svm1 Looks like json_parse was added in #3663

svm1 commented 1 year ago

Thanks @mbasmanova, must've missed that. Then may I claim the flatten array function? Doesn't look like it's been added yet.

svm1 commented 1 year ago

Hi @mbasmanova, I would actually like to take the split_to_map string function first instead if that's alright.

mbasmanova commented 1 year ago

> Hi @mbasmanova, I would actually like to take the split_to_map string function first instead if that's alright.

That's fine. Thanks.

svm1 commented 1 year ago

Hi @mbasmanova, I would also like to claim the following functions:

from_iso8601_date
from_iso8601_timestamp
current_timezone

SANTHOSH-MAMIDISETTI commented 1 year ago

Hello all, seems like a lot has been done! Is there anything that I could work on? I am a newbie to open source, but I believe I have good skills in C++, C, Python and such. I hope @mbasmanova or someone would be able to help me soon. Cheers!

dusx1981 commented 12 months ago

I want to join this interesting work. My code analysis of Velox compilation and execution: Velox--compile. @mbasmanova I'm claiming array_remove.

mbasmanova commented 12 months ago

@dusx1981 Welcome. FYI, someone might be already working on array_remove: https://github.com/facebookincubator/velox/pull/5538

mbasmanova commented 12 months ago

@SANTHOSH-MAMIDISETTI @dusx1981 Welcome, folks. Would you provide some context re: your interest in Velox? Are you part of the teams that use Velox? If so, what do these teams do?

dusx1981 commented 11 months ago

> @SANTHOSH-MAMIDISETTI @dusx1981 Welcome, folks. Would you provide some context re: your interest in Velox? Are you part of the teams that use Velox? If so, what do these teams do?

We are working on a distributed database, and we need to use things like ICompiledCall.

dusx1981 commented 11 months ago

Which array related functions can I claim?

mbasmanova commented 11 months ago

@dusx1981 Curious, which database do you work on? BTW, development of codegen in Velox was paused a long time ago.

Consider adding a family of map_top_n_xxx functions: https://prestodb.io/docs/current/functions/map.html#map_top_n_keys

dusx1981 commented 11 months ago

> @dusx1981 Curious, which database do you work on? BTW, development of codegen in Velox was paused a long time ago.
>
> Consider adding a family of map_top_n_xxx functions: https://prestodb.io/docs/current/functions/map.html#map_top_n_keys

Our project is being built by a studio. We are currently in the development stage, and we are still soliciting a name for the database.

Do you mean adding functions to this framework? presto

mbasmanova commented 11 months ago

> We are a project being worked on by a studio,

Curious, which studio is this?

> Do you mean adding functions to this framework? presto

This link is in Chinese, which I unfortunately cannot read. You asked "Which array related functions can I claim?" and I suggested picking up the map_top_n_xxx Presto functions.

mbasmanova commented 11 months ago

> We are choosing a solution now, and we don’t know what implementation solution to choose for a language like Go. Can you give some guidance?

Sure. I would need to understand a bit more about the system you are building to advise. What kind of problems are you looking to solve and what is your tentative solution?

dusx1981 commented 11 months ago

> > We are choosing a solution now, and we don’t know what implementation solution to choose for a language like Go. Can you give some guidance?
>
> Sure. I would need to understand a bit more about the system you are building to advise. What kind of problems are you looking to solve and what is your tentative solution?

I just have a question. We want to use Go to implement compilation and execution logic like the one Presto runs on the JVM. As you know, Java can directly generate bytecode at runtime and load it through a ClassLoader, whereas Go or C++ code can only be compiled and executed statically. So we want to refer to the implementation of Velox and implement a Go version.

Velox stopped the development of codegen. Is that because of performance reasons? Is my idea above feasible?
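
On the feasibility question: a statically compiled language can still assemble an executable expression at runtime without emitting bytecode or machine code, for example by composing closures. The block below is a minimal sketch of that idea in C++, not Velox code and not the approach Velox takes; the names Expr, input, constant, add, and multiply are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// A runtime-built expression: any callable from one input value to a result.
using Expr = std::function<int64_t(int64_t)>;

// Hypothetical builders. Each returns a closure instead of generated code.

// Refers to the input value itself.
Expr input() {
  return [](int64_t x) { return x; };
}

// A constant that ignores the input.
Expr constant(int64_t value) {
  return [value](int64_t) { return value; };
}

// Sum of two sub-expressions evaluated on the same input.
Expr add(Expr left, Expr right) {
  return [left = std::move(left), right = std::move(right)](int64_t x) {
    return left(x) + right(x);
  };
}

// Product of two sub-expressions evaluated on the same input.
Expr multiply(Expr left, Expr right) {
  return [left = std::move(left), right = std::move(right)](int64_t x) {
    return left(x) * right(x);
  };
}
```

For instance, `multiply(add(input(), constant(2)), constant(3))` builds (x + 2) * 3 at runtime, and calling the result with 5 evaluates it. Each call still pays an indirect call per node per row, which is one reason engines tend to evaluate whole batches at a time.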

mbasmanova commented 11 months ago

@dusx1981 What is the problem you are trying to solve? Are you looking to build a more efficient / faster query engine? Where do you think the speedup / efficiency will come from?

We stopped codegen development primarily because codegen is harder to debug and develop. We do not believe it will be faster for analytical workloads.
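
The thread does not spell out why codegen is not expected to be faster. One common rationale for vectorized engines, stated here as an assumption, is that an interpreted expression tree pays its dispatch cost once per batch rather than once per row, so the inner loops stay tight without generating code. The block below is a minimal C++ sketch of batch-at-a-time evaluation; Column, VectorExpr, InputRef, Literal, Plus, and addTen are hypothetical names, not Velox types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// One column of input values; a stand-in for a real columnar vector type.
using Column = std::vector<int64_t>;

// Hypothetical expression node that is evaluated one batch at a time.
struct VectorExpr {
  virtual ~VectorExpr() = default;
  virtual Column eval(const Column& input) const = 0;
};

// Returns the input column unchanged.
struct InputRef : VectorExpr {
  Column eval(const Column& input) const override { return input; }
};

// Produces a column filled with a single constant.
struct Literal : VectorExpr {
  explicit Literal(int64_t v) : value(v) {}
  Column eval(const Column& input) const override {
    return Column(input.size(), value);
  }
  int64_t value;
};

// Adds two sub-expressions element-wise.
struct Plus : VectorExpr {
  Plus(std::unique_ptr<VectorExpr> l, std::unique_ptr<VectorExpr> r)
      : left(std::move(l)), right(std::move(r)) {}
  Column eval(const Column& input) const override {
    Column a = left->eval(input);
    const Column b = right->eval(input);
    // One virtual call per node per batch; the per-row work is this plain loop.
    for (std::size_t i = 0; i < a.size(); ++i) {
      a[i] += b[i];
    }
    return a;
  }
  std::unique_ptr<VectorExpr> left;
  std::unique_ptr<VectorExpr> right;
};

// Builds and evaluates x + 10 over a whole batch.
Column addTen(const Column& input) {
  Plus expr(std::make_unique<InputRef>(), std::make_unique<Literal>(10));
  return expr.eval(input);
}
```

The sketch copies columns freely for clarity; a real engine would reuse buffers and handle nulls, which this example leaves out.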

dusx1981 commented 11 months ago

> @dusx1981 What is the problem you are trying to solve? Are you looking to build a more efficient / faster query engine? Where do you think the speedup / efficiency will come from?
>
> We stopped codegen development primarily because codegen is harder to debug and develop. We do not believe it will be faster for analytical workloads.

Another question, why not perform compiling and linking operations in memory instead of writing to files, which adds IO operations? Will this have a significant impact on performance?

mbasmanova commented 11 months ago

> Another question, why not perform compiling and linking operations in memory instead of writing to files, which adds IO operations? Will this have a significant impact on performance?

@dusx1981 I suggest opening a separate GitHub issue and continuing the discussion there. CC: @laithsakka

dusx1981 commented 11 months ago

> > Another question, why not perform compiling and linking operations in memory instead of writing to files, which adds IO operations? Will this have a significant impact on performance?
>
> @dusx1981 I suggest opening a separate GitHub issue and continuing the discussion there. CC: @laithsakka

#5840

Real-Chen-Happy commented 3 months ago

Hi all, I am new to OLAP databases and I am extremely interested in Velox. Is there anything that I could work on? I am thinking of adding array_frequency if nobody is currently working on it. Thanks!

mbasmanova commented 3 months ago

@Real-Chen-Happy Welcome! I suggest starting with #3728 or one of the fuzzer-found issues: https://github.com/facebookincubator/velox/issues?q=is%3Aopen+is%3Aissue+label%3Afuzzer-found

mbasmanova commented 3 months ago

@Real-Chen-Happy array_frequency function has been added in #3807

Real-Chen-Happy commented 3 months ago

> @Real-Chen-Happy Welcome! I suggest starting with #3728 or one of the fuzzer-found issues: https://github.com/facebookincubator/velox/issues?q=is%3Aopen+is%3Aissue+label%3Afuzzer-found

Thank you for your reply! I will start to take a look at #3728

mbasmanova commented 3 months ago

> I will start to take a look at https://github.com/facebookincubator/velox/issues/3728

@Real-Chen-Happy Thanks. BTW, would you introduce yourself and share a bit about where / how you use Velox? Do you know anyone in the Velox community who can help you onboard to the codebase and provide guidance on your first PRs? If not, please create a GitHub issue to ask if anyone is willing to help with that.

Real-Chen-Happy commented 3 months ago

Yeah sure! I am Real and I have some experience with OLTP systems. My current work is not directly related to Velox. Contributing to Velox is a personal interest because I believe the future of DBMS will be composable, and Velox will definitely play a key role. I would love to contribute my efforts in this area. I am new to this community, so let me know if anybody here is interested in providing some mentorship! #9262

Sutter099 commented 3 months ago

Where can I find the functions that still need to be added? What I see in this link seems a bit outdated.

mbasmanova commented 3 months ago

@Sutter099 A coverage map might be a good place to find functions still missing in Velox:

https://facebookincubator.github.io/velox/functions/presto/coverage.html