DiceDB / dice

DiceDB is an in-memory real-time database with SQL-based reactivity. It is hyper-optimized for building and scaling truly real-time applications on modern hardware while being a drop-in replacement for Redis.
https://dicedb.io/
Other
3.98k stars 493 forks source link

Add JSON support to WHERE clause evaluation in Executor #189

Closed JyotinderSingh closed 2 weeks ago

JyotinderSingh commented 1 month ago

We currently support string, integer, and float comparisons in the DSQL executor (core/executor.go). We need to support JSON fields as part of WHERE clause expressions.

Requirements

Syntax support

Testing

Notes

This feature can be worked on once #165 is completed.

vaibh1297 commented 1 month ago

@JyotinderSingh can I work on this one as well ?

JyotinderSingh commented 1 month ago

@JyotinderSingh can I work on this one as well ?

Will assign once the other issue assigned to you is completed. In case someone wants to take this up in the meantime they are free to do so.

YahyaHaq commented 1 month ago

Hey @JyotinderSingh can I pick up this task

JyotinderSingh commented 1 month ago

Hey @JyotinderSingh can I pick up this task

Assigned

YahyaHaq commented 1 month ago

@arpitbbhayani @JyotinderSingh While working on this task I came across a couple of issues I wanted to discuss

  1. Problem: We have a bug in our QWATCH command. We can store a value as an integer but on retrieval it treats it as a string. Running this query in qwatch_test.go var qWatchQuery = "SELECT $key, $value FROM 'match:100:*' WHERE $value > 1 ORDER BY $value DESC LIMIT 3" will give us an error "2024/08/03 12:43:48 ERRO incompatible types in comparison: string and int". Solution: I can create a fix which will ensure all values stored will be retrieved with the same type at storage time. I can create this fix and add extensive tests within this PR or I can create a new PR for this fix. I believe its important to fix this before moving onto filtering based on JSON
  2. There is an issue with sqlparser. It isn't compatible with our style of querying JSON. When I pass this query SELECT $key, $value FROM 'match:100:*' WHERE $value.field1.field2.field3 > 1 ORDER BY $value DESC LIMIT 3 it gives me this error on parsing the SQL statement "error parsing SQL statement: syntax error at position 67". This error does disappear if I limit my query to two nested fields rather than 3 but then the issue arises that it ignores $value and considers field2 to be our column name. Solution: We can follow the convention present in PostgresSQL and query json using ->. this will transform our query into this format: SELECT $key, $value FROM 'match:100:*' WHERE $value->field1->field2->field3 > 1 ORDER BY $value DESC LIMIT 3. What are your opinions on this approach
JyotinderSingh commented 1 month ago

@arpitbbhayani @JyotinderSingh While working on this task I came across a couple of issues I wanted to discuss

  1. Problem: We have a bug in our QWATCH command. We can store a value as an integer but on retrieval it treats it as a string. Running this query in qwatch_test.go var qWatchQuery = "SELECT $key, $value FROM 'match:100:*' WHERE $value > 1 ORDER BY $value DESC LIMIT 3" will give us an error "2024/08/03 12:43:48 ERRO incompatible types in comparison: string and int".

Solution: I can create a fix which will ensure all values stored will be retrieved with the same type at storage time. I can create this fix and add extensive tests within this PR or I can create a new PR for this fix. I believe its important to fix this before moving onto filtering based on JSON

  1. There is an issue with sqlparser. It isn't compatible with our style of querying JSON. When I pass this query SELECT $key, $value FROM 'match:100:*' WHERE $value.field1.field2.field3 > 1 ORDER BY $value DESC LIMIT 3 it gives me this error on parsing the SQL statement "error parsing SQL statement: syntax error at position 67". This error does disappear if I limit my query to two nested fields rather than 3 but then the issue arises that it ignores $value and considers field2 to be our column name.

Solution: We can follow the convention present in PostgresSQL and query json using ->. this will transform our query into this format: SELECT $key, $value FROM 'match:100:*' WHERE $value->field1->field2->field3 > 1 ORDER BY $value DESC LIMIT 3. What are your opinions on this approach

Thanks for the extensive details @YahyaHaq! Please go ahead with including the fix for (1) in this PR. Secondly, the arrow convention is a good approach for nested field access. Please go ahead with it.

Thanks a lot for catching these issues and suggesting fixes!

YahyaHaq commented 1 month ago

@arpitbbhayani @JyotinderSingh A few updates regarding my last message:

Regarding (2) I just confirmed that passing in query in this format SELECT _key, _value FROM 'match:100:*' WHERE _value->field1 = '11' ORDER BY _value DESC LIMIT 3 does not work either for SQLParser. I receive the error "panic: syntax error at position 60 near 'field1'". I can create an issue on github and see if I can do anything to resolve it myself and in the meantime I can also check if there is any other SQLParser that will suit us better. If there is any other solution you can suggest that may provide a faster fix then that would be great but until this is resolved I will not be able to complete this ticket

JyotinderSingh commented 1 month ago

@arpitbbhayani @JyotinderSingh

A few updates regarding my last message:

Regarding (2) I just confirmed that passing in query in this format SELECT _key, _value FROM 'match:100:*' WHERE _value->field1 = '11' ORDER BY _value DESC LIMIT 3 does not work either for SQLParser. I receive the error "panic: syntax error at position 60 near 'field1'". I can create an issue on github and see if I can do anything to resolve it myself and in the meantime I can also check if there is any other SQLParser that will suit us better. If there is any other solution you can suggest that may provide a faster fix then that would be great but until this is resolved I will not be able to complete this ticket

You should replace _value with $value and _key with $key. But this issue does not seem related to that. Let me think over it. In the mean time you can redirect this PR to only fix (1) and we can create a new PR for (2) after appropriate discussion.

YahyaHaq commented 1 month ago

in our code we replace $value with _value before passing it into the parser

YahyaHaq commented 1 month ago

Regarding (1)

JyotinderSingh commented 1 month ago

@YahyaHaq Let's do one thing, to support the JSONPath syntax let's require the user to pass the JSONPath surrounded by backticks, as follows:

SELECT $key, $value FROM 'match:100:*' WHERE `$value.field1.field2.field3` > 1 ORDER BY $value DESC LIMIT 3

The parser should be able to handle this. How does that sound?

JyotinderSingh commented 1 month ago

Regarding (1)

  • I think I should also be assigned the issue Add integration test for testing WHERE clause in QWATCH. #188 as if i work on this issue my changes will be overwriting the integration tests in this file. If the person assigned to this hasn't created the PR then I can work on this otherwise I can work on top of their PR. What do you think

Let's let it be for now, we can merge whichever PR gets ready first and then work from there.

YahyaHaq commented 1 month ago

@JyotinderSingh Thank you so much for that SQL solution. This works great!

JyotinderSingh commented 1 month ago

@JyotinderSingh Thank you so much for that SQL solution. This works great!

Feel free to raise a draft PR, I can provide some ongoing feedback.

JyotinderSingh commented 1 month ago

Hi @YahyaHaq Any updates on this issue?

YahyaHaq commented 1 month ago

Hey @JyotinderSingh apologies for any delay on my end. Will try to send a PR by Saturday. There are no current blockers for this task

JyotinderSingh commented 3 weeks ago

Hey @JyotinderSingh apologies for any delay on my end. Will try to send a PR by Saturday. There are no current blockers for this task

Hi @YahyaHaq, wondering if you'll be able to close this anytime soon?

YahyaHaq commented 3 weeks ago

yes. all thats left is the benchmark and the unit tests. Functionality and integration tests are done. You can take a look at my draft PR

JyotinderSingh commented 3 weeks ago

yes. all thats left is the benchmark and the unit tests. Functionality and integration tests are done. You can take a look at my draft PR

The changes look good overall, you can clean up the code and add the remaining tests.

AshwinKul28 commented 2 weeks ago

Thanks, @YahyaHaq for contributing. Closing this issue.

Status - shipped 🚀