TryGhost / NQL

MIT License
4 stars 8 forks source link

Contains, startsWith and endsWith (LIKE queries) #22

Open ErisDS opened 2 years ago

ErisDS commented 2 years ago

There's been a long-time desire to support LIKE queries in Ghost.

From a user perspective, this would allow us to do "contains", "startsWith" and "endsWith" type filters.

Reference implementations

NQL Syntax

All 3 can be prefixed with - for not

Examples

Mongo Syntax

Under the hood, NQL gets converted into Mongo's JSON query representation.

To check “contains” in mongo you need to do one of the below ref:

{ email: /fred@/i }

{ email: { $regex: /fred@/i } }

Not contains is one of:

{ email: { $not: /fred@/i } }

{ email: { $not: { $regex: /fred@/i } } }

The i is used to make the query case insensitive.

To make this easy to parse and process, we'll use the following two patterns:

{ email: { $regex: /fred@/i } } { email: { $not: /fred@/i } }

Starts with and ends with will become modifications on the regex:

{ email: { $regex: /^fred@/i } } { email: { $regex: /@gmail.com$/i } }

Have confirmed this works in mingo (the tool we use to query JSON directly).

A gotcha here is it will treat characters as regex characters, so we probably have to escape regex chars in the transform from NQL → mongo and then unescape them in the transform from mongo → SQL 🤔

SQL Syntax

SQL has two operators we could use here - LIKE or REGEXP:

Neither of these is case insensitive by default. LIKE only does wildcard matches, REGEXP allows for a wider range of queries & is therefore slower. We don’t need that power, complexity or risk right now, nor has there ever been a request for it, so we will stick to LIKE.

In order to use LIKE and case insensitive matching the SQL queries we need are:

select * from members.email where LOWER(email) LIKE LOWER('%fred@%'); // contains
select * from members.email where LOWER(email) LIKE LOWER('fred@%'); // starts with
select * from members.email where LOWER(email) LIKE LOWER('%@gmail.com'); //ends with
ErisDS commented 2 years ago

There is a basic implementation in place now, but this only works with literals.

Given the limitations on single character literals and that numbers are parsed as numbers, not literals, I think I probably need to at change this to at least support strings if not implement a concept of "regex" which is anything which comes after these operators.

ErisDS commented 2 years ago

Having slept on this and thought about it a bit, the main use case is type-ahead style lookups for matching values.

In this case, the user will be typing a single character first, followed by multiple characters that they expect to be treated literally. We also expect that users might type spaces, e.g. in a name they're looking for. Email addresses also commonly include the + char.

To resolve all of these use-cases in one fell swoop, the correct value type in NQL is STRING, not LITERAL.

Literals are for lazy matching, and work best for slugs, not user input strings.

Therefore rather than looking at adding support for multiple value types with regex operators, instead we'll be changing to only support strings.

ErisDS commented 2 years ago

It turns out that my original spec around LIKE not being case insensitive is wrong. That's what MySQL and SQLite do by default.

I also missed that knex has whereLike and whereILike functions that we could use instead of whereRaw. It feels nicer, so I'll have a stab at fixing this later.

ramrami commented 1 day ago

Not sure if this is the right place, but the current implementation fails when querying nested properties.

For example, filtering posts that have a tag with a slug that starts with a certain string: {{#get "posts" filter="tags.slug:~^'hash-test'" include="tags" }}

The DB logs for the query:

{
  method: 'select',
  options: {},
  timeout: false,
  cancelOnTimeout: false,
  bindings: [ 
     'published', 
     'post', 
     /^hash-test/i, 
     15 
  ],
  __knexQueryUid: '-ukkP2B0O_WFqzb_B8AoP',
  sql: 'select id,uuid,title,slug,html,comment_id,plaintext,feature_image,featured,type,status,locale,visibility,email_recipient_filter,created_at,created_by,updated_at,updated_by,published_at,published_by,custom_excerpt,codeinjection_head,codeinjection_foot,custom_template,canonical_url,newsletter_id,show_title_and_feature_image from `posts` where (`posts`.`status` = ? and (`posts`.`type` = ? and `posts`.`id` in (select `posts_tags`.`post_id` from `posts_tags` inner join `tags` on `tags`.`id` = `posts_tags`.`tag_id` where `tags`.`slug` like ?))) order by `posts`.`published_at` DESC limit ?'
}

As you can see in the bindings, the regexp wasn't transformed to a LIKE expression. Tested on Ghost 5.98 with local sqlite db.