Zendro-dev / graphql-server

Skeleton NodeJS project for a graphQL server.
GNU General Public License v3.0
0 stars 1 forks source link

searching for a string should ingore case and diacritic accents #73

Open AliciaMstt opened 3 years ago

AliciaMstt commented 3 years ago

Ignore case

Currently searching for a string is case sensitive, so that in the example below in the SPA searching for "chaya" yields no results:

image

Whilst "Chaya" does:

image

The desired behavior is that "chaya" "Chaya", "chAya" or any similar form should bring the same results than "Chaya".

Accents We have a similar issue with the special character ´ and others, which in Spanish are used all the time. Unfortunately even if we know that the correct spelling is maíz, users commonly would look for maiz. Therefore we need that searching for a string to ignore accents of anykind, something like:

a == âàáäÂÀÂÄ 
e == êèéëÊÈÉË 
i == îïíÎÏ 
o == ôöóÔÖ 
u == ûùüÛÜÙ 
c == çÇ 
n == ñ

So that if maíz is present in the db, the query maiz and maíz should yield the same results.

asishallab commented 3 years ago

A problem with a general solution is that the like operator is not implemented in all storage types. MongoDb and Neo4J only support regular expressions, which, of course, could be used to implement that behavior in these storage techs. Additionally the ilike operator which is case in-sensitive is only implemented in Postgres. Furthermore, we could not use regular expressions in all storages, because regexs are not implemented in Presto / Trino and Amazon S3 tables.

Thus, we'd need to first identify

Then we implement a helper module that constructs the respective storage type correct search and executes it in its respective data model layer. We could even (thanks @coeit ) implement a new operator like zilike (Zendro - ilike) and implement that for each storage separately.

The main question is to identify a common denominator of search function that can be implemented in all storage techs, but Cassandra. This question can be answered using something like the following hypothetical example table, inspired by Wikipedia's tech-tables: Storage-Technology Regular Expression Operator regexp supports case-insensitive like Operator ilike Operator
Sequelize regexp ? true only postgres
coeit commented 3 years ago
Storage-Technology Regular Expression Operator regexp supports case-insensitive like Operator ilike Operator resource
Sequelize regexp (~, PG/MySQL only) iRegexp (~*, PG only) like ilike (PG only) https://sequelize.org/master/manual/model-querying-basics.html#operators
MongoDB $regex $options: 'i' can be implemented with regex can be implemented with regex https://docs.mongodb.com/manual/reference/operator/query-evaluation/
neo4j =~ (?i)<regexp> can be implemented with regex can be implemented with regex https://neo4j.com/docs/cypher-manual/current/clauses/where/#query-where-string
presto / trino regexp_like(<regexp>, <string>) regexp_like((?i)<regexp>, <string>) LIKE LOWER(<string>) LIKE OR can be implemented with regex https://trino.io/docs/current/functions.html
amazonS3 - - LIKE LOWER(<string>) LIKE https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-glacier-select-sql-reference-operators.html#s3-glacier-select-sql-reference-pattern
coeit commented 3 years ago

Since regular expressions have quite a different syntax depending on the storage type when it comes to setting flags (for Zendro we need case insensitivity), a consistent way to define those flags in Zendro has to be defined.

Solutions

iRegexp operator

In the spirit of Sequelize, define a iRegexp operator that implements the i flag depending on the storage type.

Define a Zendro specific regexp syntax

We can use e,g, Java regular expressions syntax like neo4j/ presto and trino do. We would have to translate this for mongoDB and sequelize to their specific implementations

asishallab commented 3 years ago

Response to comment about iRegexp

Implement the Zendro operator iRegexp and notiRegexp. It seems to be the best solution

asishallab commented 3 years ago

In order to resolve this issue we should first implement a basic solution and use the ilike operator in all storages, Zendro currently supports. This will avoid having errors if one Zendro instance uses Amozon S3. In a later iteration we could think about implementing a new zilike operator that actually uses regular expressions to match e.g. "a" with all variations of the letter with accents and such, in Amazon S3 these searches would be done with its internal ilike and the _ wild-card. zilike would thus have different behavior in different storage types.