Open AliciaMstt opened 3 years ago
A problem with a general solution is that the like
operator is not implemented in all storage types. MongoDb and Neo4J only support regular expressions, which, of course, could be used to implement that behavior in these storage techs. Additionally the ilike
operator which is case in-sensitive is only implemented in Postgres. Furthermore, we could not use regular expressions in all storages, because regexs are not implemented in Presto / Trino and Amazon S3 tables.
Thus, we'd need to first identify
Then we implement a helper module that constructs the respective storage type
correct search and executes it in its respective data model layer. We could even
(thanks @coeit ) implement a new operator like zilike
(Zendro - ilike) and implement
that for each storage separately.
The main question is to identify a common denominator of search function that can be implemented in all storage techs, but Cassandra. This question can be answered using something like the following hypothetical example table, inspired by Wikipedia's tech-tables: | Storage-Technology | Regular Expression Operator | regexp supports case-insensitive | like Operator | ilike Operator |
---|---|---|---|---|---|
Sequelize | regexp |
? | true | only postgres |
Storage-Technology | Regular Expression Operator | regexp supports case-insensitive | like Operator | ilike Operator | resource |
---|---|---|---|---|---|
Sequelize | regexp (~, PG/MySQL only) |
iRegexp (~*, PG only) |
like |
ilike (PG only) |
https://sequelize.org/master/manual/model-querying-basics.html#operators |
MongoDB | $regex |
$options: 'i' |
can be implemented with regex | can be implemented with regex | https://docs.mongodb.com/manual/reference/operator/query-evaluation/ |
neo4j | =~ |
(?i)<regexp> |
can be implemented with regex | can be implemented with regex | https://neo4j.com/docs/cypher-manual/current/clauses/where/#query-where-string |
presto / trino | regexp_like(<regexp>, <string>) |
regexp_like((?i)<regexp>, <string>) |
LIKE |
LOWER(<string>) LIKE OR can be implemented with regex |
https://trino.io/docs/current/functions.html |
amazonS3 | - | - | LIKE |
LOWER(<string>) LIKE |
https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-glacier-select-sql-reference-operators.html#s3-glacier-select-sql-reference-pattern |
Since regular expressions have quite a different syntax depending on the storage type when it comes to setting flags (for Zendro we need case insensitivity), a consistent way to define those flags in Zendro has to be defined.
iRegexp
$options: 'i'
(?i)
iRegexp
operatorIn the spirit of Sequelize, define a iRegexp
operator that implements the i
flag depending on the storage type.
notiRegexp
?We can use e,g, Java regular expressions syntax like neo4j/ presto and trino do. We would have to translate this for mongoDB and sequelize to their specific implementations
iRegexp
Implement the Zendro operator iRegexp
and notiRegexp
. It seems to be the best solution
In order to resolve this issue we should first implement a basic solution and use the ilike
operator in all storages, Zendro currently supports. This will avoid having errors if one Zendro instance uses Amozon S3. In a later iteration we could think about implementing a new zilike
operator that actually uses regular expressions to match e.g. "a" with all variations of the letter with accents and such, in Amazon S3 these searches would be done with its internal ilike
and the _
wild-card. zilike
would thus have different behavior in different storage types.
Ignore case
Currently searching for a string is case sensitive, so that in the example below in the SPA searching for "chaya" yields no results:
Whilst "Chaya" does:
The desired behavior is that "chaya" "Chaya", "chAya" or any similar form should bring the same results than "Chaya".
Accents We have a similar issue with the special character ´ and others, which in Spanish are used all the time. Unfortunately even if we know that the correct spelling is maíz, users commonly would look for maiz. Therefore we need that searching for a string to ignore accents of anykind, something like:
So that if
maíz
is present in the db, the querymaiz
andmaíz
should yield the same results.