ArcadeData / arcadedb

ArcadeDB Multi-Model Database, one DBMS that supports SQL, Cypher, Gremlin, HTTP/JSON, MongoDB and Redis. ArcadeDB is a conceptual fork of OrientDB, the first Multi-Model DBMS. ArcadeDB supports Vector Embeddings.
https://arcadedb.com
Apache License 2.0
484 stars 61 forks source link

Error with containstext against an indexed property #1062

Open marco-brandizi opened 1 year ago

marco-brandizi commented 1 year ago

ArcadeDB Version: 23.3.1

JDK Version: OpenJDK 64-Bit Server VM Temurin-11.0.18+10 (build 11.0.18+10, mixed mode)

OS: macos 13.3 (22E252)

Expected behavior

(I've done all of the operations described in the follow from the studio web app)

I'm new to ArcadeDB and so far I've managed to import a graph database from a .graphml file. Then, while playing with indexes, I tried this:

# These two went fine
create property `Person:Resource`.`label` string
create index on `Person:Resource` ( `label` ) full_text

# This failed
select * from `Person:Resource` where `label` containstext 'John'

the select fails with the, while I was expecting some result.

Actual behavior

It fails with the following:

<ArcadeDB_0> Error on command execution (PostCommandHandler)
java.lang.UnsupportedOperationException: Cannot execute index query with `label` CONTAINSTEXT 'John'
    at com.arcadedb.query.sql.parser.BooleanExpression.resolveKeyFrom(BooleanExpression.java:203)
    at com.arcadedb.query.sql.executor.FetchFromIndexStep.indexKeyFrom(FetchFromIndexStep.java:537)
    at com.arcadedb.query.sql.executor.FetchFromIndexStep.processAndBlock(FetchFromIndexStep.java:280)
    at com.arcadedb.query.sql.executor.FetchFromIndexStep.init(FetchFromIndexStep.java:208)
    at com.arcadedb.query.sql.executor.FetchFromIndexStep.init(FetchFromIndexStep.java:192)
    at com.arcadedb.query.sql.executor.FetchFromIndexStep.syncPull(FetchFromIndexStep.java:85)
    at com.arcadedb.query.sql.executor.GetValueFromIndexEntryStep$1.fetchNextItem(GetValueFromIndexEntryStep.java:101)
    at com.arcadedb.query.sql.executor.GetValueFromIndexEntryStep$1.hasNext(GetValueFromIndexEntryStep.java:71)
    at com.arcadedb.query.sql.executor.DistinctExecutionStep.fetchNext(DistinctExecutionStep.java:93)
    at com.arcadedb.query.sql.executor.DistinctExecutionStep$1.hasNext(DistinctExecutionStep.java:61)
    at com.arcadedb.query.sql.executor.FilterByClassStep$1.fetchNextItem(FilterByClassStep.java:59)
    at com.arcadedb.query.sql.executor.FilterByClassStep$1.hasNext(FilterByClassStep.java:97)
    at com.arcadedb.query.sql.executor.ProjectionCalculationStep$1.hasNext(ProjectionCalculationStep.java:43)
    at com.arcadedb.query.sql.parser.LocalResultSet.fetchNext(LocalResultSet.java:46)
    at com.arcadedb.query.sql.parser.LocalResultSet.<init>(LocalResultSet.java:39)
    at com.arcadedb.query.sql.parser.SelectStatement.execute(SelectStatement.java:207)
    at com.arcadedb.query.sql.parser.Statement.execute(Statement.java:85)
    at com.arcadedb.query.sql.parser.Statement.execute(Statement.java:69)
    at com.arcadedb.query.sql.SQLQueryEngine.command(SQLQueryEngine.java:97)
    at com.arcadedb.database.EmbeddedDatabase.command(EmbeddedDatabase.java:1264)
    at com.arcadedb.server.ServerDatabase.command(ServerDatabase.java:421)
    at com.arcadedb.server.http.handler.PostCommandHandler.executeCommand(PostCommandHandler.java:127)
    at com.arcadedb.server.http.handler.PostCommandHandler.execute(PostCommandHandler.java:88)
    at com.arcadedb.server.http.handler.DatabaseAbstractHandler.execute(DatabaseAbstractHandler.java:92)
    at com.arcadedb.server.http.handler.AbstractHandler.handleRequest(AbstractHandler.java:127)
    at io.undertow.server.Connectors.executeRootHandler(Connectors.java:393)
    at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:859)
    at org.jboss.threads.ContextHandler$1.runWith(ContextHandler.java:18)
    at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2513)
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1538)
    at org.xnio.XnioWorker$WorkerThreadFactory$1$1.run(XnioWorker.java:1282)
    at java.base/java.lang.Thread.run(Thread.java:829)

Am I doing something wrong? Essentially, I'm trying to understand how to efficiently search with partial text matches. Actually, I need to do it from Gremlin, but for the moment, I'd like to make it work at least using SQL.

Steps to reproduce

See above

lvca commented 1 year ago

Hi @marco-brandizi, welcome to ArcadeDB!

Have you tried this?

select * from `Person:Resource` where `label` like '%John%'
marco-brandizi commented 1 year ago

Ciao @lvca, and thank you.

Sorry, I forgot to add that bit: yes, I tried LIKE 'John%' (I've used SQL for too many years, to forget it! :-) ), and it works, however, when I click on the Explain tab in the Studio, it doesn't seem to be using the index, either a string or a full text one:

+ FETCH FROM TYPE Person:Resource
  + FETCH FROM BUCKET 25 (Person%3AResource_0) ASC
  + FETCH FROM BUCKET 28 (Person%3AResource_1) ASC
  + FETCH FROM BUCKET 31 (Person%3AResource_2) ASC
  + FETCH FROM BUCKET 34 (Person%3AResource_3) ASC
  + FETCH FROM BUCKET 37 (Person%3AResource_4) ASC
  + FETCH FROM BUCKET 40 (Person%3AResource_5) ASC
  + FETCH FROM BUCKET 43 (Person%3AResource_6) ASC
  + FETCH FROM BUCKET 46 (Person%3AResource_7) ASC
+ FILTER ITEMS WHERE 
  `label` LIKE 'John%'
+ LIMIT ( LIMIT 25)
+ CALCULATE PROJECTIONS
  *

I also wonder how much and in which cases I need an index for these searches. I plan to do many equality-based searches, but not so frequent partial matches. In the latter case, it seems that it does some optimisation anyway, by looking at a limited number of buckets (like in a hash table?), rather than doing a full scan.

lvca commented 1 year ago

@gramian I remember you played recently with those operators and full-text indexes. Any findings?

gramian commented 1 year ago

Indeed, I had (have) the same issue with a (multi property) Full-text index. I am also using the wildcard search as workaround. I will append my issue to this one later.

gramian commented 1 year ago

My problem was:

I was starting off this example: https://orientdb.com/docs/3.2.x/indexing/Full-Text-Index.html#two-minutes-tutorial but when using: SELECT FROM Item[text] WHERE text CONTAINSTEXT "sister" I get the error Cannot execute index query with text CONTAINSTEXT "sister".

But I don't know if it is related...

lvca commented 1 year ago

I've double-checked the OrientDB codebase and there is no one test with CONTAINSTEXT... I guess this was in the old SQL parser and never ported. So it's missing since 2015 I guess...

The full text index is never used from LIKE, ILIKE and CONTAINSTEXT. I have to write it from scratch. Starting from LIKE, I could split the text with the character % and search for all the words (If no % is present, it's an exact match and the full-text index can't be used because it splits the text in words).

Example:

SELECT FROM Book WHERE abstract LIKE "%magic%"
  1. If a FULL-TEXT index is defined on Book.abstract field, then a lookup of "magic" is executed
  2. If the index is not defined, a full scan is executed looking for "magic" in the abstract text

With CONTAINSTEXT it's easier, just look for a certain keyword. We could take this chance to extend the CONTAINSTEXT operator to allow an array of strings (the text must contains all the keywords - in AND). Example:

SELECT FROM Book WHERE abstract CONTAINSTEXT ["magic","wand"]
lvca commented 1 year ago

@marco-brandizi a temporary workaround is to use the EQUALS operator. Example:

select * from `Person:Resource` where `label` = 'John'

Should work. The equals uses the index and if it's FULL text, a full text search is executed. While I think this is not semantically intuitive/correct, waiting for the CONTAINSTEXT fix it would work.

marco-brandizi commented 1 year ago

Thanks. So, you mean '=' finds "John Doe" too when there is a full text index?

lvca commented 1 year ago

@marco-brandizi correct.

alan-strickland-red commented 1 year ago

I had a problem related to this where I was seeing the = operator return results when the value didn't exactly match which is explained by the above as I had a full text index on the property in question.

I found by swapping out = for <=> I could get my desired result because that seems to ignore the full text index.