Performance Degradation on Simple Query with ArcadeDB Embedded Version 24.5.1

ArcadeData / arcadedb

ArcadeDB Multi-Model Database, one DBMS that supports SQL, Cypher, Gremlin, HTTP/JSON, MongoDB and Redis. ArcadeDB is a conceptual fork of OrientDB, the first Multi-Model DBMS. ArcadeDB supports Vector Embeddings.

https://arcadedb.com

Apache License 2.0

502 stars 61 forks source link

Performance Degradation on Simple Query with ArcadeDB Embedded Version 24.5.1 #1730

Open ismaeladra opened 2 months ago

ismaeladra commented 2 months ago

ArcadeDB Version: 24.5.1 (Embedded) Operating System: Windows JDK Version: 11 Nodes: 8357 Edges: 13557

Problem Description: I am experiencing significant performance issues when running a simple query on ArcadeDB embedded version 24.5.1. The query in question is as follows: MATCH (n:V_6)-[e:E_36]-(p:V_8) WHERE n.id = 647 RETURN n, p LIMIT 1 This query takes approximately 4 seconds to retrieve a result, which seems excessive given the dataset size (8357 nodes and 13557 edges). I would expect much faster execution times for a query of this simplicity. I have searched through the documentation but could not find any clear information regarding performance tuning or optimization guidelines for embedded use cases. This issue occurs consistently with the same query structure and dataset size.

Any suggestions for improving query performance, configuration changes, or indexing strategies would be greatly appreciated.

lvca commented 2 months ago

Do you have an index on V_6.id property?

ismaeladra commented 2 months ago

Thank you for the response.

To clarify, V_6 is a subtype of a parent type called OBJECT_INSTANCE, and the index is applied on the id field of the parent type (OBJECT_INSTANCE), not directly on V_6. The database does not allow us to create an index directly on V_6.id if an index already exists on the parent type (OBJECT_INSTANCE).

Would this structure affect the query performance? Should the index be applied differently, or is there another recommended way to optimize in this scenario? index

lvca commented 2 months ago

If you profile the query, do you see it's using that index?

Enable profiling from here:

ismaeladra commented 1 month ago

this is the profiling result

thank you very much

lvca commented 1 month ago

The index is used, but 02% of the time is spent on edge traversing. How many edges do you have between vertices?

Is E_36 an outgoing vertex from V_6 or are you interested in both incoming and outgoing?

Also, can you run this SQL query?

select expand( both("E_36") ) from V_6 where id = 647

ismaeladra commented 1 month ago

you can check in this photo time for execution is 2.3 seconds for a simple query

and this the result for the above query and this is the explain `+ FETCH FROM INDEX OBJECT_INSTANCE[id] (5,278μs) id = 803

EXTRACT VALUE FROM INDEX ENTRY (480μs) filtering buckets [228,225,222,219,216,213,210,207,204,201]
FILTER ITEMS BY TYPE (2,381μs) V_6
CALCULATE PROJECTIONS (17,051μs) both("E_36")
EXPAND (20μs)
LIMIT ( LIMIT 25)`

lvca commented 1 month ago

The math on the explanation doesn't add up for the 1,174ms of the query. Is there any way we can look into the database? Is it possible that you previously created and deleted many edges from that vertex?

ismaeladra commented 1 month ago

This is exactly the issue we're trying to track down but haven't been able to identify the root cause yet.

Regarding the created and deleted edges, our system is built on a Spring Boot application that uses an embedded ArcadeDB. Whenever users create or delete relations in our system, these changes are saved directly to ArcadeDB. However, we haven't noticed any specific patterns or issues related to previously deleted edges causing problems.

lvca commented 1 month ago

If you can't send me your database zipped to support at arcadedata.com, is there a way we can access to studio remotely and run some queries?

ismaeladra commented 1 month ago

I've just sent an email to your support team. As this is a customer environment, we can't provide direct access to the database, but we're happy to arrange a session to investigate further.