ArcadeData / arcadedb

ArcadeDB Multi-Model Database, one DBMS that supports SQL, Cypher, Gremlin, HTTP/JSON, MongoDB and Redis. ArcadeDB is a conceptual fork of OrientDB, the first Multi-Model DBMS. ArcadeDB supports Vector Embeddings.
https://arcadedb.com
Apache License 2.0

Edges aren't being deleted with their vertices on large enough datasets #1626

Closed: WSaffery closed this issue 4 weeks ago

WSaffery commented 1 month ago

ArcadeDB Version:

ArcadeDB Server v24.5.1

OS and JDK Version:

Running on Linux 6.8.10-300.fc40.x86_64 - OpenJDK 64-Bit Server VM 17.0.11 ((Red_Hat-17.0.11.0.9-1))

Expected behavior

Dropping vertices should always remove all edges.
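
The expected behavior is an invariant: once a vertex is deleted, no edge may still reference it. Below is a minimal sketch of a cascading delete that upholds this invariant. It assumes a toy in-memory graph. `CascadeDeleteSketch`, `deleteVertex` and `danglingEdges` are hypothetical names, not ArcadeDB's API or storage layout.

```java
import java.util.*;

// Hypothetical toy graph, NOT ArcadeDB's storage engine: it only illustrates
// the invariant "deleting a vertex removes every incident edge".
public class CascadeDeleteSketch {
    record Edge(int id, String out, String in) {}

    final Set<String> vertices = new HashSet<>();
    final Map<Integer, Edge> edges = new HashMap<>();
    // Adjacency index: vertex -> ids of edges touching it (either direction).
    final Map<String, Set<Integer>> incident = new HashMap<>();
    int nextEdgeId = 0;

    void addVertex(String v) {
        vertices.add(v);
        incident.putIfAbsent(v, new HashSet<>());
    }

    void addEdge(String out, String in) {
        Edge e = new Edge(nextEdgeId++, out, in);
        edges.put(e.id(), e);
        incident.get(out).add(e.id());
        incident.get(in).add(e.id());
    }

    // Cascade delete: drop every incident edge and unlink it from BOTH
    // endpoints' indexes before removing the vertex itself.
    void deleteVertex(String v) {
        for (int edgeId : List.copyOf(incident.get(v))) { // copy: we mutate the set
            Edge e = edges.remove(edgeId);
            if (e == null) continue; // defensive only; unlinking keeps indexes in sync
            incident.get(e.out()).remove(edgeId);
            incident.get(e.in()).remove(edgeId);
        }
        incident.remove(v);
        vertices.remove(v);
    }

    // Counts edges whose endpoints no longer exist ("dangling" edges).
    long danglingEdges() {
        return edges.values().stream()
            .filter(e -> !vertices.contains(e.out()) || !vertices.contains(e.in()))
            .count();
    }

    public static void main(String[] args) {
        CascadeDeleteSketch g = new CascadeDeleteSketch();
        for (String v : List.of("a", "b", "c")) g.addVertex(v);
        g.addEdge("a", "b");
        g.addEdge("b", "c");
        g.addEdge("c", "a");
        // Same intent as g.V().drop() or DELETE FROM node: delete every vertex.
        for (String v : List.copyOf(g.vertices)) g.deleteVertex(v);
        System.out.println(g.edges.size() + " edges, " + g.danglingEdges() + " dangling");
    }
}
```

Here `main` prints `0 edges, 0 dangling`. A non-zero edge count after deleting every vertex is the symptom this issue reports.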

Actual behavior

I've confirmed that dropping a vertex generally drops its edges. However, on the large dataset I'm currently testing, dropping all vertices in a single query fails to remove all of the edges.

I've tested this both in Gremlin and in SQL, using a node supertype:

g.V().drop()

and

delete from `node`

The Gremlin query has been tested both via remote traversal and in the web console; the SQL only in the console.

Steps to reproduce

Load the database backup attached here, then run either g.V().drop() or delete from node.

You may need to increase evaluationTimeout in gremlin-server.yaml to run g.V().drop() without timing out. My last run in Gremlin took 37454 ms, with evaluationTimeout set to 120000 ms.

Apologies, I haven't been able to find a smaller subset of the database that still reproduces the problem. I did trim my original dataset down to just the isLocatedIn edge type to keep the backup small.

My gremlin-server.yaml, for posterity:

host: 0.0.0.0
evaluationTimeout: 120000
graphs:
  graph: ./config/gremlin-server.properties
scriptEngines:
  gremlin-groovy:
    plugins:
      org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {}
      org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin:
        classImports:
          - java.lang.Math
          - org.opencypher.gremlin.traversal.CustomFunctions
          - org.opencypher.gremlin.traversal.CustomPredicate
        methodImports:
          - "java.lang.Math#*"
          - "org.opencypher.gremlin.traversal.CustomPredicate#*"
          - "org.opencypher.gremlin.traversal.CustomFunctions#*"
      org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin:
        files:
          - ./config/gremlin-server.groovy
serializers:
  - className: org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1
    config:
      ioRegistries:
        - com.arcadedb.gremlin.io.ArcadeIoRegistry
  - className: org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV3
    config:
      ioRegistries:
        - com.arcadedb.gremlin.io.ArcadeIoRegistry
  - className: org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV2
    config:
      ioRegistries:
        - com.arcadedb.gremlin.io.ArcadeIoRegistry
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 } }

lvca commented 1 month ago

How did you test the edges were still in the database?

WSaffery commented 1 month ago

Running g.E() throws an exception, because it tries to return edges whose endpoints don't exist. I then verified in the database section of the web app that the edges still existed. Running g.E().id(), on the other hand, doesn't throw: the exception only occurs when an edge is converted to a string, which dereferences its endpoints so that they can be converted to strings too.
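
Here is a toy model of why `g.E().id()` succeeds while `g.E()` fails. It assumes an edge that stores only its endpoints' IDs and resolves them lazily when rendered as a string. `DanglingEdgeSketch`, `EdgeRef` and `load` are hypothetical names, not ArcadeDB or TinkerPop classes, and the RID-style strings are illustrative only.

```java
import java.util.*;

// Hypothetical model (not ArcadeDB's or TinkerPop's classes) of an edge
// whose endpoints were deleted while the edge record itself survived.
public class DanglingEdgeSketch {
    // Stand-in for the vertex store: record id -> vertex label.
    static final Map<String, String> vertexStore = new HashMap<>();

    record EdgeRef(String id, String outId, String inId) {
        // The generated id() accessor never touches the endpoints.
        // Rendering, by contrast, loads both endpoints, as a serializer would.
        @Override
        public String toString() {
            return "e[" + id + "][" + load(outId) + "->" + load(inId) + "]";
        }
    }

    // Resolves an endpoint; fails if the vertex record no longer exists.
    static String load(String rid) {
        String label = vertexStore.get(rid);
        if (label == null) throw new NoSuchElementException("Record " + rid + " not found");
        return label;
    }

    public static void main(String[] args) {
        vertexStore.put("#1:0", "Person");
        vertexStore.put("#2:0", "City");
        EdgeRef e = new EdgeRef("#3:0", "#1:0", "#2:0");
        System.out.println(e);        // both endpoints resolve
        vertexStore.clear();          // vertices deleted, edge left behind
        System.out.println(e.id());   // still works: only the edge's own id
        try {
            System.out.println(e);    // dereferences the missing endpoints
        } catch (NoSuchElementException ex) {
            System.out.println("toString failed: " + ex.getMessage());
        }
    }
}
```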

lvca commented 1 month ago

OK, then it looks like there's an issue with the delete. I'm trying to reproduce it using your dataset.

lvca commented 4 weeks ago

I was able to reproduce the issue. After deleting all the vertices, 23 edges are left in the database. I also fixed a bug in the HTTP serializer used by Studio that prevented results from being returned when the connected record was missing.

Now checking whether these edges were already disconnected before the vertices were deleted.

lvca commented 4 weeks ago

OK, found the issue. You were right: the delete skipped some edges. The iterator over the edge segment (a linked list) used during deletion wasn't traversing the whole list. I'm checking for regressions, and if everything passes, I'll push the fix.
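
The thread doesn't show the actual fix. The sketch below is only a hypothetical illustration of one common way a delete loop over a chain of segments can skip entries: removing an element by index and then advancing the index, so the element that shifted into that slot is never visited. `EdgeSegmentSketch` and `Segment` are invented for this example and don't reflect ArcadeDB's real edge-segment layout.

```java
import java.util.*;

// Hypothetical illustration only; ArcadeDB's real edge segments differ.
// Edge references live in a linked list of fixed-capacity segments.
public class EdgeSegmentSketch {
    static final class Segment {
        final List<Integer> edgeIds = new ArrayList<>();
        Segment next;
    }

    // Builds a chain of segments holding `count` edge ids, `capacity` per segment.
    static Segment build(int count, int capacity) {
        Segment head = new Segment(), cur = head;
        for (int id = 0; id < count; id++) {
            if (cur.edgeIds.size() == capacity) { cur.next = new Segment(); cur = cur.next; }
            cur.edgeIds.add(id);
        }
        return head;
    }

    // BUGGY: removes at index i, then increments i, so the element that
    // shifted into slot i is skipped and survives the "delete all".
    static List<Integer> deleteAllBuggy(Segment head) {
        List<Integer> deleted = new ArrayList<>();
        for (Segment s = head; s != null; s = s.next) {
            for (int i = 0; i < s.edgeIds.size(); i++) {
                deleted.add(s.edgeIds.remove(i)); // remove(int index)
            }
        }
        return deleted;
    }

    // FIXED: Iterator.remove() deletes the current entry and continues from
    // the right position, so every entry in every segment is visited once.
    static List<Integer> deleteAllFixed(Segment head) {
        List<Integer> deleted = new ArrayList<>();
        for (Segment s = head; s != null; s = s.next) {
            for (Iterator<Integer> it = s.edgeIds.iterator(); it.hasNext(); ) {
                deleted.add(it.next());
                it.remove();
            }
        }
        return deleted;
    }

    // Counts edge ids still stored across the whole chain.
    static int remaining(Segment head) {
        int n = 0;
        for (Segment s = head; s != null; s = s.next) n += s.edgeIds.size();
        return n;
    }

    public static void main(String[] args) {
        Segment buggy = build(10, 4);
        deleteAllBuggy(buggy);
        System.out.println("buggy leaves " + remaining(buggy) + " edges");
        Segment fixed = build(10, 4);
        deleteAllFixed(fixed);
        System.out.println("fixed leaves " + remaining(fixed) + " edges");
    }
}
```

Running `main` shows that the buggy loop leaves 5 of the 10 edges behind, while the iterator-based loop leaves none. In Java, deleting through the collection's own `Iterator.remove()` is the usual way to remove elements during a traversal without skipping any.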

lvca commented 4 weeks ago

Fixed. Thanks for reporting the issue with the dataset; it made the problem very quick to reproduce.

WSaffery commented 4 weeks ago

Great to hear, thanks for the fast fix.