ArcadeData / arcadedb

ArcadeDB Multi-Model Database, one DBMS that supports SQL, Cypher, Gremlin, HTTP/JSON, MongoDB and Redis. ArcadeDB is a conceptual fork of OrientDB, the first Multi-Model DBMS. ArcadeDB supports Vector Embeddings.
https://arcadedb.com
Apache License 2.0

Quotes are stripped out of arrays, possibly only in GraphML imports. #122

Closed: tetious closed this issue 2 years ago

tetious commented 2 years ago

ArcadeDB Version: 21.10.2-SNAPSHOT (build ae608bfe8ea4d5279afefe08d42d8f4f9f3d2292/1633588344745/main)

OS: docker

For this query: MATCH (m:Movie{title:'The Polar Express'})<-[e:ACTED_IN]-(v:Person) RETURN e, v

Expected behavior

The edge property 'roles' should be a properly formed array, as shown in Neo4j's view of the import: [screenshot: Screen Shot 2021-10-10 at 20.31.13]

Actual behavior

The quotes around the elements are missing and the whole value comes back as a single string, making the array unparsable: "roles": "[Hero Boy, Father, Conductor, Hobo, Scrooge, Santa Claus]"

I've confirmed arrays can be created and returned successfully via parameterized inserts, so I suspect this is a bug in the import code. Could be wrong, though!

lvca commented 2 years ago

@antarcticgiraffe could you please take this when you can? Thanks!

lvca commented 2 years ago

@tetious I tried with the latest release and it works:

This is the edge returned in Studio:

{
  "edges": [
    {
      "p": {
        "roles": [
          "Hero Boy",
          "Father",
          "Conductor",
          "Hobo",
          "Scrooge",
          "Santa Claus"
        ]
      },
      "r": "#54:20",
      "t": "ACTED_IN",
      "i": "#13:4",
      "o": "#34:7"
    }
  ]
}

Are you sure you're using the latest version? Try removing the image from Docker and running it again, so that Docker downloads the latest one.

tetious commented 2 years ago

I'm pretty sure I am. I just updated again and reimported like this:

import database https://github.com/ArcadeData/arcadedb-datasets/raw/main/neo4j/movies.graphml.tgz

It is still a problem for me (see the version in the results):

    "edges": [
      {
        "p": {
          "roles": "[Hero Boy, Father, Conductor, Hobo, Scrooge, Santa Claus]"
        },
        "r": "#55:16",
        "t": "ACTED_IN",
        "i": "#1:3",
        "o": "#46:0"
      },
      {
        "p": {
          "roles": "[Hero Boy, Father, Conductor, Hobo, Scrooge, Santa Claus]"
        },
        "r": "#55:16",
        "t": "ACTED_IN",
        "i": "#1:3",
        "o": "#46:0"
      }
    ]
  },
  "user": "root",
  "version": "21.10.2-SNAPSHOT (build 2a5c3292c37bc9ba048e79e66956f178a0ecf1d5/1634079671040/main)"
}

Maybe we're using different datasets?

lvca commented 2 years ago

@tetious you're right, if I import the database from scratch I have the same issue. Checking into it. Thanks for the report.

lvca commented 2 years ago

OK, it seems the exported and published database has this issue: arrays are saved as strings (I suspect the Apache TinkerPop GraphML exporter has a bug where arrays are exported with .toString()).
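
A minimal Java sketch of the suspected mechanism, assuming the exporter simply calls toString() on a java.util.List property value. The method exportAsGraphMLValue is hypothetical and is not TinkerPop's API. Java's standard collection toString() wraps the elements in brackets and joins them with ", " without quoting them, which reproduces exactly the string seen after the import:

```java
import java.util.List;

public class ToStringExportDemo {
    // Hypothetical stand-in for an exporter that writes a multi-valued
    // property into a GraphML <data> element by calling toString() on it.
    static String exportAsGraphMLValue(Object propertyValue) {
        return String.valueOf(propertyValue);
    }

    public static void main(String[] args) {
        List<String> roles = List.of("Hero Boy", "Father", "Conductor",
                "Hobo", "Scrooge", "Santa Claus");
        String exported = exportAsGraphMLValue(roles);
        // Prints: [Hero Boy, Father, Conductor, Hobo, Scrooge, Santa Claus]
        System.out.println(exported);
    }
}
```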

lvca commented 2 years ago

Quick update: I wrote to the Gremlin Discord channel. From the code, arrays/lists/collections are saved as strings. The issue is that Gremlin's GraphML importer is unable to restore the original graph, so the property is read back as a string, confirming what you're experiencing.
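
To illustrate why the original list cannot be restored once it has been flattened, here is a hypothetical best-effort parser. parseToStringList is illustrative only; as described above, the GraphML reader does not try to parse the value at all and keeps it as a plain string. The parser happens to round-trip the movie roles, but two different lists can produce the same string, so no reader can reliably tell them apart:

```java
import java.util.Arrays;
import java.util.List;

public class NaiveListRecovery {
    // Hypothetical parser: strips the surrounding brackets and splits on ", ".
    // It only shows that even a deliberate attempt cannot reliably undo toString().
    static List<String> parseToStringList(String s) {
        if (s.length() < 2 || !s.startsWith("[") || !s.endsWith("]")) {
            throw new IllegalArgumentException("not a bracketed list: " + s);
        }
        String body = s.substring(1, s.length() - 1);
        if (body.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(body.split(", ", -1));
    }

    public static void main(String[] args) {
        List<String> roles = List.of("Hero Boy", "Father", "Conductor",
                "Hobo", "Scrooge", "Santa Claus");
        // Works here only because no role contains ", "
        System.out.println(parseToStringList(roles.toString()).equals(roles)); // true

        // Two different lists collapse into the same string "[Smith, John]"
        List<String> one = List.of("Smith, John");
        List<String> two = List.of("Smith", "John");
        System.out.println(one.toString().equals(two.toString())); // true
        System.out.println(parseToStringList(one.toString()).equals(one)); // false
    }
}
```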

lvca commented 2 years ago

Fixed in the main branch by adding support for the GraphSON format, which handles complex types such as arrays, collections, and lists. We have also published the Movie database in this format, which you can import with:

import database https://github.com/ArcadeData/arcadedb-datasets/raw/main/neo4j/movies.graphson.tgz
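
Because GraphSON is JSON-based, a list property can be written as a JSON array in which each element stays a separately quoted string. The sketch below is not TinkerPop's GraphSON writer or ArcadeDB's importer; it is a hypothetical minimal encoder (quote, toJsonArray) showing that a JSON array keeps the element boundaries that toString() discards:

```java
import java.util.List;
import java.util.stream.Collectors;

public class JsonListEncoding {
    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Encodes a list of strings as a JSON array, keeping every element quoted.
    static String toJsonArray(List<String> values) {
        return values.stream()
                .map(JsonListEncoding::quote)
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        // Lists that toString() conflated now encode differently
        System.out.println(toJsonArray(List.of("Smith, John")));   // ["Smith, John"]
        System.out.println(toJsonArray(List.of("Smith", "John"))); // ["Smith","John"]
    }
}
```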

tetious commented 2 years ago

Awesome! Thanks! I'll have a play once CD catches up.