ArcadeData / arcadedb

ArcadeDB Multi-Model Database, one DBMS that supports SQL, Cypher, Gremlin, HTTP/JSON, MongoDB and Redis. ArcadeDB is a conceptual fork of OrientDB, the first Multi-Model DBMS. ArcadeDB supports Vector Embeddings.
https://arcadedb.com
Apache License 2.0

Unable to import JSON data #397

Closed lnoir closed 2 years ago

lnoir commented 2 years ago

ArcadeDB Version: v22.1.3-SNAPSHOT (DOCKER)

JDK Version: N/A

OS: N/A

Expected behavior

Import vertices from a JSON file.

Actual behavior

An error occurs when trying to use import database ...:

Error on command execution (PostCommandHandler)
com.arcadedb.exception.QueryParsingException: Error on executing Cypher query
        at com.arcadedb.gremlin.query.CypherQueryEngine.command(CypherQueryEngine.java:69)
        at com.arcadedb.database.EmbeddedDatabase.command(EmbeddedDatabase.java:1185)
        at com.arcadedb.server.ServerDatabase.command(ServerDatabase.java:395)
        at com.arcadedb.server.http.handler.PostCommandHandler.executeCommand(PostCommandHandler.java:238)
        at com.arcadedb.server.http.handler.PostCommandHandler.execute(PostCommandHandler.java:83)
        at com.arcadedb.server.http.handler.DatabaseAbstractHandler.execute(DatabaseAbstractHandler.java:82)
        at com.arcadedb.server.http.handler.AbstractHandler.handleRequest(AbstractHandler.java:108)
        at io.undertow.server.RoutingHandler.handleRequest(RoutingHandler.java:93)
        at io.undertow.server.handlers.PathHandler.handleRequest(PathHandler.java:104)
        at io.undertow.server.Connectors.executeRootHandler(Connectors.java:387)
        at io.undertow.server.protocol.http.HttpReadListener.handleEventWithNoRunningRequest(HttpReadListener.java:256)
        at io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:136)
        at io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:59)
        at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
        at org.xnio.conduits.ReadReadyHandler$ChannelListenerHandler.readReady(ReadReadyHandler.java:66)
        at org.xnio.nio.NioSocketConduit.handleReady(NioSocketConduit.java:89)
        at org.xnio.nio.WorkerThread.run(WorkerThread.java:591)
Caused by: com.arcadedb.exception.QueryParsingException: org.opencypher.v9_0.util.SyntaxException: Invalid input 'i': expected <init> (line 1, column 1 (offset: 0))
        at org.apache.tinkerpop.gremlin.arcadedb.structure.ArcadeGraph.cypher(ArcadeGraph.java:139)
        at com.arcadedb.gremlin.query.CypherQueryEngine.command(CypherQueryEngine.java:64)
        ... 16 more

Steps to reproduce

Sample data in vertices.json file, mounted as a docker volume at start-up of the container with -v "$(pwd)/vertices.json":/tmp/vertices.json.

{"id":"1","type":"node","labels":["User"],"properties":{"id":"e274a92e-f309-4d1b-a888-fd9e1e5a5ccb"}}

Run import database file://tmp/vertices.json. The above error occurs. The docs state that Neo4J export format is supported. I've tried JSONL and ARRAY JSON, using both the above method and trying to import through the ArcadeDB web UI.

Please let me know what the correct format is or, if this is a bug, what the workaround is. Thanks.

EDIT: typo
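
A side note on the URL spelling: the command in the steps above is written with two slashes (file://tmp/vertices.json), while later comments in this thread use three (file:///tmp/vertices.json). Whether ArcadeDB's importer treats the two differently is not established in this thread. The sketch below only shows how a generic URL parser (Python's standard library, used here purely for illustration) splits each form; describe_file_url is a hypothetical helper name.

```python
from urllib.parse import urlparse


def describe_file_url(url: str) -> dict:
    """Split a file: URL into the parts a generic URL parser sees."""
    parts = urlparse(url)
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path}


# Two slashes: "tmp" is read as the host, not as the first directory.
two_slashes = describe_file_url("file://tmp/vertices.json")

# Three slashes: empty host, absolute path starting at /tmp.
three_slashes = describe_file_url("file:///tmp/vertices.json")

print(two_slashes)
print(three_slashes)
```

With the two-slash form the parser reports a host of "tmp" and a path of "/vertices.json", so the three-slash form is the one that unambiguously names /tmp/vertices.json.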

lvca commented 2 years ago

This is the minimal json we're using in our automatic tests: https://github.com/ArcadeData/arcadedb/blob/59dc7789dc25512a67bc49eb8d9f3b9884af7f2b/integration/src/test/resources/neo4j-export-mini.jsonl#L5-L4. How is your file different in the format from this?

lnoir commented 2 years ago

Hi Luca, the sample I posted is literally the same data I tried to import that triggered the error.

I originally tried a larger file, but on encountering the error I tried to import a very trivial file, also without success. The file format appears to match that of the JSON in your tests.
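
To rule out a malformed file before looking at the importer, a quick line-by-line check can help. The following is a minimal sketch, not ArcadeDB's parser: it assumes one JSON object per line and checks only the keys visible in the sample line posted above (id, type, labels, properties). The function name check_jsonl is hypothetical, and records whose type is not "node" are counted but not inspected.

```python
import json
from collections import Counter


def check_jsonl(lines):
    """Return (record counts per "type", list of problems) for JSONL input."""
    counts = Counter()
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are ignored
        try:
            record = json.loads(text)
        except json.JSONDecodeError as err:
            problems.append(f"line {number}: not valid JSON ({err.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue
        kind = record.get("type")
        counts[kind] += 1
        if kind == "node":
            # Keys taken from the sample line in this issue (an assumption
            # about the format, not a complete specification).
            if "id" not in record:
                problems.append(f"line {number}: node without 'id'")
            if not isinstance(record.get("labels"), list):
                problems.append(f"line {number}: 'labels' is not a list")
            if not isinstance(record.get("properties"), dict):
                problems.append(f"line {number}: 'properties' is not an object")
    return counts, problems


sample = [
    '{"id":"1","type":"node","labels":["User"],'
    '"properties":{"id":"e274a92e-f309-4d1b-a888-fd9e1e5a5ccb"}}'
]
counts, problems = check_jsonl(sample)
print(dict(counts), problems)
```

For a file on disk, pass an open file object instead of the list, for example check_jsonl(open("vertices.json", encoding="utf-8")). An empty problem list for the sample line is consistent with the report that the file itself is well-formed.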

lvca commented 2 years ago

So it must be in the way the importer is called. This is the test able to import Neo4j files: https://github.com/ArcadeData/arcadedb/blob/0fb02e2a1a130019d62275aeb9bfba8e29704e3c/integration/src/test/java/com/arcadedb/integration/importer/Neo4jImporterIT.java.

Can you spot any difference in the way you are using the importer?

lnoir commented 2 years ago

I'm simply running the import database file:///tmp/vertices.json command in the ArcadeDB web UI. But the error I'm getting is different from the one originally reported. I think with all the back and forth and tinkering around trying to resolve it the other day, I mixed up the errors.

This is what I'm getting every time I try to call import (Error on parsing source 'null'):

2022-05-11 16:35:55.652 INFO  [SourceDiscovery] <ArcadeDB_0> Analyzing url: file:///tmp/vertices.json...
- Status update: parsed 0 (0/sec) - 0 documents (0/sec) - 0 vertices (0/sec) - 0 edges (0/sec) - 0 skipped edges - 0 linked edges (0/sec - 0%)
<ArcadeDB_0> Error on command execution (PostCommandHandler)
com.arcadedb.exception.CommandExecutionException: Error on importing database
        at com.arcadedb.query.sql.parser.ImportDatabaseStatement.executeSimple(ImportDatabaseStatement.java:59)
        at com.arcadedb.query.sql.executor.SingleOpExecutionPlan.executeInternal(SingleOpExecutionPlan.java:115)
        at com.arcadedb.query.sql.executor.ScriptLineStep.syncPull(ScriptLineStep.java:50)
        at com.arcadedb.query.sql.executor.ScriptExecutionPlan.doExecute(ScriptExecutionPlan.java:97)
        at com.arcadedb.query.sql.executor.ScriptExecutionPlan.fetchNext(ScriptExecutionPlan.java:57)
        at com.arcadedb.query.sql.parser.LocalResultSet.fetchNext(LocalResultSet.java:45)
        at com.arcadedb.query.sql.parser.LocalResultSet.<init>(LocalResultSet.java:39)
        at com.arcadedb.database.EmbeddedDatabase.executeInternal(EmbeddedDatabase.java:1249)
        at com.arcadedb.database.EmbeddedDatabase.execute(EmbeddedDatabase.java:1197)
        at com.arcadedb.server.ServerDatabase.execute(ServerDatabase.java:400)
        at com.arcadedb.server.http.handler.PostCommandHandler.executeScript(PostCommandHandler.java:229)
        at com.arcadedb.server.http.handler.PostCommandHandler.execute(PostCommandHandler.java:82)
        at com.arcadedb.server.http.handler.DatabaseAbstractHandler.execute(DatabaseAbstractHandler.java:82)
        at com.arcadedb.server.http.handler.AbstractHandler.handleRequest(AbstractHandler.java:108)
        at io.undertow.server.RoutingHandler.handleRequest(RoutingHandler.java:93)
        at io.undertow.server.handlers.PathHandler.handleRequest(PathHandler.java:104)
        at io.undertow.server.Connectors.executeRootHandler(Connectors.java:387)
        at io.undertow.server.protocol.http.HttpReadListener.handleEventWithNoRunningRequest(HttpReadListener.java:256)
        at io.undertow.server.protocol.http.HttpReadListener.handleEvent(HttpReadListener.java:136)
        at io.undertow.server.protocol.http.HttpOpenListener.handleEvent(HttpOpenListener.java:162)
        at io.undertow.server.protocol.http.HttpOpenListener.handleEvent(HttpOpenListener.java:100)
        at io.undertow.server.protocol.http.HttpOpenListener.handleEvent(HttpOpenListener.java:57)
        at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
        at org.xnio.ChannelListeners$10.handleEvent(ChannelListeners.java:291)
        at org.xnio.ChannelListeners$10.handleEvent(ChannelListeners.java:286)
        at org.xnio.ChannelListeners.invokeChannelListener(ChannelListeners.java:92)
        at org.xnio.nio.QueuedNioTcpServer2.acceptTask(QueuedNioTcpServer2.java:178)
        at org.xnio.nio.WorkerThread.safeRun(WorkerThread.java:612)
        at org.xnio.nio.WorkerThread.run(WorkerThread.java:479)
Caused by: com.arcadedb.integration.importer.ImportException: Error on parsing source 'null'
        at com.arcadedb.integration.importer.Importer.load(Importer.java:59)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.arcadedb.query.sql.parser.ImportDatabaseStatement.executeSimple(ImportDatabaseStatement.java:54)
        ... 28 more
Caused by: java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:342)
        at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:292)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at com.arcadedb.integration.importer.Parser$1.read(Parser.java:115)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.base/sun.nio.cs.StreamDecoder.read0(StreamDecoder.java:127)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:112)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:164)
        at com.arcadedb.integration.importer.Parser.nextChar(Parser.java:51)
        at com.arcadedb.integration.importer.SourceDiscovery.analyzeSourceContent(SourceDiscovery.java:287)
        at com.arcadedb.integration.importer.SourceDiscovery.getSchema(SourceDiscovery.java:57)
        at com.arcadedb.integration.importer.Importer.loadFromSource(Importer.java:75)
        at com.arcadedb.integration.importer.Importer.load(Importer.java:50)
        ... 33 more

I've checked the file has the correct data, tried using array JSON rather than JSONL, but the result is the same every time.

I did also try uploading the file using the "Upload" button in the UI, but that breaks because of a JS error (globalGraphSettings is undefined).
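
The two stack traces in this thread take different code paths. The first passes through CypherQueryEngine and fails on the 'i' at line 1, column 1, which suggests the text import database ... was parsed as Cypher. The second passes through ImportDatabaseStatement in the SQL parser package, so there the command was at least recognized. When the command is sent over HTTP rather than through the web UI, the language can be stated explicitly. The sketch below only builds such a request; it assumes the server accepts a POST to /api/v1/command/<database> with a JSON body containing "language" and "command" fields. The database name "mydb" and the helper name are hypothetical.

```python
import json


def build_command_request(database: str, command: str, language: str = "sql"):
    """Build the path and JSON body for a command request (nothing is sent)."""
    # Assumed endpoint shape; check it against the HTTP API docs for your version.
    path = f"/api/v1/command/{database}"
    body = json.dumps({"language": language, "command": command})
    return path, body


path, body = build_command_request(
    "mydb",  # hypothetical database name
    "import database file:///tmp/vertices.json",
)
print(path)
print(body)
```

Naming the language in the request avoids depending on whichever language happens to be selected in a UI drop-down.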

lnoir commented 2 years ago

I tried logging into the docker container and using ./bin/console.sh to run import database ... Here I encountered no errors, but nothing was imported.

2022-05-12 03:30:20.487 INFO  [SourceDiscovery] Analyzing url: file:///tmp/arcade.json...
2022-05-12 03:30:20.497 INFO  [SourceDiscovery] Recognized format JSON (parsingLimitBytes=9.54MB parsingLimitEntries=0)
Status update: parsed 0 (0/sec - 0%) - 0 records (0/sec) - 0 vertices (0/sec) - 0 edges (0/sec) - 0 skipped edges - 0 linked edges (0/sec - 0%)

+---------+-----------------------+
|NAME     |VALUE                  |
+---------+-----------------------+
|operation|import database        |
|fromUrl  |file:///tmp/arcade.json|
|result   |OK                     |
+---------+-----------------------+
Command executed in 90ms

I tried with the one-line JSONL file above and with a file containing both vertices and edges. Nothing gets imported, but there are no errors.

lnoir commented 2 years ago

Something else I just tried: using the JSON from here. I created a file at /tmp/test.jsonl and added the JSON from the link. The result of import database file:///tmp/test.jsonl is below:

2022-05-12 05:41:11.541 INFO  [SourceDiscovery] Analyzing url: file:///tmp/test.jsonl...
2022-05-12 05:41:11.551 INFO  [SourceDiscovery] Recognized format JSON (parsingLimitBytes=9.54MB parsingLimitEntries=0)
Status update: parsed 0 (0/sec - 0%) - 0 records (0/sec) - 0 vertices (0/sec) - 0 edges (0/sec) - 0 skipped edges - 0 linked edges (0/sec - 0%)

+---------+----------------------+
|NAME     |VALUE                 |
+---------+----------------------+
|operation|import database       |
|fromUrl  |file:///tmp/test.jsonl|
|result   |OK                    |
+---------+----------------------+
Command executed in 87ms

This seems like it could be a bug, unless you can spot something I'm doing wrong here. Could you try doing the same?

lnoir commented 2 years ago

I haven't made any further progress with this issue and for now will be going with another DB, so have closed this.

lzm0 commented 2 years ago

I am experiencing the same issue as of 22.6.1. Would you mind reopening this issue? @lnoir

lvca commented 2 years ago

@lzm0 can you share the database to import so we can test it locally? You can also send it to support@arcadedb.com with the issue number in the subject.

lzm0 commented 2 years ago

I tried to import the jsonl you posted https://github.com/ArcadeData/arcadedb/blob/59dc7789dc25512a67bc49eb8d9f3b9884af7f2b/integration/src/test/resources/neo4j-export-mini.jsonl

> import database file:///home/arcadedb/neo4j-export-mini.jsonl

2022-07-26 13:50:05.604 INFO  [SourceDiscovery] Analyzing url: file:///home/arcadedb/neo4j-export-mini.jsonl...
2022-07-26 13:50:05.625 INFO  [SourceDiscovery] Recognized format JSON (parsingLimitBytes=9.54MB parsingLimitEntries=0)
Status update: parsed 0 (0/sec - 0%) - 0 records (0/sec) - 0 vertices (0/sec) - 0 edges (0/sec) - 0 skipped edges - 0 linked edges (0/sec - 0%)

+---------+---------------------------------------------+
|NAME     |VALUE                                        |
+---------+---------------------------------------------+
|operation|import database                              |
|fromUrl  |file:///home/arcadedb/neo4j-export-mini.jsonl|
|result   |OK                                           |
+---------+---------------------------------------------+
Command executed in 203ms

lvca commented 2 years ago

I was able to reproduce it. The issue is in recognizing that the file is a Neo4j export, not just generic JSON. Using the Neo4j importer directly works:

ArcadeDB 22.6.2-SNAPSHOT (build //) - Neo4j Importer
Importing Neo4j database from file '/Users/luca/Documents/GitHub/arcadedb/integration/target/test-classes/neo4j-export-mini.jsonl' to 'target/databases/neo4j'
Creation of the schema: types, properties and indexes
- Creation of vertices started
- Creation of vertices completed: created 3 vertices, skipped 1 edges (0 vertices/sec elapsed=0 secs)
Creation of edges started: creating edges between vertices
- Creation of edged completed: created 1 edges, (0 edges/sec elapsed=0 secs)
***************************************************************************************************
Import of Neo4j database completed in 0 secs with 0 errors and 0 warnings.

SUMMARY

- Vertices.............: 3
-- User                : 3 
- Edges................: 1
-- KNOWS               : 1 
- Total attributes.....: 11
***************************************************************************************************

NOTES:
- you can find your new ArcadeDB database in 'target/databases/neo4j'

Working on this.
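
To illustrate the kind of check involved in telling a Neo4j export apart from generic JSON, here is a minimal sketch. It is not ArcadeDB's SourceDiscovery code and does not describe the actual fix. It assumes that every line of a Neo4j export is a JSON object whose "type" field is "node" (as in the sample at the top of this issue) or "relationship" (an assumption about the export format), and the function names are hypothetical.

```python
import json

# Record types assumed to mark a Neo4j export line. "node" appears in the
# sample posted in this issue; "relationship" is an assumption.
NEO4J_RECORD_TYPES = {"node", "relationship"}


def looks_like_neo4j_export(first_line: str) -> bool:
    """Heuristic: is the first line a JSON object with a Neo4j-style type?"""
    try:
        record = json.loads(first_line)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    kind = record.get("type")
    return isinstance(kind, str) and kind in NEO4J_RECORD_TYPES


def detect_format(first_line: str) -> str:
    """Classify the first line as "neo4j", generic "json", or "unknown"."""
    if looks_like_neo4j_export(first_line):
        return "neo4j"
    try:
        json.loads(first_line)
    except json.JSONDecodeError:
        return "unknown"
    return "json"


node_line = (
    '{"id":"1","type":"node","labels":["User"],'
    '"properties":{"id":"e274a92e-f309-4d1b-a888-fd9e1e5a5ccb"}}'
)
print(detect_format(node_line))
print(detect_format('{"name": "plain document"}'))
```

A detector that stops at "this parses as JSON" would report both lines as the same format, which matches the "Recognized format JSON" log lines followed by zero imported records seen earlier in the thread.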

lvca commented 2 years ago

Fixed in 068cf00d. It will be part of the 22.7.1 release, which we're going to publish this week. You can use it right now by updating your local git repository and running mvn clean install -DskipTests. You'll find the new distribution directory under "package/target".