Closed: ctSkennerton closed this issue 4 years ago
Thanks Connor for reporting this. We will be looking into this soon.
Since the exception message is due to maxContentLength, as a quick workaround can you try changing https://github.com/awslabs/amazon-neptune-tools/blob/master/export-neptune-to-elasticsearch/lambda/export_neptune_to_kinesis.py#L42 so that the command includes the --max-content-length parameter, like this:
command = 'df -h && wget {} && export SERVICE_REGION="{}" && java -Xms8g -Xmx8g -jar neptune-export.jar {} -e {} -p {} -d /neptune/results --output stream --stream-name {} --region {} --max-content-length 2147483647 --format neptuneStreamsJson --log-level info --use-ssl{}{}{}'.format(
That did seem to improve things, but not completely. I still get the following stack trace:
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
at com.amazonaws.services.neptune.propertygraph.metadata.DataType$5.printTo(DataType.java:81)
at com.amazonaws.services.neptune.propertygraph.io.NeptuneStreamsJsonPropertyGraphPrinter.printRecord(NeptuneStreamsJsonPropertyGraphPrinter.java:133)
at com.amazonaws.services.neptune.propertygraph.io.NeptuneStreamsJsonPropertyGraphPrinter.printRecord(NeptuneStreamsJsonPropertyGraphPrinter.java:114)
at com.amazonaws.services.neptune.propertygraph.io.NeptuneStreamsJsonPropertyGraphPrinter.printProperties(NeptuneStreamsJsonPropertyGraphPrinter.java:71)
at com.amazonaws.services.neptune.propertygraph.io.NodeWriter.handle(NodeWriter.java:36)
at com.amazonaws.services.neptune.propertygraph.io.NodeWriter.handle(NodeWriter.java:18)
at com.amazonaws.services.neptune.propertygraph.io.ExportPropertyGraphTask.handle(ExportPropertyGraphTask.java:91)
at com.amazonaws.services.neptune.propertygraph.io.ExportPropertyGraphTask$CountingHandler.handle(ExportPropertyGraphTask.java:132)
at com.amazonaws.services.neptune.propertygraph.NodesClient.lambda$queryForValues$1(NodesClient.java:89)
at org.apache.tinkerpop.gremlin.process.traversal.Traversal.forEachRemaining(Traversal.java:272)
at com.amazonaws.services.neptune.propertygraph.NodesClient.queryForValues(NodesClient.java:87)
at com.amazonaws.services.neptune.propertygraph.io.ExportPropertyGraphTask.run(ExportPropertyGraphTask.java:71)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
At the end I see the following, so only about half of the nodes were exported:
Source:
Nodes: 319015
Edges: 739118
Export:
Nodes: 159581
Edges: 739118
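To make the gap in the summary above explicit, here is a minimal sketch that compares the source and export counts. The function name `export_gaps` and the dictionary layout are only illustrative and are not part of the neptune-export tool; the numbers are copied from the log summary.

```python
def export_gaps(source, exported):
    """Return, per element kind, how many items are missing from the export."""
    return {kind: source[kind] - exported[kind] for kind in source}


# Counts copied from the log summary in this thread.
source = {"nodes": 319015, "edges": 739118}
exported = {"nodes": 159581, "edges": 739118}

gaps = export_gaps(source, exported)
for kind, missing in gaps.items():
    pct = 100.0 * exported[kind] / source[kind]
    print(f"{kind}: exported {exported[kind]} of {source[kind]} ({pct:.1f}%), missing {missing}")
```

A check like this could be run against the end of the batch job log to flag an export that reports success but is incomplete.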
This looks like a type cast error while reading data from Neptune. It would help if you could provide a small reproducer.
Thanks @ctSkennerton – I think I've reproduced the issue, and have pushed a fix.
I believe you may have some set cardinality properties with values of different types. The exporter was inferring the type of the values in the set based only on the first value: if this was an integer, but a subsequent value in the set was a string, the tool would raise the error you reported.
I've updated the exporter so that when it publishes these set cardinality properties, it identifies the type of each value in the set.
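The difference between the two behaviours can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the exporter's actual code (which is Java); the function names and the `value`/`dataType` record layout are assumptions made for the example.

```python
def type_name(value):
    """Map a Python value to a simple, illustrative type label."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    return "string"


def print_set_first_value(values):
    """Old behaviour as described: infer one type from the first value only."""
    inferred = type(values[0])
    records = []
    for v in values:
        if not isinstance(v, inferred):
            # Analogous to the ClassCastException in the reported stack trace.
            raise TypeError(f"{type(v).__name__} cannot be cast to {inferred.__name__}")
        records.append({"value": v, "dataType": type_name(values[0])})
    return records


def print_set_per_value(values):
    """Fixed behaviour as described: identify the type of each value in the set."""
    return [{"value": v, "dataType": type_name(v)} for v in values]


# A set cardinality property whose values have different types.
mixed = [42, "forty-two"]

try:
    print_set_first_value(mixed)
except TypeError as err:
    print("first-value inference failed:", err)

print(print_set_per_value(mixed))
```

With a mixed set such as `[42, "forty-two"]`, the first function fails on the second value, while the second one labels each value with its own type.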
Note that the Neptune/Elasticsearch integration today only indexes string values – see https://docs.aws.amazon.com/neptune/latest/userguide/full-text-search-model.html – but the export part of this backfill solution will now accommodate set cardinality properties containing values of different types.
Thank you @iansrobinson, that fixed my issue.
When using the Neptune to Elasticsearch solution, I found that the Elasticsearch index appeared to be missing a lot of data. Going back through the logs, I see that the Neptune export batch job succeeded, but the log contained the following stack trace
At the end of the log I can also see the following which says that most of the nodes of the graph were not exported:
Can you advise on how I could solve this issue?