bramson closed this issue 5 years ago.
@bramson Are you running Python 2 or Python 3? If not Python 3, can you try it with Python 3? Also, can you share a snippet of the data?
I'm running Python3.
While preparing a snippet of data to share, I found that the file reader gave the same error at the same location (322) unless I specified the encoding as 'utf-8' (which of course the file is). This suggests that graphml2csv.py is using a different encoding (cp932). Is there a way to tell the program to use 'utf-8' instead?
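For reference, here is a minimal sketch of the workaround described above: passing the encoding explicitly to `open()` rather than relying on the platform default (which, on a Japanese-locale Windows machine, is typically cp932). The helper name `read_text` and the sample content are made up for illustration; this is not code from graphml2csv.py.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    with open(path, encoding=encoding) as handle:
        return handle.read()


# Hypothetical sample content: a UTF-8 file containing non-ASCII text.
sample = '<?xml version="1.0" encoding="UTF-8"?>\n<data key="name">東京</data>\n'

# Write the sample to a temporary file as UTF-8.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".graphml", delete=False
) as tmp:
    tmp.write(sample)
    path = tmp.name

# Reading with encoding="utf-8" round-trips the text regardless of locale.
text = read_text(path)
os.remove(path)
print(text == sample)
```

Without the `encoding` argument, `open()` falls back to the locale's preferred encoding, so the same file can read cleanly on one machine and fail on another.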
One node looks like this, although most nodes have MUCH longer polygon sequences:
`['<?xml version="1.0" encoding="UTF-8"?>\n', '<graphml xmlns="http://graphml.graphdrawing.org/xmlns" ...
@bramson I wasn't able to reproduce this one locally, but my hypothesis is that your source file is actually encoded in cp932 and the script is interpreting it as utf-8, which is causing the error.
Here is a branch that adds the ability to specify the input file encoding: https://github.com/awslabs/amazon-neptune-tools/tree/issue13/graphml2csv.
Can you try this with the -e cp932 option and see if it resolves the error?
Closing for now. Please re-open if this is still an issue.
In my Neo4j database I have nodes with GIS coordinates stored as a property. These are long lists of pairs of ~13-digit numbers separated by commas. Neo4j stores these as strings, I believe in UTF-8 format.
I used the APOC command to export the graph in graphml format, and now I'm trying to convert it to CSV for upload into Neptune using this script. I get an 'illegal multibyte sequence' error:
UnicodeDecodeError('cp932', b'5.4397040833, 133.328841248 35.439539551, 133.32875060...3757508892 35', 332, 333, 'illegal multibyte sequence')
A couple of weird things occur at the end: (1) the single quote after "35" and (2) the "332, 333", which are not in the data. So I guess those are error codes, but I really have no idea. My first guess is that the real problem is that the list is just too long, because the error occurs at numbers rather than at the Asian-script text, but I really don't know.
Any information on what could be generating this error and how to avoid it?
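On the "332, 333" question: in Python 3, a UnicodeDecodeError carries the fields (encoding, object, start, end, reason), so the two integers are the start and end byte offsets of the sequence that could not be decoded, not error codes. The single quote after "35" just closes the bytes literal that the message prints. The sketch below triggers a comparable error on purpose; the byte string is a made-up stand-in, not data from this issue.

```python
# A made-up byte string: two ASCII digits followed by 0x81, which is a
# cp932 lead byte with no trailing byte after it.
raw = b"35\x81"

details = None
try:
    raw.decode("cp932")
except UnicodeDecodeError as exc:
    # exc.start and exc.end are byte offsets into exc.object.
    details = (exc.encoding, exc.start, exc.end)
    print(exc.encoding, exc.start, exc.end, exc.reason)
    print(exc.object[exc.start:exc.end])  # the offending byte(s)
```

Slicing the original bytes with `exc.start` and `exc.end`, as in the last line, shows exactly which bytes the decoder rejected.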