ArangoDB-Community / pyArango

Python Driver for ArangoDB with built-in validation
https://pyarango.readthedocs.io/en/latest/
Apache License 2.0

bulkImport_json errors due to encoding issue: Use body.encode('utf-8') if you want to send it encoded in UTF-8. #221

Open flmc-mikejones opened 2 years ago

flmc-mikejones commented 2 years ago

Code to reproduce:

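from pyArango.connection import Connection

# arango_url, arango_username and arango_password are defined elsewhere in the script.
def BulkJSONImport():
    conn = Connection(arangoURL=arango_url, username=arango_username, password=arango_password, verbose=True)
    # Open the "master" database, creating it if it does not exist yet.
    try:
        db = conn["master"]
        print("DB is connected: ", db)
    except Exception:
        db = conn.createDatabase(name="master")
        print("DB is created: ", db)
    # Open the "parts" collection, creating it if it does not exist yet.
    try:
        collection = db["parts"]
        print("Collection is selected:", collection)
    except Exception:
        db.createCollection(name="parts")
        collection = db["parts"]
        print("Collection is created: ", collection)
    # This is the call that fails with UnicodeEncodeError.
    collection.bulkImport_json('parts.json', onDuplicate='error')
    conn.disconnectSession()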

Error looks like this:

Collection is selected: ArangoDB collection name: parts, id: 12078225, type: document, status: loaded
===
Unable to establish connection, perhaps arango is not running.
===
Traceback (most recent call last):
  File "C:/projects/plural-insternal/python-sample/parts.py", line 108, in <module>
    BulkJSONImport()
  File "C:/projects/plural-insternal/python-sample/parts.py", line 45, in BulkJSONImport
    collection.bulkImport_json('parts.json', onDuplicate='error')
  File "C:\projects\plural-insternal\python-sample\venv\lib\site-packages\pyArango\collection.py", line 750, in bulkImport_json
    r = self.connection.session.post(url, params = params, data = data)
  File "C:\projects\plural-insternal\python-sample\venv\lib\site-packages\pyArango\connection.py", line 58, in __call__
    ret = self.fct(*args, **kwargs)
  File "C:\projects\plural-insternal\python-sample\venv\lib\site-packages\requests\sessions.py", line 577, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "C:\projects\plural-insternal\python-sample\venv\lib\site-packages\requests\sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\projects\plural-insternal\python-sample\venv\lib\site-packages\requests\sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "C:\projects\plural-insternal\python-sample\venv\lib\site-packages\requests\adapters.py", line 450, in send
    timeout=timeout
  File "C:\projects\plural-insternal\python-sample\venv\lib\site-packages\urllib3\connectionpool.py", line 710, in urlopen
    chunked=chunked,
  File "C:\projects\plural-insternal\python-sample\venv\lib\site-packages\urllib3\connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "C:\projects\plural-insternal\python-sample\venv\lib\site-packages\urllib3\connection.py", line 239, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "C:\Python37\lib\http\client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Python37\lib\http\client.py", line 1274, in _send_request
    body = _encode(body, 'body')
  File "C:\Python37\lib\http\client.py", line 160, in _encode
    (name.title(), data[err.start:err.end], name)) from None
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0192' in position 513842: Body ('ƒ') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

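The same JSON file imports without problems through the ArangoDB Web UI and the import tools, so I can't find anything wrong with the file itself. The "Unable to establish connection" message also looks misleading: the traceback shows the request failing while the body is being encoded, before anything is sent to the server.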
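
A minimal sketch of what seems to be going on, and one possible workaround. The traceback shows the file contents reaching http.client as a str, which http.client then tries to encode as Latin-1. The sketch below reproduces that check with the standard library only, and shows that reading the file in binary mode yields a bytes body that needs no further encoding. The helper names (load_import_body, latin1_encodable) and the sample document are hypothetical and not part of pyArango.

```python
import json
import os
import tempfile


def load_import_body(path: str) -> bytes:
    """Read a JSON file as raw bytes so no str-to-Latin-1 encoding happens later."""
    with open(path, "rb") as f:
        return f.read()


def latin1_encodable(text: str) -> bool:
    """Mimic the Latin-1 encoding step http.client applies to a str request body."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample document containing U+0192, the character from the traceback.
sample = json.dumps([{"_key": "p1", "name": "\u0192-bracket"}], ensure_ascii=False)

# Write the sample to a temporary UTF-8 file standing in for parts.json.
with tempfile.NamedTemporaryFile("w", suffix=".json", encoding="utf-8", delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

try:
    body = load_import_body(tmp_path)
finally:
    os.remove(tmp_path)

# Whether a str body like this would pass http.client's Latin-1 encoding step.
print(latin1_encodable(sample))
# Whether the bytes body decodes back to the original text unchanged.
print(body.decode("utf-8") == sample)
```

Since requests sends a bytes body as-is, passing something like body as data= to session.post should avoid the Latin-1 step. I have not verified this against pyArango itself, so treat it as an assumption rather than a confirmed fix.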