denshoproject / ddr-cmdln

Command-line tools for automating the Densho Digital Repository's various processes.

ddrindex publish does not work with segments #71

Closed GeoffFroh closed 6 years ago

GeoffFroh commented 6 years ago

ddrindex publish throws an exception when it tries to index entity.json files that represent segments. The likely cause is that the function is trying to post the data to ES as "_type": "entity" instead of "_type": "segment". See the bottom of the stack trace:

2018-05-05 07:56:31.976102-07:00 | 82/120 POST ddr-densho-1000-113-37 
WARNING:elasticsearch:GET /ddrpublic-production/entity/ddr-densho-1000-113-37 [status:404 request:0.010s]
Traceback (most recent call last):
  File "/opt/ddr-local/venv/ddrlocal/bin/ddrindex", line 11, in <module>
    load_entry_point('ddr-cmdln==0.9.4b0', 'console_scripts', 'ddrindex')()
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/click-6.7-py2.7.egg/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/click-6.7-py2.7.egg/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/click-6.7-py2.7.egg/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/click-6.7-py2.7.egg/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/click-6.7-py2.7.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/ddr_cmdln-0.9.4b0-py2.7.egg/DDR/cli/ddrindex.py", line 287, in publish
    status = docstore.Docstore(hosts, index).post_multi(path, recursive=recurse, force=force)
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/ddr_cmdln-0.9.4b0-py2.7.egg/DDR/docstore.py", line 663, in post_multi
    d = self.get(oi.model, oi.id)
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/ddr_cmdln-0.9.4b0-py2.7.egg/DDR/docstore.py", line 718, in get
    return ES_Class.get(document_id, using=self.es, index=self.indexname)
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/elasticsearch_dsl/document.py", line 154, in get
    **kwargs
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 341, in get
    doc_type, id), params=params)
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 327, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 110, in perform_request
    self._raise_error(response.status, raw_data)
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 114, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.NotFoundError: TransportError(404, u'{"_index":"ddrpublic-20170223-2","_type":"entity","_id":"ddr-densho-1000-113-37","found":false}')
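
The suspected mismatch can be illustrated with a minimal sketch. It assumes that a segment ID has one more numeric component than an entity ID (as in `ddr-densho-1000-113-37` versus `ddr-densho-1000-113`), which is inferred only from the IDs quoted in this issue. `guess_doc_type`, `get_existing`, `FakeIndex`, and `NotFound` are hypothetical names for illustration and are not part of ddr-cmdln or elasticsearch; `NotFound` stands in for `elasticsearch.exceptions.NotFoundError`.

```python
import re

# Assumed ID shapes, inferred only from the IDs quoted in this issue.
ENTITY_ID = re.compile(r'^[a-z]+-[a-z]+-\d+-\d+$')
SEGMENT_ID = re.compile(r'^[a-z]+-[a-z]+-\d+-\d+-\d+$')


class NotFound(Exception):
    """Stand-in for elasticsearch.exceptions.NotFoundError."""


class FakeIndex(object):
    """In-memory stand-in for an index keyed by (doc_type, id)."""

    def __init__(self, docs=None):
        self.docs = dict(docs or {})

    def get(self, doc_type, object_id):
        try:
            return self.docs[(doc_type, object_id)]
        except KeyError:
            raise NotFound('%s/%s' % (doc_type, object_id))


def guess_doc_type(object_id):
    """Return 'segment', 'entity', or None for IDs matching neither shape."""
    if SEGMENT_ID.match(object_id):
        return 'segment'
    if ENTITY_ID.match(object_id):
        return 'entity'
    return None


def get_existing(index, object_id):
    """Return the stored document, or None if it is not indexed yet.

    The lookup uses the doc type derived from the ID, so a segment is
    requested as a segment, and a missing document does not raise.
    """
    doc_type = guess_doc_type(object_id)
    try:
        return index.get(doc_type, object_id)
    except NotFound:
        return None
```

With this approach a segment that has never been indexed yields `None` instead of an uncaught 404, so the caller can go on to post it. Whether the real fix belongs in the lookup or in how the model is chosen is not established by the trace.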

GeoffFroh commented 6 years ago

Confirmed that the VM on maunakea is running the latest code: ddr-cmdln on master at 9e36641706d3364722951fc28b539ef9924d1519, installed with ddr-local on master at aaf23e71c9075ff3b8be14180b1750f9bde4f45a and ddr-defs on master at 0fe378918b7b36fadaa9733460c4bd5a4b0ea330.

gjost commented 6 years ago

What command was used to produce this error? I'm unable to reproduce it. Also, the ddr-densho-1000 I have here has ~35K objects, not 120.

GeoffFroh commented 6 years ago

ddrindex publish -r /path/to/ddr-densho-1000/files/ddr-densho-1000-113

I've now seen the same behavior with:

ddrindex publish -r /path/to/ddr-densho-1000/files/ddr-densho-1000-362