denshoproject / ddr-cmdln

Command-line tools for automating the Densho Digital Repository's various processes.

Error when indexing to the green cluster #236

Closed gjost closed 9 months ago

gjost commented 9 months ago

Looks like a mapping error?

(cmdln) ddr@maunakea:/media/qnfs/kinkura/gold/ddr-densho-359$ ddrindex publish --recurse --b2 --hosts 192.168.0.20:9200 /media/qnfs/kinkura/gold/ddr-densho-292
File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/elasticsearch/connection/base.py", line 328, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', "failed to parse field [persons] of type [keyword] in document with id 'ddr-densho-292-62'. Preview of field's value: '{nr_id=88922/nr008k20z, namepart=Nishioka, Frank Koji}'")
ERROR: not created
2023-09-15 15:08:53.555585-07:00 | 246/315 POST ddr-densho-292-2
Traceback (most recent call last):
  File "/opt/ddr-cmdln/venv/cmdln/bin/ddrindex", line 33, in <module>
    sys.exit(load_entry_point('ddr-cmdln==5.8.1', 'console_scripts', 'ddrindex')())
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/click-8.1.3-py3.9.egg/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/click-8.1.3-py3.9.egg/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/click-8.1.3-py3.9.egg/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/click-8.1.3-py3.9.egg/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/click-8.1.3-py3.9.egg/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/ddr_cmdln-5.8.1-py3.9.egg/DDR/cli/ddrindex.py", line 355, in publish
    status = ds.post_multi(
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/ddr_cmdln-5.8.1-py3.9.egg/DDR/docstore.py", line 554, in post_multi
    d = self.get(
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/elastictools/docstore.py", line 163, in get
    return es_class.get(
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/elasticsearch_dsl/document.py", line 206, in get
    return cls.from_es(doc)
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/elasticsearch_dsl/utils.py", line 468, in from_es
    doc._from_dict(data)
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/elasticsearch_dsl/utils.py", line 475, in _from_dict
    v = f.deserialize(v)
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/elasticsearch_dsl/field.py", line 113, in deserialize
    data = [None if d is None else self._deserialize(d) for d in data]
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/elasticsearch_dsl/field.py", line 113, in <listcomp>
    data = [None if d is None else self._deserialize(d) for d in data]
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/elasticsearch_dsl/field.py", line 216, in _deserialize
    return self._wrap(data)
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/elasticsearch_dsl/field.py", line 193, in _wrap
    return self._doc_class.from_es(data, data_only=True)
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/elasticsearch_dsl/document.py", line 121, in from_es
    return super(InnerDoc, cls).from_es(data)
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/elasticsearch_dsl/utils.py", line 468, in from_es
    doc._from_dict(data)
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/elasticsearch_dsl/utils.py", line 472, in _from_dict
    for k, v in iteritems(data):
  File "/opt/ddr-cmdln/venv/cmdln/lib/python3.9/site-packages/six.py", line 605, in iteritems
    return iter(d.items(**kw))
AttributeError: 'str' object has no attribute 'items'
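
Read together, the two failures above look like two sides of one shape mismatch: the index rejects an object-shaped `persons` value because the field is mapped as `keyword`, and the client later calls `.items()` on a plain string where it expected a dict. A minimal sketch of a pre-flight check for that kind of mismatch follows. `find_shape_mismatches` and the `example-old` document are hypothetical and not part of ddr-cmdln; only the `ddr-densho-292-62` value is taken from the error message.

```python
def find_shape_mismatches(doc, field, expect_object):
    """Return the entries of doc[field] whose shape differs from the mapping.

    expect_object=True  -> every entry should be a dict (object/nested mapping)
    expect_object=False -> every entry should be a scalar (e.g. keyword mapping)
    """
    values = doc.get(field) or []
    if not isinstance(values, list):
        values = [values]
    # Keep only the entries whose "is a dict" status disagrees with the mapping.
    return [v for v in values if isinstance(v, dict) != expect_object]


# Value shape taken from the mapper_parsing_exception preview.
new_style = {
    "id": "ddr-densho-292-62",
    "persons": [{"nr_id": "88922/nr008k20z", "namepart": "Nishioka, Frank Koji"}],
}
# Hypothetical document that stores persons as plain strings.
old_style = {"id": "example-old", "persons": ["Nishioka, Frank Koji"]}

# A keyword-mapped index cannot accept the dict entry:
print(find_shape_mismatches(new_style, "persons", expect_object=False))
# A client expecting objects cannot deserialize the string entry:
print(find_shape_mismatches(old_style, "persons", expect_object=True))
```
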

gjost commented 9 months ago

I downloaded the mappings from both the elasticblue and elasticgreen clusters and used a diff tool called Meld (https://meldmerge.org) to compare them. The DDR cobjects in elasticblue have the nr_id fields for creators and persons, but those in elasticgreen do not.
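
The same comparison can be scripted instead of eyeballed in Meld. This is a rough sketch, assuming the `properties` sections of the two mappings have already been loaded as dicts; the sample `blue_props` and `green_props` are illustrative stand-ins rather than the real DDR mappings, and the helper names are made up for this example. It ignores multi-fields (`fields`) and other mapping parameters.

```python
def flatten_mapping(properties, prefix=""):
    """Flatten a mapping 'properties' dict into {dotted.path: type}."""
    flat = {}
    for name, spec in properties.items():
        path = prefix + name
        # Fields with sub-properties and no explicit type are objects.
        flat[path] = spec.get("type", "object")
        flat.update(flatten_mapping(spec.get("properties", {}), path + "."))
    return flat


def diff_mappings(left, right):
    """Return {path: (left_type, right_type)} for every path that differs."""
    a, b = flatten_mapping(left), flatten_mapping(right)
    return {
        path: (a.get(path), b.get(path))
        for path in sorted(set(a) | set(b))
        if a.get(path) != b.get(path)
    }


# Illustrative mappings only: persons as an object with nr_id vs. a bare keyword.
blue_props = {
    "title": {"type": "text"},
    "persons": {
        "properties": {
            "namepart": {"type": "keyword"},
            "nr_id": {"type": "keyword"},
        }
    },
}
green_props = {
    "title": {"type": "text"},
    "persons": {"type": "keyword"},
}

for path, (blue, green) in diff_mappings(blue_props, green_props).items():
    print(f"{path}: blue={blue} green={green}")
```
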

Elasticsearch doesn't let us change the mapping of an existing field once the index has been created and data has been published. We're going to need to delete the elasticgreen indexes and reindex everything. Sorry to be the bearer of bad news. :(

I'm working on changing the narrator ID from id to oh_id, so it'd probably be best to reindex when that's done so we don't have to reindex yet again.

GeoffFroh commented 9 months ago

Do you think it might be possible to use ES's Clone API instead of complete reindexing?

(see: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-clone-index.html)
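
For reference, a clone boils down to two calls: block writes on the source index, then clone it to a new name. The sketch below assumes a 7.x `elasticsearch` Python client (which is what the paths in the traceback suggest) and passes settings via the `body=` keyword; `clone_index` and the index names in the usage comment are hypothetical. As far as the linked docs describe, a clone keeps the source index's definition and runs inside one cluster, so it would not by itself change how `persons` is mapped.

```python
def clone_index(client, source, target):
    """Clone `source` to `target` using an elasticsearch-py style client.

    Assumes `client.indices` exposes put_settings() and clone(), as the
    7.x elasticsearch Python client does.
    """
    # The Clone API requires the source index to be blocked for writes.
    client.indices.put_settings(
        index=source,
        body={"index.blocks.write": True},
    )
    # Create the target index as a copy of the source.
    client.indices.clone(index=source, target=target)


# Usage (not executed here; host and index names are placeholders):
# from elasticsearch import Elasticsearch
# es = Elasticsearch(["localhost:9200"])
# clone_index(es, "example-source", "example-target")
```
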

GeoffFroh commented 9 months ago

The plan is to try the Clone API to the black cluster as a test.

GeoffFroh commented 9 months ago

See related: https://github.com/denshoproject/ddr-local/issues/320

gjost commented 9 months ago

This is now a non-issue: elasticgreen was replaced with the indices from elasticblue.