elastic / elasticsearch-py

Official Python client for Elasticsearch
https://ela.st/es-python
Apache License 2.0

Unicode keys of dictionary partially fail #66

Closed Diolor closed 10 years ago

Diolor commented 10 years ago

Hi,

I found (in a very painful way :)) that if there are unicode keys in an inner dictionary, ES or the Python library fails to add the object. For example, take the following dictionary:

{
    u'city': u'Toronto',
    u'name': u'PostBeyond',
    u'events': {
        u'title': u'ExtremeCachingwithPHP',
        u'start_date': u'2014-01-08T00: 00: 00+00: 00'
    }
}

ES will correctly process/add the city and name fields. However, the inner events.title and events.start_date will not be added.

This will be correctly processed:

{
    'city': u'Toronto',
    'name': u'PostBeyond',
    'events': {
        'title': u'ExtremeCachingwithPHP',
        'start_date': u'2014-01-08T00: 00: 00+00: 00'
    }
}

As a workaround I tried the following, which for some reason does not work. I guess 'events' is still somehow in memory.

temp_events = {str(key): val for key, val in doc['events'].items()} #make unicode keys strings
doc.pop('events',None)
doc['events'] = temp_events  #this will not be added either
doc['events2'] = temp_events #this will be added

Anyway it's 6am

Best, D

honzakral commented 10 years ago

Hi Diolor,

I cannot replicate your issue:

>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch()
>>> es.index(index='i', doc_type='t', id=42, body={
...    u'city': u'Toronto',
...    u'name': u'PostBeyond',
...    u'events': {
...        u'title': u'ExtremeCachingwithPHP',
...        u'start_date': u'2014-01-08T00: 00: 00+00: 00'
...    }
... })
{u'_id': u'42', u'_index': u'i', u'_type': u't', u'_version': 1, u'created': False}
>>> es.get(index='i', doc_type='t', id=42)
{u'_id': u'42', u'_index': u'i', u'_source': {u'city': u'Toronto',
  u'events': {u'start_date': u'2014-01-08T00: 00: 00+00: 00', u'title': u'ExtremeCachingwithPHP'},
  u'name': u'PostBeyond'},
u'_type': u't', u'_version': 1, u'found': True}

Could it maybe be caused by the misspelled date in your example?

Honza

Diolor commented 10 years ago

Hi Honza,

Can you replicate this?

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()
actions = []

action = {
    '_type': 't',
    '_id': '52cb45cec36b4442751728f5',
    '_source': {
        u'city': u'Toronto',
        u'name': u'PostBeyond',
        u'_id': {
            '$oid': '52cb45cdc36b4442751728f4'
        },
        u'events': {
            u'title': u'ExtremeCachingwithPHP',
            u'event_id': {
                '$oid': '52cb45cec36b4442751728f5'
            },
            u'start_date': u'2014-01-08T00:00:00+00:00'
        }
    },
    '_index': 'i'
}

actions.append(action)
helpers.bulk(es, actions)

from elasticsearch.client import IndicesClient

ic = IndicesClient(es)
ic.get_mapping(index='i',doc_type='t')

The last line gives me:

>>> ic.get_mapping(index='i',doc_type='t')
{u'i': {u'mappings': {u't': {u'properties': {u'$oid': {u'type': u'string'}, u'city': {u'type': u'string'}}}}}}

The conflict is with the second _id inside the _source. If the action doesn't have a second _id:

action = {
    '_type': 't',
    '_id': '52cb45cec36b4442751728f5',
    '_source': {
        u'city': u'Toronto',
        u'name': u'PostBeyond',
        u'events': {
            u'title': u'ExtremeCachingwithPHP',
            u'event_id': {
                '$oid': '52cb45cec36b4442751728f5'
            },
            u'start_date': u'2014-01-08T00:00:00+00:00'
        }
    },
    '_index': 'i'
}

The mapping is correct:

>>> ic.get_mapping(index='i',doc_type='t')
{u'i': {u'mappings': {u't': {u'properties': {u'$oid': {u'type': u'string'}, u'city': {u'type': u'string'}, 
u'events': {u'properties': {u'event_id': {u'properties': {u'$oid': {u'type': u'string'}}}, u'start_date': 
{u'type': u'date', u'format': u'dateOptionalTime'}, u'title': {u'type': u'string'}}}, u'name': {u'type': 
u'string'}}}}}}

Apparently this is not the Python client's problem. ES looks for an _id field [1]. I am still wondering whether I can have an _id field inside the _source that differs from the ES doc id. I had better take this to the main ES community.

Best, D

[1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-id-field.html#mapping-id-field

honzakral commented 10 years ago

Yes, the _id field has to be a value, not another object. The correct way to handle this is to transform your document before handing it off to bulk or use the expand_action_callback to do it from within.
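
A minimal sketch of the "transform before bulk" approach, assuming the nested objects are always of the form {'$oid': '<hex string>'}. The names flatten_oid, prepare_source and mongo_id are hypothetical and not part of the client. Renaming the inner _id is an extra precaution and an assumption here; it is not something ES is confirmed to require once the field is a plain value.

```python
def flatten_oid(value):
    """Recursively replace {'$oid': '<hex>'} wrappers with the plain string."""
    if isinstance(value, dict):
        # A dict whose only key is '$oid' is treated as a wrapper (assumption).
        if set(value.keys()) == {'$oid'}:
            return value['$oid']
        return {key: flatten_oid(val) for key, val in value.items()}
    if isinstance(value, list):
        return [flatten_oid(item) for item in value]
    return value


def prepare_source(source, id_field='_id', renamed='mongo_id'):
    """Return a cleaned copy of a _source dict; the input is left untouched.

    'mongo_id' is a hypothetical field name chosen for this example.
    """
    cleaned = flatten_oid(source)
    if id_field in cleaned:
        # Move the inner _id out of the way so it cannot clash with the doc id.
        cleaned[renamed] = cleaned.pop(id_field)
    return cleaned
```

Each action would then be rewritten with action['_source'] = prepare_source(action['_source']) before the list is passed to helpers.bulk(es, actions).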

As this issue is not Python related, I am closing the ticket. Please feel free to open a new one for any issue you find. Thanks.