django-es / django-elasticsearch-dsl

This is a package that allows indexing of django models in elasticsearch with elasticsearch-dsl-py.

Dynamic fields.ObjectField() to avoid mapping explosion #158

Open gagan144 opened 5 years ago

gagan144 commented 5 years ago

Hi, I am trying to index a Django model (backend: Postgres) that has a field 'attributes' of type 'django.contrib.postgres.fields.JSONField'. To do so, I created a django_elasticsearch_dsl document:

from django_elasticsearch_dsl import DocType, fields

class RecordDocument(DocType):
    attributes = fields.ObjectField()
    ...

    def prepare_attributes(self, instance):
        # Pass the JSONField value through unchanged
        return instance.attributes

This works fine; however, there is a mapping explosion because the structure of the 'attributes' JSON is not fixed. Different records can have different or overlapping sets of keys, and the same key can hold values of different datatypes. For example, record1 has attributes.code=123 and record2 has attributes.code="A123".

Because of this, when indexing a large number of records, ES throws a mapping/parsing error, since it dynamically assigns each field's datatype based on the first record indexed.
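To make the failure mode concrete, here is a minimal sketch in plain Python that mimics "first value wins" type assignment and reports keys whose later values have a different type. It is a simplification and not Elasticsearch's actual logic: the names `infer_es_type` and `find_conflicts` are made up for illustration, and real Elasticsearch coerces some values (for example, a number sent to a text field is accepted), so not every mismatch reported here would be rejected.

```python
def infer_es_type(value):
    """Rough guess at the type dynamic mapping would pick for a JSON value."""
    # bool must be checked before int, because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    return "object"


def find_conflicts(records):
    """Return (record_index, key, first_seen_type, new_type) for each mismatch."""
    first_seen = {}
    conflicts = []
    for index, attributes in enumerate(records):
        for key, value in attributes.items():
            value_type = infer_es_type(value)
            # The first record containing a key fixes that key's type
            mapped_type = first_seen.setdefault(key, value_type)
            if mapped_type != value_type:
                conflicts.append((index, key, mapped_type, value_type))
    return conflicts


# The two records from the example above
conflicts = find_conflicts([{"code": 123}, {"code": "A123"}])
```

Running a check like this over a sample of rows before indexing can show which keys have mixed types and would need normalizing.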

Is there any option or configurable parameter for 'fields.ObjectField()', or something similar, that allows diverse key-value pairs when indexing such JSON data?

P.S. The 'attributes' JSON structure is necessary because I will be aggregating on attribute key-value pairs when creating reports.

bilalebi commented 4 years ago

This is quite a late reply, but I hope it will be helpful for those who come across this issue.

As @safwanrahman mentioned in #133, I used the Meta class inside the document definition, as explained in the documentation; I just replaced strict with false:

from django_elasticsearch_dsl import Document  # assumed import; not in the original snippet
from elasticsearch_dsl import MetaField

class MyClass(Document):

    class Meta:
        all = MetaField(enabled=False)
        dynamic = MetaField('false')

In the mapping, you'll then see:

{
  "index_name": {
    "mappings": {
      "doc": {
        "dynamic": "false",
...
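One caveat to check before adopting this for the original use case: as the Elasticsearch documentation describes it, with "dynamic": "false" unmapped fields are kept in _source but are not indexed, and inner objects inherit the setting from their parent. The sketch below is a plain-Python illustration of that rule, not Elasticsearch code; `split_by_mapping` and the sample mapping and document are hypothetical.

```python
def split_by_mapping(doc, properties, prefix=""):
    """Split a document's dotted field paths into (indexed, source_only),
    assuming dynamic=false applies at every level of the mapping."""
    indexed, source_only = [], []
    for key, value in doc.items():
        path = prefix + key
        spec = properties.get(key)
        if spec is None:
            # Not declared in the mapping: stored in _source, not searchable
            source_only.append(path)
        elif isinstance(value, dict):
            # Object field: only its explicitly declared sub-fields are indexed
            inner_indexed, inner_source = split_by_mapping(
                value, spec.get("properties", {}), prefix=path + "."
            )
            indexed.extend(inner_indexed)
            source_only.extend(inner_source)
        else:
            indexed.append(path)
    return indexed, source_only


# Hypothetical mapping: 'attributes' is declared as an object with no sub-fields
mapping = {
    "title": {"type": "text"},
    "attributes": {"type": "object", "properties": {}},
}
doc = {"title": "record1", "attributes": {"code": 123, "color": "red"}}

indexed, source_only = split_by_mapping(doc, mapping)
```

Under these assumptions the keys inside 'attributes' land in `source_only`, so they would be returned with hits but could not be aggregated on unless they are declared explicitly in the mapping.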