django-es / django-elasticsearch-dsl

This is a package that allows indexing of django models in elasticsearch with elasticsearch-dsl-py.

Shards/replicas settings default to one when running the populate command without creating the index first #478

Open ssjsk opened 6 months ago

ssjsk commented 6 months ago

We have a Kubernetes-based Elasticsearch 8.9.2 cluster, which I'm trying to write to using the django-elasticsearch-dsl library. The following is my Django model:

import uuid
from django.db import models

class TestModel(models.Model):
    uuid = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=200, null=False, blank=False)

The following is my Document class, defined in app/documents.py:

from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry
from .models import TestModel

@registry.register_document
class TestModelDocument(Document):
    uuid = fields.TextField()
    name = fields.TextField()

    class Index:
        name = "testmodel_index"
        settings = {
            "number_of_shards": 3,
            "number_of_replicas": 3,
            "analysis": {
                "analyzer": {
                    "default": {
                        "type": "default",
                    },
                    "edge_ngram_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                    },
                },
            },
        }
        #52267c9d-5001-4174-b098-44f0023d6f60 0ec6d0f9-639b-4863-b018-e471cb5b31be
    class Django:
        model = TestModel
        fields = []

After this, when I run the search_index command below with the --create option, the index is correctly created with 3 shards and 3 replicas. But if I run it with the --populate option directly, without first creating the underlying index, the number_of_shards and number_of_replicas settings from documents.py are ignored and both are set to 1 instead.

python3 manage.py search_index --create --models metadata.TestModel  # correctly sets 3 replicas and 3 shards

curl http://es-helm-ha-elasticsearch:9200/testmodel_index/_settings?pretty
{
  "testmodel_index" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "3",
        "provided_name" : "testmodel_index",
        "creation_date" : "1710487583462",
        "analysis" : {
          "analyzer" : {
            "default" : {
              "type" : "default"
            },
            "edge_ngram_analyzer" : {
              "type" : "custom",
              "tokenizer" : "standard"
            }
          }
        },
        "number_of_replicas" : "3",
        "uuid" : "ogVjXB6kQr6fcwlqbFhb2g",
        "version" : {
          "created" : "8500008"
        }
      }
    }
  }
}
python3 manage.py search_index --populate --models metadata.TestModel  # sets replicas and shards to 1
curl http://es-helm-ha-elasticsearch:9200/testmodel_index/_settings?pretty
{
  "testmodel_index" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "testmodel_index",
        "creation_date" : "1710487695407",
        "number_of_replicas" : "1",
        "uuid" : "Jlj58KOiTY6cublnIgJScg",
        "version" : {
          "created" : "8500008"
        }
      }
    }
  }
}
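
To make the difference between the two responses above easy to spot, here is a minimal sketch that compares a `_settings` response against the values declared in documents.py. The helper name `settings_mismatches` and the trimmed sample payloads are my own illustrative assumptions, not part of django-elasticsearch-dsl; only the standard library is used.

```python
import json

# Values declared in TestModelDocument.Index.settings above.
EXPECTED = {"number_of_shards": 3, "number_of_replicas": 3}


def settings_mismatches(response_text, index_name, expected):
    """Return {setting: (expected, actual)} for every setting that differs.

    `response_text` is the raw JSON body returned by GET /<index>/_settings.
    Elasticsearch returns these values as strings, so compare as strings.
    """
    index_settings = json.loads(response_text)[index_name]["settings"]["index"]
    mismatches = {}
    for key, want in expected.items():
        got = index_settings.get(key)
        if got is None or str(got) != str(want):
            mismatches[key] = (want, got)
    return mismatches


# Trimmed copies of the two responses shown above (hypothetical variable names).
after_create = (
    '{"testmodel_index": {"settings": {"index": '
    '{"number_of_shards": "3", "number_of_replicas": "3"}}}}'
)
after_populate = (
    '{"testmodel_index": {"settings": {"index": '
    '{"number_of_shards": "1", "number_of_replicas": "1"}}}}'
)

print(settings_mismatches(after_create, "testmodel_index", EXPECTED))
print(settings_mismatches(after_populate, "testmodel_index", EXPECTED))
```

The first call reports no mismatches, while the second flags both number_of_shards and number_of_replicas.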

Has anyone else encountered the same issue? If anyone knows a fix, kindly share it here.
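
A possible workaround in the meantime, assuming the 1/1 values come from Elasticsearch auto-creating the missing index with its defaults on the first write (I have not confirmed this in the library source): create the index explicitly whenever it does not exist yet, and only then populate. The helper `ensure_index` below is hypothetical. It relies only on the `exists()` and `create()` methods of the elasticsearch-dsl `Index` object, which I believe is reachable as `TestModelDocument._index`. The `FakeIndex` class is a stand-in so the sketch runs without a cluster.

```python
def ensure_index(index):
    """Create `index` with its declared settings if it does not exist yet.

    `index` is expected to behave like an elasticsearch_dsl Index:
    exists() returns a bool, create() creates it with settings and mappings.
    Returns True if this call created the index, False if it already existed.
    """
    if index.exists():
        return False
    index.create()
    return True


class FakeIndex:
    """Stand-in for elasticsearch_dsl.Index so the sketch runs offline."""

    def __init__(self):
        self.created = False

    def exists(self):
        return self.created

    def create(self):
        self.created = True


fake = FakeIndex()
first_call = ensure_index(fake)   # index missing, so it gets created
second_call = ensure_index(fake)  # index already present, nothing to do
print(first_call, second_call)
```

In the real project this would presumably be called as `ensure_index(TestModelDocument._index)` before running `search_index --populate`. Running `search_index --create` first, as in the first command above, should have the same effect.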