bennylope / elasticstack

Configurable indexing and other extras for Haystack (with ElasticSearch biases)
BSD 2-Clause "Simplified" License
127 stars, 31 forks

Add multiple index support #17

Closed: martinsvoboda closed this issue 8 years ago

martinsvoboda commented 8 years ago

I would like to participate in adding multiple index support. I run a multilingual Django application, and each language should have its own index settings in Elasticsearch. Currently, Elasticstack does not support this architecture. My proposal is to change the settings structure (see the example below), but the change is incompatible with the current settings concept. I would like to discuss the new concept first; then I can implement the change and open a regular pull request.

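# Each connection gets its own index (INDEX_NAME) and names a settings
# profile from ELASTICSEARCH_INDEX_SETTINGS (SETTINGS_NAME).
HAYSTACK_CONNECTIONS = {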
    'default': {
        'ENGINE': 'citaty.core.haystack.ConfigurableElasticSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'default',
        'SETTINGS_NAME': 'cs',
        'DEFAULT_ANALYZER': 'cs',
    },
    'default_cs': {
        'ENGINE': 'citaty.core.haystack.ConfigurableElasticSearchEngine',
        'URL': 'http://127.0.0.1:9200/',
        'INDEX_NAME': 'default_cs',
        'SETTINGS_NAME': 'cs',
        'DEFAULT_ANALYZER': 'cs',
    },
    # ...
}

# Named settings profiles, keyed by the SETTINGS_NAME values above.
ELASTICSEARCH_INDEX_SETTINGS = {
    'cs': {
        "settings": {
            "analysis": {
                "analyzer": {
                    "default": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # stopwords_CZ runs twice: before lowercasing and again after hunspell stemming
                        "filter": ["stopwords_CZ", "lowercase", "hunspell_CZ", "stopwords_CZ", "remove_duplicities"]
                    }
                },
                "filter": {
                    "stopwords_CZ": {
                        "type": "stop",
                        "stopwords": ["právě", "že", "test", "_czech_"],
                        "ignore_case": True
                    },
                    "hunspell_CZ": {
                        "type": "hunspell",
                        "locale": "cs_CZ",
                        "dedup": True,
                        "recursion_level": 0
                    },
                    "remove_duplicities": {
                        "type": "unique",
                        "only_on_same_position": True
                    },
                }
            }
        }
    },
    'es': {
        "settings": {
            "analysis": {
                "analyzer": {
                    "default": {
                        "type": "spanish",
                    }
                },
            }
        }
    },
    # ...
}
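The proposed layout can be read as: every connection owns one Elasticsearch index, and SETTINGS_NAME selects which profile from ELASTICSEARCH_INDEX_SETTINGS is used to create it. The minimal sketch below, assuming exactly that structure, collects the index-creation body for each index. `index_bodies` is a hypothetical helper (not part of elasticstack or Haystack), the settings are abbreviated, and the `default_es` connection is an illustrative stand-in for the elided `# ...` entries.

```python
# Hypothetical helper illustrating the proposed settings layout;
# it is not elasticstack's or Haystack's actual code.
def index_bodies(connections, index_settings):
    """Map each connection's INDEX_NAME to the settings profile it names.

    Assumes every connection defines INDEX_NAME and SETTINGS_NAME, and that
    SETTINGS_NAME is a key of index_settings.
    """
    bodies = {}
    for alias, options in connections.items():
        profile = options["SETTINGS_NAME"]
        if profile not in index_settings:
            raise KeyError(
                "connection %r refers to unknown settings profile %r" % (alias, profile)
            )
        bodies[options["INDEX_NAME"]] = index_settings[profile]
    return bodies


# Abbreviated example data; "default_es" is a hypothetical Spanish connection.
CONNECTIONS = {
    "default": {"INDEX_NAME": "default", "SETTINGS_NAME": "cs"},
    "default_cs": {"INDEX_NAME": "default_cs", "SETTINGS_NAME": "cs"},
    "default_es": {"INDEX_NAME": "default_es", "SETTINGS_NAME": "es"},
}
SETTINGS = {
    "cs": {"settings": {"analysis": {"analyzer": {"default": {
        "type": "custom", "tokenizer": "standard"}}}}},
    "es": {"settings": {"analysis": {"analyzer": {"default": {
        "type": "spanish"}}}}},
}

bodies = index_bodies(CONNECTIONS, SETTINGS)
```

Several connections can share one profile ("default" and "default_cs" both reuse "cs"), which is the benefit of naming profiles rather than repeating the analysis settings in every connection.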
bennylope commented 8 years ago

This sounds great! The only thing I'd strongly recommend, or even request, is backwards compatibility. Perhaps by default the backend could look for a "settings" key in the ELASTICSEARCH_INDEX_SETTINGS dictionary when populating its settings, and if it can't find one, assume that the top-level keys name index-specific settings. What do you think about that?

Then your language-specific settings could be documented as the primary way forward, with secondary documentation for the 'old' default-language setup.

Presumably this has uses beyond multilingual search clusters, too, so while I don't have any immediate use cases myself, I can see this being very helpful. I'll leave testing it out to you, too :)
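The backwards-compatible lookup suggested above could look roughly like this minimal sketch. It assumes the legacy format is a single body with a top-level "settings" key (the same shape as each named profile in the proposal). `resolve_index_settings` and its arguments are hypothetical names, not elasticstack's actual implementation.

```python
# Hypothetical sketch of the backwards-compatible settings lookup;
# names and signature are illustrative, not elasticstack's API.
def resolve_index_settings(all_settings, settings_name=None):
    """Return the index-creation body for one connection.

    Legacy layout: all_settings is itself one body with a top-level
    "settings" key, shared by every connection.
    New layout: all_settings maps profile names (e.g. "cs", "es") to bodies,
    and settings_name selects one of them.
    """
    if "settings" in all_settings:
        # Old single-profile format: ignore settings_name entirely.
        return all_settings
    if settings_name is None:
        raise ValueError("SETTINGS_NAME is required when using named settings profiles")
    return all_settings[settings_name]


LEGACY = {"settings": {"analysis": {"analyzer": {"default": {"type": "standard"}}}}}
NAMED = {
    "cs": {"settings": {"analysis": {"analyzer": {"default": {
        "type": "custom", "tokenizer": "standard"}}}}},
    "es": {"settings": {"analysis": {"analyzer": {"default": {
        "type": "spanish"}}}}},
}
```

The check only inspects the top level of the dictionary, which keeps it simple; one consequence is that a profile literally named "settings" would be mistaken for the legacy format, so that name would effectively be reserved.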

bennylope commented 8 years ago

Closed by #19 and included in release 0.4.0 on PyPI.