ChristopherRabotin / bungiesearch

UNMAINTAINED CODE -- Elasticsearch-dsl-py django wrapper with mapping generator
BSD 3-Clause "New" or "Revised" License
67 stars 20 forks source link

Some bugs & improvements #115

Open marcinn opened 9 years ago

marcinn commented 9 years ago

Hi, I have some thougts about bungiesearch. Please forgive me listing all in one issue. I just want to share my thougts.

Doctype should be set in ModelIndex meta.

Why? Currently, as a developer, I have no idea what name of Doctype is used. It's created automatically in mgmt command: mapping[mdl_idg.get_model().__name__] = mdl_idx.get_mapping().

But when I need to query ES without mapping to Django I must use custom_search() and provide index name together with Doctype. Index name is well known, but Doctype not.

After looking into code I know that doctype is a simply class name, but this is non-intuitive. As a Django developer I'm expecting name generated as <app_label>.<model_name>.

Doctype name may conflict with other models

You're using Model's class name as a Doctype name. This is not a best solution. Class (models) names may be same for different models provided by different apps. Please consider that application catalog may deliver Product model, and application cart may deliver another Product, but related to "product in cart" entity. In that case there will be conflict with Doctype names.

Possibility to set override doctype name

To get more control over building an index I need to set own doctype name. For example I need to define multiple indexes for one model, which may differ in doctype and indexed documents, of course.

from .models import MyModel

class DefaultIndex(ModelIndex):
   class Meta:
       model = MyModel
       doctype = 'MyModel'
       fields = ['a','b','c']
       index_query = 

class SpecialIndex(ModelIndex):
   class Meta:
       model = MyModel
       doctype = `SpecialMyModel`
       fields = ['b','c','d']
       index_query = ...

BUNGIE['INDICES']

I do not understand why whole index is automatically mapped to model classes (and doctypes) provided in one module. This forces me to change structure od python modules because BUNGIE fetches whole module into one index.

Maybe 'INDICES' can be replaced with some sort of auto-registry, where index name must be provided in ModelIndex` Meta.

Query via ModelIndex not Manager

ModelIndex is related to index and doctype. There should be possibility to query Index instance (or class in worst case) directly. Maybe some shortcut method should work, same as custom_search() in BungieManager.

BungieManager is not quite useful. I can bind Manager to a model and query quite different index (via custom_search). This is a some sort of inconsistency.

Better way is to instantiate ModelIndex as a Model's class property, which is bound directly with model via Meta.model. Maybe automatic registering in a model class will be useful also.

class MyModel(models.Model):
    search = MyModelIndex()

And querying will be simpler:

MyModel.search.query(...)

Simple high-level query API

Currently querying is little compliated. It would be nice to provide high-level query API for common queries, i.e. query string with dismax search. Currently syntax looks like:

MyModel.index.custom_search('indexname', 'MyModel').query('query_string', query='cat AND dog', use_dis_max=True))

I would like to query Index (or index manager) in a simpler way, i.e. MyModel.index.query_string('cat AND dog')

This can be done outside bungiesearch and I will work on it for my needs, but it would be nice to include it in bungiesearch in future.


I can work on some improvements on a fork of bungiesearch if you're interested in. Thanks for reading and patience. ;)

ChristopherRabotin commented 9 years ago

Thanks for your comments, it's always a pleasure to read given it shows interest in the project. You make some valid points, others I disagree with, so here we go.

Custom doctypes and app level doctype segregation (app_name.model_name)

Yes, this is a good idea. It hasn't been a use case yet, and that's the only reason why it hasn't been implemented yet.

Bungie['INDICES'] and code structure

How does this impact your code structure? Bungiesearch was written for ease of integration, so if there are ways to improve this, definitely let me know. The point of this is only to force each ModelIndex into a namespace. For awhile I had three indices running: one index specification per file makes it trivial to make similar indices. In addition, separating these in multiple files greatly simplifies discovery of the ModelIndex subclasses on bungiesearch initialization.

Query via ModelIndex not Manager

I don't think that's a good idea. Custom managers are the Django way of adding functionality of models. This makes use of how Django loads the models, which avoids circular imports. That is also why you cannot reference the ModelIndex in the models themselves (leads to horrible circular imports).

Query ES without a Django model

The goal of the project is to integrate ES-dsl into Django. The DSL library supports manual querying out of the box if needed (cf https://github.com/elastic/elasticsearch-dsl-py#persistence-example). The use case of bungiesearch is when you have data stored in a database and you'd like to optimally search this information which also happens to be indexed on an ES cluster.

Simple high-level query API

That could be an interesting feature, but it would require a new domain specific language definition. Given how thorough that of elasticsearch is, I'm not convinced another DSL would add much value. However, if you feel like creating a bungiesearch DSL with a parser, I'll be more than happy to review your pull request.

Contributing

I'm absolutely open to external contributions, and (usually) try to review and merge them quickly. I prefer pull requests to forks since it helps in seeing how bungiesearch can have new desired functionality and make sure it stays in sync with what other devs want.

diwu1989 commented 9 years ago

"Custom doctypes and app level doctype segregation" and more explicit doc typing + ability to override doc type are important to my company too.

notfol commented 8 years ago

Bungie['INDICES'] -- For time series documents you may want to have indices include a date string (e.g. logstash-2016.02.28). It would be nice to have the ability to generate the index via a callback function.

ChristopherRabotin commented 8 years ago

@notfol by indices with a date string, do you mean indices that are generated on the fly?

notfol commented 8 years ago

@ChristopherRabotin Yep! Sometimes we want to index by date. Often the date comes from a DateTimeField on the model.

notfol commented 8 years ago

Also wondered if there were plans to allow meta fields like _parent. Looks like this has to be defined outside of "properties."

https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-parent-child.html