ChristopherRabotin / bungiesearch

UNMAINTAINED CODE -- Elasticsearch-dsl-py django wrapper with mapping generator
BSD 3-Clause "New" or "Revised" License

BulkIndexError when `search_index --update` #146

Open Jiydam opened 8 years ago

Jiydam commented 8 years ago

I am unable to index my models.

My model:

from django.db import models


class Program(models.Model):
    objects = ProgramManager()  # existing custom manager, defined elsewhere (not shown)

    name = models.CharField(max_length=100, unique=True)
    description = models.TextField(null=True, blank=True)
    type = models.ForeignKey('ProgramType')
    school = models.ForeignKey('School')
    department = models.ForeignKey('Department', blank=True, null=True)
    # null=True has no effect on ManyToManyField, so it is left out here.
    campuses = models.ManyToManyField('Campus', blank=True)
    num_courses = models.IntegerField(blank=True, null=True)
    num_units = models.IntegerField(blank=True, null=True)
    staff = models.ManyToManyField('user_manager.Member', blank=True)

My index:


from catalog.models import Program
from bungiesearch.indices import ModelIndex


class ProgramIndex(ModelIndex):
    class Meta:
        model = Program
        # Fields left out of the generated mapping.
        exclude = {'campuses', 'num_courses', 'num_units', 'staff', 'department', 'school', 'type'}
        # Extra options merged into the mapping of the listed fields.
        hotfixes = {
            'name': {'boost': 1.75},
            'description': {'boost': 1.35},
        }

When I run `./manage.py search_index --update`, I get:

INFO:root:Updating models ['Program'] on indices ['main_index'].
INFO:root:Getting index for model Program.
WARNING:root:No updated date field found for Program - not restricting with start and end date
INFO:root:index 19 documents on index main_index
INFO:root:Index: documents 0 to 100 of 19 total on index main_index.
INFO:urllib3.connectionpool:Starting new HTTP connection (1): localhost
INFO:elasticsearch:POST http://localhost:9200/main_index/Program/_bulk [status:200 request:0.128s]
No handlers could be found for logger "elasticsearch.trace"
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/Library/Python/2.7/site-packages/django/core/management/__init__.py", line 338, in execute_from_command_line
    utility.execute()
  File "/Library/Python/2.7/site-packages/django/core/management/__init__.py", line 330, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Library/Python/2.7/site-packages/django/core/management/base.py", line 390, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Library/Python/2.7/site-packages/django/core/management/base.py", line 441, in execute
    output = self.handle(*args, **options)
  File "/Users/administrator/Documents/Metis/app/bungiesearch/management/commands/search_index.py", line 196, in handle
    update_index(src.get_model_index(model_name).get_model().objects.all(), model_name, bulk_size=options['bulk_size'], num_docs=options['num_docs'], start_date=options['start_date'], end_date=options['end_date'])
  File "/Users/administrator/Documents/Metis/app/bungiesearch/utils.py", line 62, in update_index
    bulk_index(src.get_es_instance(), data, index=index_name, doc_type=model.__name__, raise_on_error=True)
  File "/Library/Python/2.7/site-packages/elasticsearch/helpers/__init__.py", line 188, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/Library/Python/2.7/site-packages/elasticsearch/helpers/__init__.py", line 160, in streaming_bulk
    for result in _process_bulk_chunk(client, bulk_actions, raise_on_exception, raise_on_error, **kwargs):
  File "/Library/Python/2.7/site-packages/elasticsearch/helpers/__init__.py", line 132, in _process_bulk_chunk
    raise BulkIndexError('%i document(s) failed to index.' % len(errors), errors)
elasticsearch.helpers.BulkIndexError: (u'19 document(s) failed to index.', [{u'index': {u'status': 500, u'_type': u'Program', u'_id': u'1', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'5', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'6', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'7', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'8', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'9', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'10', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'11', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'12', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'13', u'error': 
{u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'14', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'15', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'16', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'17', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'18', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'19', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'20', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'21', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}, {u'index': {u'status': 500, u'_type': u'Program', u'_id': u'22', u'error': {u'reason': u'java.lang.String cannot be cast to java.lang.Number', u'type': u'class_cast_exception'}, u'_index': u'main_index'}}])
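The payload above repeats one failure per document. The last traceback frame shows the exception being raised as `BulkIndexError(message, errors)`, so the per-document list is the exception's second argument. Below is a minimal sketch of a hypothetical helper, `summarize_bulk_errors` (not part of elasticsearch-py or Bungiesearch), that collapses such a list into failed IDs per error type and reason; the sample data is two entries copied from the output above.

```python
from collections import defaultdict


def summarize_bulk_errors(errors):
    """Group failed document IDs by (error type, reason).

    `errors` is the list passed as the second argument of BulkIndexError,
    e.g. [{'index': {'_id': '1', 'status': 500, 'error': {...}}}, ...].
    """
    grouped = defaultdict(list)
    for item in errors:
        # Each item has a single key naming the operation ('index' here).
        for op_result in item.values():
            error = op_result.get('error', {})
            if isinstance(error, dict):
                key = (error.get('type'), error.get('reason'))
            else:
                # Handle a plain-string error too, just in case.
                key = (None, error)
            grouped[key].append(op_result.get('_id'))
    return dict(grouped)


# Two entries copied from the BulkIndexError output above.
sample = [
    {'index': {'status': 500, '_type': 'Program', '_id': '1',
               'error': {'reason': 'java.lang.String cannot be cast to java.lang.Number',
                         'type': 'class_cast_exception'},
               '_index': 'main_index'}},
    {'index': {'status': 500, '_type': 'Program', '_id': '5',
               'error': {'reason': 'java.lang.String cannot be cast to java.lang.Number',
                         'type': 'class_cast_exception'},
               '_index': 'main_index'}},
]
print(summarize_bulk_errors(sample))
```

Inside an `except BulkIndexError as exc:` block, `exc.args[1]` would be the list to pass in.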

My mapping:

{
  "main_index": {
    "aliases": {},
    "mappings": {
      "Program": {
        "properties": {
          "_id": {
            "type": "integer"
          },
          "description": {
            "type": "string",
            "boost": 1.35,
            "analyzer": "snowball"
          },
          "id": {
            "type": "integer"
          },
          "name": {
            "type": "string",
            "boost": 1.75,
            "analyzer": "snowball"
          }
        }
      }
    },
    "settings": {
      "index": {
        "creation_date": "1453791581592",
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "6UmM8YJBQI6m7H1XKnEm6Q",
        "version": {
          "created": "2010099"
        }
      }
    },
    "warmers": {}
  }
}
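Reading the mapping programmatically makes it easier to see which type each property was given. A minimal sketch, assuming the JSON above has been parsed into a dict (here a trimmed copy of it is loaded with `json.loads`); `property_types` and `underscore_properties` are hypothetical helpers, and flagging underscore-prefixed names is only a heuristic, since Elasticsearch uses names such as `_id` and `_type` for its own metadata fields.

```python
import json

# Trimmed copy of the mapping above: only the parts this sketch reads.
MAPPING = json.loads("""
{
  "main_index": {
    "mappings": {
      "Program": {
        "properties": {
          "_id": {"type": "integer"},
          "description": {"type": "string", "boost": 1.35, "analyzer": "snowball"},
          "id": {"type": "integer"},
          "name": {"type": "string", "boost": 1.75, "analyzer": "snowball"}
        }
      }
    }
  }
}
""")


def property_types(mapping, index, doc_type):
    """Map each property name of `doc_type` to its declared type."""
    props = mapping[index]['mappings'][doc_type]['properties']
    return {name: spec.get('type') for name, spec in props.items()}


def underscore_properties(types):
    """Heuristic: property names that look like Elasticsearch metadata fields."""
    return sorted(name for name in types if name.startswith('_'))


types = property_types(MAPPING, 'main_index', 'Program')
print(underscore_properties(types), types['_id'])
```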
ChristopherRabotin commented 8 years ago

Did that index already exist before you started using Bungiesearch on it?

On Tue, Jan 26, 2016, 07:08 Jiyda Mint Moussa notifications@github.com wrote:

— Reply to this email directly or view it on GitHub https://github.com/ChristopherRabotin/bungiesearch/issues/146.

Jiydam commented 8 years ago

I was using django-haystack before, but I cleared that index. The only index I have now is `main_index`, which was created by Bungiesearch.

ChristopherRabotin commented 8 years ago

Okay, that's odd. Is the `id` on your model definitely an integer, never null and never a non-integer value? Bungiesearch detects the appropriate field type when it generates the mapping, and that part of the code hasn't changed in over a year. Does `ProgramManager` mix in the Bungiesearch manager?
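One way to answer the question about the `id` values is to pull them out of the database and check each one. A minimal sketch, assuming the values come from a standard Django query such as `Program.objects.values_list('id', flat=True)` run in `./manage.py shell`; `non_integer_ids` is a hypothetical helper, shown here on plain lists so it runs without Django.

```python
import numbers


def non_integer_ids(values):
    """Return the values that are not plain integers (None, strings, floats, ...)."""
    bad = []
    for value in values:
        # bool is a subclass of int, so rule it out explicitly.
        if isinstance(value, bool) or not isinstance(value, numbers.Integral):
            bad.append(value)
    return bad


print(non_integer_ids([1, 5, 6]))
print(non_integer_ids([1, None, '7']))
```

An empty result would mean every stored ID is a real integer.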


Jiydam commented 8 years ago

My `ProgramManager` is existing code I had before; it doesn't do anything related to search. Any pointers on how I could debug the issue?

ChristopherRabotin commented 8 years ago

If I recall correctly (I haven't changed how Bungiesearch is used in production for months), adding the Bungiesearch manager lets you search the model through aliases or the `search` attribute. However, I don't think it actually adds anything to the mapping.

To debug, I'd look at `ProgramManager` and check that the ID field is definitely there (or, more precisely, how it's defined in the parent model mixin).


Jiydam commented 8 years ago

I just changed the mapping of `_id` from integer to string and it worked. Is that going to break anything else?
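The workaround described here, sketched as a transformation of a mapping dict with the same shape as the JSON posted earlier. `with_string_id` is a hypothetical helper; it leaves its input untouched and changes only the `_id` property's type. Elasticsearch does not let you change the type of an existing field in place, so a patched mapping like this would typically be applied by recreating the index and reindexing.

```python
import copy


def with_string_id(mapping, index, doc_type):
    """Return a deep copy of `mapping` with the `_id` property typed as 'string'."""
    patched = copy.deepcopy(mapping)
    props = patched[index]['mappings'][doc_type]['properties']
    if '_id' in props:
        props['_id']['type'] = 'string'
    return patched


original = {'main_index': {'mappings': {'Program': {'properties': {
    '_id': {'type': 'integer'},
    'id': {'type': 'integer'},
}}}}}
patched = with_string_id(original, 'main_index', 'Program')
print(patched['main_index']['mappings']['Program']['properties'])
```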

ChristopherRabotin commented 8 years ago

No, it should not break anything if the field is indeed an integer and never has a string value.

I'll use the code you posted to try to build a test case and see whether I can reproduce the issue. Is there anything in `ProgramManager` that you can share and that affects the table's fields?


Jiydam commented 8 years ago

I removed `ProgramManager` and still got the error. For some reason, the Elasticsearch bulk method seems to expect `_id` to be mapped as a string rather than an integer: I tried the following in a Python console, and it fails when `_id` is mapped as an integer:

bulk_index(es_instance, data, index=index_name, doc_type=doc_type, raise_on_error=True)

If you check the bulk API, `_id` is also provided as a string there.
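In the traceback, `bulk_index` ends up in `elasticsearch.helpers.bulk`, which consumes an iterable of action dicts. A minimal sketch of building such actions with the document ID converted to a string, as in the bulk API examples; `build_actions` and the sample row are hypothetical, and no cluster connection is made here.

```python
def build_actions(rows, index, doc_type, id_field='id'):
    """Yield bulk 'index' actions with the document ID passed as a string."""
    for row in rows:
        yield {
            '_op_type': 'index',
            '_index': index,
            '_type': doc_type,             # doc_type as used in this thread
            '_id': str(row[id_field]),     # ID sent as a string
            '_source': row,
        }


actions = list(build_actions([{'id': 1, 'name': 'Math'}], 'main_index', 'Program'))
print(actions[0]['_id'])
```

Against a running cluster, the generator could be passed as `bulk(es, build_actions(rows, 'main_index', 'Program'))`, where `es` is an `Elasticsearch` client.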

I really appreciate your support. I am using it now and everything seems to work fine so far.