django-es / django-elasticsearch-dsl

This is a package that allows indexing of django models in elasticsearch with elasticsearch-dsl-py.
Other
1.02k stars 263 forks source link

Field 'id' expected a number but got '41jl7ngByXlYCcE8rXBp' with to_queryset() #341

Open Sachin-Kahandal opened 3 years ago

Sachin-Kahandal commented 3 years ago

Somethings up with id, though I haven't made any changes in to_queryset function

s = MyntraClothesDocument.search(index="myntra_clothes").query(MultiMatch(query=text, fields=['Title', 'Description'], operator="AND", fuzziness='AUTO')) qs = s.to_queryset()

ValueError: Field 'id' expected a number but got '41jl7ngByXlYCcE8rXBp'.

Thank you

KunstDerFuge commented 3 years ago

Same thing just happened to me -- I believe it was triggered by doing an incremental update (as opposed to rebuilding from scratch)... or at least it happened after doing one:

    posts_created = Post.objects.bulk_create(new_posts, ignore_conflicts=True)
    PostDocument().update(posts_created)

All IDs in the index now appear to be these weird hashes rather than ints, so can't be used to get a QuerySet.

Edit: It isn't all IDs, just some recent ones. Now looking for a way to selectively delete non-numeric IDs...

I think what may have happened is I tried to be too clever and simplified the code for updating the index from newly created objects. I was originally basing it off of the code from this post but removed the middle parts to avoid what looked like an unnecessary hit to the DB. I don't understand why, but it seems cutting out the intermediate step may have been the problem in my case. ES seems to have indexed some kind of temporary ID rather than the PK, and needs to get the real PK from the database.

safwanrahman commented 2 years ago

Interesting. Need to look at it. @saadmk11 Will you have some time to look?

saadmk11 commented 2 years ago

@KunstDerFuge Thanks for reporting the issue. Directly using the objects list returned by the bulk_create() can cause this issue.

From Django's (bulk_create()) Documentation:

If the model’s primary key is an AutoField, the primary key attribute can only be retrieved on certain databases (currently PostgreSQL and MariaDB 10.5+). On other databases, it will not be set.

Ref: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#bulk-create

Please let us know which Database and Django version you are uisng.

KunstDerFuge commented 2 years ago

Yes, I had heard this, and that's what's confusing to me. I'm using Postgres 12.8 and Django 3.2.3.

KunstDerFuge commented 2 years ago

I just ran a simple test, bulk creating three objects and checking what is returned from bulk_create(). It is a list of objects where both object.id and object.pk contain correct, auto-incremented integer keys. So perhaps in my case it wasn't due to bulk_create(). That's quite confusing...

Edit:

To be totally clear, the only time my ES index is updated is when one of two things happen: either the manage.py command (which is not at fault here) or the excerpt I shared above where the index is updated with PostDocument().update(posts_created). So it must be this line, I'm just not sure at what stage of the process something went wrong.

saadmk11 commented 2 years ago

After a little bit of research, I have found that if we use ignore_conflicts=True with bulk_create() it does not set the pk to the returned objects. That's why this is happening. If you don't use ignore_conflicts=True it should work as expected.

From Django Docs:

Enabling this parameter (ignore_conflicts) disables setting the primary key on each model instance (if the database normally supports it).

Ref: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#bulk-create

Django Ticket on this issue: https://code.djangoproject.com/ticket/30138