etianen / django-watson

Full-text multi-table search application for Django. Easy to install and use, with good performance.
BSD 3-Clause "New" or "Revised" License

rebuilding index doesn't remove entries for deleted objects #276

Closed valentijnscholten closed 2 years ago

valentijnscholten commented 3 years ago

Hi,

  1. have some model indexed, e.g. the Finding model
  2. insert some model instances into the db
  3. observe that django-watson indexes them correctly
  4. delete the Finding with id 1234 from the database
  5. run buildwatson
  6. observe that the searchentry table still contains entries connected to Finding 1234, which no longer exists in the database.

My expectation would be that if objects are deleted, they will no longer be indexed by buildwatson. But it looks like these search entries are just left behind?

(dojo) dojo@defectdojo:~/DefectDojo/dojo/django-DefectDojo$ ./manage.py buildwatson
[01/Sep/2020 11:29:03] INFO [dojo.models:3133] enabling audit logging
[01/Sep/2020 11:29:03] DEBUG [dojo.tag.prefetching_tag_descriptor:18] patching TagDescriptor
Deleted 0 stale search entry(s) in 'admin' search engine.
Deleted 0 stale search entry(s) in 'default' search engine.
Refreshed 9844 search entry(s) in 'default' search engine.

mysql root@localhost:DojoDB> select id from watson_searchentry where content_type_id=37 and object_id not in (select id from dojo_finding)
+-------+
|    id |
|-------|
| 83469 |
| 83412 |
| 83433 |
+-------+
3 rows in set

Am I missing something or is my expectation correct?

etianen commented 3 years ago

Are you sure that the model instances you've deleted are actually deleted? Some model plugins simply mark them with an is_deleted flag or similar. Please check your DB with a manual SQL query to see if the offending instances still exist.

If they do, then this is a bug in django-watson. If you can get to the bottom of it, I'll take a MR. I'm surprised though. This is basic functionality that has worked for years.

valentijnscholten commented 3 years ago

The query above shows that there are 3 search entries that refer to database rows that do not physically exist in the database.

etianen commented 3 years ago

Oh, goodness, you're right. The buildwatson command doesn't actually delete entries corresponding to deleted instances. I wonder why that really obvious functionality was overlooked. :S

I'd take a MR to fix it. I'm afraid I don't have the time to do it myself right now.
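
For anyone hitting this in the meantime, here is a minimal sketch of the orphan check and cleanup discussed above. It is not django-watson code and not the eventual fix. It uses an in-memory SQLite database in place of MySQL, and the two tables are reduced to the columns used in the query from the report. The table and column names come from that query; the helper names (`find_stale_entry_ids`, `delete_stale_entries`, `build_demo_db`) and the sample ids are hypothetical, and `object_id` is declared as an integer here purely to keep the comparison simple.

```python
import sqlite3

# Anti-join condition taken from the query in the report: a search entry is
# stale when its object_id no longer matches any row in dojo_finding.
STALE_CONDITION = (
    "content_type_id = ? "
    "AND object_id NOT IN (SELECT id FROM dojo_finding)"
)


def find_stale_entry_ids(conn, content_type_id):
    """Return ids of search entries that point at deleted findings."""
    rows = conn.execute(
        "SELECT id FROM watson_searchentry WHERE " + STALE_CONDITION + " ORDER BY id",
        (content_type_id,),
    )
    return [row[0] for row in rows]


def delete_stale_entries(conn, content_type_id):
    """Delete the stale search entries and return how many were removed."""
    cursor = conn.execute(
        "DELETE FROM watson_searchentry WHERE " + STALE_CONDITION,
        (content_type_id,),
    )
    conn.commit()
    return cursor.rowcount


def build_demo_db():
    """Build a simplified stand-in schema and reproduce the reported state."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dojo_finding (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE watson_searchentry ("
        "id INTEGER PRIMARY KEY, "
        "content_type_id INTEGER NOT NULL, "
        "object_id INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO dojo_finding (id) VALUES (?)",
        [(1233,), (1234,), (1235,)],
    )
    conn.executemany(
        "INSERT INTO watson_searchentry (id, content_type_id, object_id) "
        "VALUES (?, ?, ?)",
        [(1, 37, 1233), (2, 37, 1234), (3, 37, 1235)],
    )
    # Delete the finding but leave its search entry behind, as in the report.
    conn.execute("DELETE FROM dojo_finding WHERE id = 1234")
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print("stale before cleanup:", find_stale_entry_ids(conn, 37))
    print("deleted:", delete_stale_entries(conn, 37))
    print("stale after cleanup:", find_stale_entry_ids(conn, 37))
```

On a real database, run the select first and check the ids before deleting anything. The content type id (37 in the report) differs per installation.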
