etianen / django-watson

Full-text multi-table search application for Django. Easy to install and use, with good performance.
BSD 3-Clause "New" or "Revised" License
1.2k stars 129 forks source link

MySQL InnoDB support #120

Closed kilbasar closed 7 years ago

kilbasar commented 9 years ago

As of MySQL 5.6, InnoDB supports full text search via FULLTEXT indexes, similar to MyISAM. Would it be possible for django-watson to support full text searching on InnoDB databases? Thank you!

https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html https://dev.mysql.com/doc/refman/5.6/en/fulltext-search.html

etianen commented 9 years ago

Does this require any changes to how fulltext queries are performed?

From what I can see, supporting this will simply require not setting the storage engine to MyISAM if MySQL version >= 5.6.

https://github.com/etianen/django-watson/blob/master/src/watson/backends.py#L357

Or are there any subtleties I have missed?

On Fri, 14 Aug 2015 at 21:34 kilbasar notifications@github.com wrote:

As of MySQL 5.6, InnoDB supports full text search via FULLTEXT indexes, similar to MyISAM. Would it be possible for django-watson to support full text searching on InnoDB databases? Thank you!

https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html https://dev.mysql.com/doc/refman/5.6/en/fulltext-search.html

— Reply to this email directly or view it on GitHub https://github.com/etianen/django-watson/issues/120.

kilbasar commented 9 years ago

I'm pretty new to MySQL's full text searching, but I agree, based on the documentation it seems like InnoDB FULLTEXT indexes are used exactly like the MyISAM version. I can confirm that if you just change MyISAM to InnoDB in that ALTER TABLE statement, watson installs and seems to function when running with MySQL 5.6. Went through the whole syncdb/installwatson/buildwatson procedure, and did some test watson.search()'s.

The only thing I noticed was this, when doing the initial migration:

Applying watson.0001_initial.../opt/Python/lib/python2.7/site-packages/django/db/backends/mysql/base.py:124: Warning: InnoDB rebuilding table to add column FTS_DOC_ID return self.cursor.execute(query, args)

Everything else went the same as it does with MyISAM.

mysql> show table status where Name = 'watson_searchentry'\G
*************************** 1. row ***************************
           Name: watson_searchentry
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 21342
 Avg_row_length: 320
    Data_length: 6832128
Max_data_length: 0
   Index_length: 1638400
      Data_free: 12582912
 Auto_increment: 22011
    Create_time: 2015-08-17 11:12:08
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.01 sec)

mysql> describe watson_searchentry;
+-----------------+---------------+------+-----+---------+----------------+
| Field           | Type          | Null | Key | Default | Extra          |
+-----------------+---------------+------+-----+---------+----------------+
| id              | int(11)       | NO   | PRI | NULL    | auto_increment |
| engine_slug     | varchar(200)  | NO   | MUL | NULL    |                |
| object_id       | longtext      | NO   |     | NULL    |                |
| object_id_int   | int(11)       | YES  | MUL | NULL    |                |
| title           | varchar(1000) | NO   | MUL | NULL    |                |
| description     | longtext      | NO   | MUL | NULL    |                |
| content         | longtext      | NO   | MUL | NULL    |                |
| url             | varchar(1000) | NO   |     | NULL    |                |
| meta_encoded    | longtext      | NO   |     | NULL    |                |
| content_type_id | int(11)       | NO   | MUL | NULL    |                |
+-----------------+---------------+------+-----+---------+----------------+
10 rows in set (0.00 sec)

mysql> show index from watson_searchentry where Key_name = 'watson_searchentry_fulltext';
+--------------------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table              | Non_unique | Key_name                    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| watson_searchentry |          1 | watson_searchentry_fulltext |            1 | title       | NULL      |       21339 |     NULL | NULL   |      | FULLTEXT   |         |               |
| watson_searchentry |          1 | watson_searchentry_fulltext |            2 | description | NULL      |       21339 |     NULL | NULL   |      | FULLTEXT   |         |               |
| watson_searchentry |          1 | watson_searchentry_fulltext |            3 | content     | NULL      |       21339 |     NULL | NULL   |      | FULLTEXT   |         |               |
+--------------------+------------+-----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.01 sec)
etianen commented 9 years ago

I've implemented this in a branch, here:

https://github.com/etianen/django-watson/tree/mysql-innodb-fulltext

Unfortunately, when running on MySQL 5.6.25, almost all the tests for search matching fail.

testAdminIntegration (test_watson.tests.AdminIntegrationTest) ... FAIL
testFieldsExcludedOnFilter (test_watson.tests.ComplexRegistrationTest) ... FAIL
testFieldsExcludedOnSearch (test_watson.tests.ComplexRegistrationTest) ... FAIL
testMetaNotStored (test_watson.tests.ComplexRegistrationTest) ... ERROR
testMetaStored (test_watson.tests.ComplexRegistrationTest) ... ERROR
testUninstallAndInstall (test_watson.tests.InstallUninstallTestBase) ... ok
testBuildWatsonCommand (test_watson.tests.InternalsTest) ... FAIL
testBuildWatsonForModelCommand (test_watson.tests.InternalsTest) ... /Users/dave/Workspace/watson/src/watson/management/commands/buildwatson.py:86: RemovedInDjango19Warning: django.db.models.get_model is deprecated.
  model = get_model(*model_name.split("."))  # app label, model name
/Users/dave/Workspace/watson/venv/lib/python2.7/site-packages/django/db/models/__init__.py:55: RemovedInDjango19Warning: The utilities in django.db.models.loading are deprecated in favor of the new application loading system.
  from . import loading
FAIL
testEmptyFilterGivesAllResults (test_watson.tests.InternalsTest) ... ok
testFilter (test_watson.tests.InternalsTest) ... FAIL
testFixesDuplicateSearchEntries (test_watson.tests.InternalsTest) ... ok
testNestedSkipInUpdateContext (test_watson.tests.InternalsTest) ... FAIL
testNestedUpdateInSkipContext (test_watson.tests.InternalsTest) ... FAIL
testPrefixFilter (test_watson.tests.InternalsTest) ... FAIL
testSearchEntriesCreated (test_watson.tests.InternalsTest) ... ok
testSearchIndexUpdateAbandonedOnError (test_watson.tests.InternalsTest) ... ok
testSearchIndexUpdateDeferredByContext (test_watson.tests.InternalsTest) ... FAIL
testSkipSearchIndexUpdate (test_watson.tests.InternalsTest) ... ok
testUpdateSearchIndex (test_watson.tests.InternalsTest) ... FAIL
testCanOverridePublication (test_watson.tests.LiveFilterSearchTest) ... FAIL
testExcludedModelList (test_watson.tests.LiveFilterSearchTest) ... FAIL
testExcludedModelQuerySet (test_watson.tests.LiveFilterSearchTest) ... FAIL
testKitchenSink (test_watson.tests.LiveFilterSearchTest)
For sanity, let's just test everything together in one giant search of doom! ... ERROR
testLimitedModelList (test_watson.tests.LiveFilterSearchTest) ... FAIL
testLimitedModelQuerySet (test_watson.tests.LiveFilterSearchTest) ... FAIL
testMultiTablePrefixSearch (test_watson.tests.LiveFilterSearchTest) ... FAIL
testMultiTableSearch (test_watson.tests.LiveFilterSearchTest) ... FAIL
testSearchWithAccent (test_watson.tests.LiveFilterSearchTest) ... FAIL
testSearchWithApostrophe (test_watson.tests.LiveFilterSearchTest) ... FAIL
testSearchWithLeadingApostrophe (test_watson.tests.LiveFilterSearchTest) ... FAIL
testUnpublishedModelsNotFound (test_watson.tests.LiveFilterSearchTest) ... FAIL
testRankingParamAbsentOnFilter (test_watson.tests.RankingTest) ... ERROR
testRankingParamAbsentOnSearch (test_watson.tests.RankingTest) ... ERROR
testRankingParamPresentOnFilter (test_watson.tests.RankingTest) ... ERROR
testRankingParamPresentOnSearch (test_watson.tests.RankingTest) ... ERROR
testRankingWithFilter (test_watson.tests.RankingTest) ... FAIL
testRankingWithSearch (test_watson.tests.RankingTest) ... FAIL
testRegistration (test_watson.tests.RegistrationTest) ... ok
testExcludedModelList (test_watson.tests.SearchTest) ... FAIL
testExcludedModelQuerySet (test_watson.tests.SearchTest) ... FAIL
testKitchenSink (test_watson.tests.SearchTest)
For sanity, let's just test everything together in one giant search of doom! ... ERROR
testLimitedModelList (test_watson.tests.SearchTest) ... FAIL
testLimitedModelQuerySet (test_watson.tests.SearchTest) ... FAIL
testMultiTablePrefixSearch (test_watson.tests.SearchTest) ... FAIL
testMultiTableSearch (test_watson.tests.SearchTest) ... FAIL
testSearchWithAccent (test_watson.tests.SearchTest) ... FAIL
testSearchWithApostrophe (test_watson.tests.SearchTest) ... FAIL
testSearchWithLeadingApostrophe (test_watson.tests.SearchTest) ... FAIL
testSiteSearch (test_watson.tests.SiteSearchTest) ... FAIL
testSiteSearchCustom (test_watson.tests.SiteSearchTest) ... FAIL
testSiteSearchCustomJSON (test_watson.tests.SiteSearchTest) ... FAIL
testSiteSearchJSON (test_watson.tests.SiteSearchTest) ... FAIL

It seems that there are some differences in how queries are handled in the new fulltext indices.

https://www.percona.com/blog/2013/03/04/innodb-full-text-search-in-mysql-5-6-part-2-the-queries/