digitalmethodsinitiative / dmi-tcat

Digital Methods Initiative - Twitter Capture and Analysis Toolset
Apache License 2.0

Improve analysis performance by using the existing FULLTEXT indexes #285

Closed by dentoir 5 years ago

dentoir commented 7 years ago

We should consider how we can use MATCH() ... AGAINST() syntax in the analysis frontend instead of LIKE syntax. In particular, the sqlSubset() function in analysis/common/functions.php should be studied.
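To make the comparison concrete, here is a minimal sketch of the two kinds of WHERE clause in question. It is not taken from sqlSubset(). The helper names, the `text` column name, and the `%s` placeholder style (as used by common Python MySQL drivers) are assumptions for illustration only.

```python
def like_clause(column, term):
    """Build a substring-match clause (hypothetical helper).

    A pattern with a leading % cannot use a FULLTEXT index.
    """
    # Escape the LIKE wildcards so they are matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"{column} LIKE %s", [f"%{escaped}%"]


def match_clause(column, term):
    """Build a FULLTEXT clause (hypothetical helper).

    Requires a FULLTEXT index on the column and matches whole words,
    not arbitrary substrings.
    """
    return f"MATCH({column}) AGAINST (%s IN BOOLEAN MODE)", [term]


if __name__ == "__main__":
    print(like_clause("text", "climate"))
    print(match_clause("text", "climate"))
```

The two clauses are not interchangeable: `LIKE '%cat%'` also matches "category", while MATCH() ... AGAINST() matches the word "cat" only. In boolean mode, characters such as `+` and `-` in the search term act as operators, so user input would need separate handling.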

ErikBorra commented 7 years ago

I.e., what are the benefits of using MATCH() ... AGAINST() over LIKE in terms of performance, query formulation, etc.? Community input is welcome.

dentoir commented 5 years ago

FULLTEXT with the syntax I suggested runs into unexpected problems with stoplists and short words (for example, a search for the string 'D66' finds nothing), and FULLTEXT is not supported by all modern storage engines. I therefore don't think this is the way to go for now.
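The short-word problem can be sketched as follows. This is an illustration, not TCAT code: the minimum length of 4 mirrors the MyISAM default for `ft_min_word_len` (InnoDB's `innodb_ft_min_token_size` defaults to 3), the stopword set is a tiny made-up subset, and the helper names and `%s` placeholder style are hypothetical.

```python
# Assumption: MyISAM default ft_min_word_len; the real value depends on server config.
MIN_TOKEN_LEN = 4
# Assumption: illustrative subset only, not the server's actual stoplist.
STOPWORDS = {"the", "and", "for"}


def fulltext_safe(term, min_len=MIN_TOKEN_LEN, stopwords=STOPWORDS):
    """Return True only if every word in the term would be indexed."""
    words = term.split()
    return bool(words) and all(
        len(w) >= min_len and w.lower() not in stopwords for w in words
    )


def choose_clause(column, term):
    """Use FULLTEXT when the term is indexable, otherwise fall back to LIKE."""
    if fulltext_safe(term):
        return f"MATCH({column}) AGAINST (%s IN BOOLEAN MODE)", [term]
    return f"{column} LIKE %s", [f"%{term}%"]


if __name__ == "__main__":
    print(choose_clause("text", "D66"))      # too short, falls back to LIKE
    print(choose_clause("text", "climate"))  # long enough, uses MATCH
```

A fallback like this keeps short terms such as 'D66' searchable, at the cost of two code paths with different matching semantics (whole words versus substrings).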