WebberZone / contextual-related-posts

Contextual Related Posts WordPress plugin
https://webberzone.com/plugins/contextual-related-posts/
39 stars 22 forks source link

Don't duplicate indexes crp_related is enought #73

Closed PutinVladimir closed 2 years ago

PutinVladimir commented 5 years ago

FULLTEXT indexes is duplicate size of wp_post. If you have 100000 posts is typical size of wp_posts is 200-300mb. FULLTEXT index on post_content will be 200-300mb. You create 3 indexes: crp_related (post_title, post_content); crp_related_title (post_title); crp_related_content (post_content);

and you have only one option in settings "Find related posts based on content as well as title" so i can guess you always select similar post by title or by title+content and never content only of posts. So for good working enough only one index crp_related. Index crp_related_content never used but is very heavy.

Also not needed heavy index is poor for db caching kyes it will clear it every time if size of key pool less then size of all indexes, but it should be good if it will be bigger then size of only one index crp_related and not exist other index.

I dropped crp_related_title and crp_related_content and nothing changed. So check query plan and delete unneeded idx.

ajaydsouza commented 5 years ago

Thanks for this. Let me run a few texts.

One thing I need to see is if this would work if I can search for title only and body only from crp_related so that the title and content can be weighted. But, it appears that this might be the case.

PutinVladimir commented 5 years ago

You have not option in settings for "body only"...

ajaydsouza commented 5 years ago

Hi, I've been running some checks. Although in the current status the extra indices are not needed, I'm planning on implementing a change in the algorithm to use weighting system for the title and the content. This is similar to what I'm done with my other plugin Better Search. In this case the individual indices are needed else the query fails.

PutinVladimir commented 5 years ago

You are wrong again.

  1. Search in title complex index title+content is ok.
  2. Search in title+content complex index title+content is ok.
  3. Search in content only complex index title+content is fail. I quikly look at src of better search and there also no options for search in content only (may be I mistaken not wasted a lot of time on src). Then if you really want search in content only (what is not give any new advantages) should create two indexes on title field and on content field. And do not create complex title+content. I guess test will show same results (maybe query will need some modifications) but you will not duplicate heavy index and in complex when index is more than 1gb and ram is limited it give advantages.
PutinVladimir commented 5 years ago

And one more limit heavy index lenght is a key which can reduce size with acceptable results...

ajaydsouza commented 5 years ago

Better Search generates a query as below. This uses a weighting with a separate MATCH for the post_title and the post_content. Running this query without the content and title indices i.e. keeping only the (post_title,post_content) index created SQL queries of missing FULLTEXT indices. I haven't tested deleting the combined index. I will give that a shot as well to see what I can do.

SELECT SQL_CALC_FOUND_ROWS wp_posts.*, ID, 
  (MATCH(wp_posts.post_title) AGAINST ('sherlock' ) * 10 ) + (MATCH(wp_posts.post_content) AGAINST ('sherlock' ) * 1 ) AS score
FROM wp_posts
WHERE 1=1
AND MATCH (wp_posts.post_title,wp_posts.post_content) AGAINST ('sherlock' )
AND (wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'inherit')
AND wp_posts.post_type IN ('post', 'page', 'attachment')
AND wp_posts.post_password = ''
ORDER BY score DESC
LIMIT 0, 10
ajaydsouza commented 5 years ago

And one more limit heavy index lenght is a key which can reduce size with acceptable results...

Can you elaborate more on this? Is this an explicit setting I need to do?

PutinVladimir commented 5 years ago

In context of query maybe you can do a trick when you need content only search but use complex index title+content add to query title too but multiplay score of title match to same value to became it unimpotent and not affect the result of content...

About limit if you have a lot of big posts which is 5000-20000 chars length if limit index on content as 1000 chars (in other plugins this is usualy an option in setting where you can choose full or limit size index) you will get 100mb index against original 1000mb and and same quality for this plugin (not for search plugin where whole text matter).

ajaydsouza commented 3 years ago

In v3.0.0, I've dropped the need for the post_content database. https://github.com/WebberZone/contextual-related-posts/commit/bd92596263781ace255d810476fe2cabf6cdea27

The other two indices are used depending on whether a user enables the setting to search in content.