getgrav / grav-plugin-simplesearch

Grav SimpleSearch Plugin
https://getgrav.org
MIT License

Problem with search in content title and search performance #77

Status: Open. ghost opened this issue 8 years ago.

ghost commented 8 years ago

Hi. I've added SimpleSearch to my online magazine. In the config I've set the hierarchy to look for the "content title", but sadly it doesn't seem to be applied properly: the first results match neither my search term nor the title of the article. You can see this at https://mnmz.de/mdrnhf by using the search function in the upper right.

Also, I would like to know why you think the search takes so long. There are approx. 650 articles in the system; is that the problem, and are there any ways to speed it up?

Thanks in advance

rhukster commented 8 years ago

The SimpleSearch plugin is pretty 'simple' in that it searches the title and body content for keywords on every search. With 650 articles the initial search is going to be slow, as it processes and then caches each article. After that it should definitely be faster, as it only has to search the cached content.
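To make the cost model above concrete, here is a minimal Python sketch of a linear "scan everything" search with a per-page cache. It is not the plugin's actual PHP code: the `Page` class, the `render` callable (a stand-in for Grav's expensive page processing), and the `render_calls` counter are hypothetical names used only for illustration. The first query processes every page; later queries reuse the cache but still scan every page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Page:
    route: str
    title: str
    markdown: str


class NaiveSearch:
    """Scan every page's title and processed content on each query (hypothetical sketch)."""

    def __init__(self, pages: List[Page], render: Callable[[str], str]) -> None:
        self.pages = pages
        self.render = render               # stand-in for the expensive page-processing step
        self._cache: Dict[str, str] = {}   # route -> processed content
        self.render_calls = 0

    def _content(self, page: Page) -> str:
        # The first query pays the processing cost; later queries read the cache.
        if page.route not in self._cache:
            self.render_calls += 1
            self._cache[page.route] = self.render(page.markdown)
        return self._cache[page.route]

    def search(self, query: str) -> List[str]:
        needle = query.lower()
        hits = []
        # Linear scan: cost still grows with the number and size of pages.
        for page in self.pages:
            haystack = (page.title + " " + self._content(page)).lower()
            if needle in haystack:
                hits.append(page.route)
        return hits


pages = [Page("/a", "Grav tips", "A fast **CMS**"), Page("/b", "Cooking", "Pasta recipe")]
engine = NaiveSearch(pages, render=lambda md: md.replace("**", ""))
print(engine.search("cms"))     # ['/a'] (processes both pages)
print(engine.search("pasta"))   # ['/b'] (served from the cache)
print(engine.render_calls)      # 2
```

Caching removes the repeated processing cost, but every query is still a full pass over all pages, which is why this approach degrades as the site grows.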

Still, for bigger sites this approach is not feasible: the bigger the content, the more searching has to be done. A better solution would be an indexing search engine, one that indexes content offline so that searches run against the index. This is something I would love to work on, but have not had time.
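The core idea behind an indexing search engine is the inverted index: map each term to the set of pages containing it, build that map ahead of time, and answer queries by looking terms up instead of scanning content. The Python sketch below is a generic illustration under simple assumptions (lowercased word tokens, AND semantics across query terms, no ranking); `InvertedIndex` and `tokenize` are hypothetical names and not part of Grav or any search service.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    # Naive tokenizer: lowercase word characters only, no stemming or stop words.
    return TOKEN.findall(text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)  # term -> routes

    def add(self, route: str, text: str) -> None:
        # Runs offline (e.g. when content changes), not at query time.
        for term in tokenize(text):
            self.postings[term].add(route)

    def search(self, query: str) -> List[str]:
        terms = tokenize(query)
        if not terms:
            return []
        # AND semantics: intersect the posting sets of all query terms.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


idx = InvertedIndex()
idx.add("/a", "Grav is a flat-file CMS")
idx.add("/b", "Search engines build an index")
idx.add("/c", "Grav search plugin")
print(idx.search("grav"))         # ['/a', '/c']
print(idx.search("grav search"))  # ['/c']
```

Query cost now depends on the number of query terms and the size of their posting sets rather than on the total amount of content, which is the property the comment above is pointing at.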

An alternative would be integration with a third-party external search service like https://www.algolia.com. That would definitely require a plugin to index the content, query the search engine, and display the results.

rhukster commented 8 years ago

Some others for reference:

http://www.searchly.com/ https://www.searchify.com/ https://aws.amazon.com/cloudsearch/

ghost commented 8 years ago

Thank you very much. I will check out the AWS solution. It looks promising to me ;)

lorddoumer commented 7 years ago

Hi, I have a similar problem: the search process takes too long and finally gets aborted without any results. I submitted an issue last year (https://github.com/getgrav/grav/issues/941), which could be solved by reducing the image quality a lot, but now the site has a lot of articles and further reducing the image quality is not an option, since the images are the essential part. Maybe it would be a good idea to add an option to only search through page titles and ignore the content? This would certainly help on larger sites and would definitely solve my problem.

Apart from this: Grav is an awesome CMS, thank you :-)

rhukster commented 7 years ago

As I said originally, SimpleSearch was only ever intended as a simple search. It does a simple text search on all content, so lots of content, or slow-rendering content (due to images being processed), is going to be a problem.

You really need to look at integrating a proper index-based search engine for these kinds of use cases. It's just beyond what this plugin was intended to do.

rhukster commented 7 years ago

BTW, I just released a new version (1.12.0) with a new option called search_content. By default it uses the rendered value, which behaves the same as before.

However, you can set it to raw, and it will then search via Page::rawMarkdown(), which is considerably faster than Page::content(). Give it a try when it shows up in GPM.
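The speed difference comes from skipping page processing entirely. The Python sketch below illustrates that trade-off under stated assumptions: the mode strings "rendered" and "raw" mirror the search_content values described above, but `searchable_text`, `search`, and the `render` callable are hypothetical stand-ins, not the plugin's code or Grav's Page API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    route: str
    title: str
    raw_markdown: str


def searchable_text(page: Page, mode: str, render: Callable[[str], str]) -> str:
    """Pick the text to match against, mirroring a rendered/raw switch (hypothetical)."""
    if mode == "raw":
        # No processing at all: match against the Markdown source as written.
        return page.raw_markdown
    if mode == "rendered":
        # Stand-in for full page processing, the slow path in this thread.
        return render(page.raw_markdown)
    raise ValueError(f"unknown mode: {mode!r}")


def search(pages: List[Page], query: str, mode: str, render: Callable[[str], str]) -> List[str]:
    needle = query.lower()
    return [
        p.route
        for p in pages
        if needle in (p.title + " " + searchable_text(p, mode, render)).lower()
    ]


calls = []


def slow_render(md: str) -> str:
    calls.append(md)  # record how often the expensive path runs
    return "<p>" + md + "</p>"


pages = [Page("/trip", "Trip", "Report with ![photo](big.jpg)")]
print(search(pages, "report", "raw", slow_render))       # ['/trip']
print(len(calls))                                        # 0
print(search(pages, "report", "rendered", slow_render))  # ['/trip']
print(len(calls))                                        # 1
```

The trade-off is fidelity: raw mode matches the Markdown source, so it can match markup such as image file names and can miss text that only appears after processing.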

lorddoumer commented 7 years ago

Thank you, going to try it! Like I said, I really like Grav and would love to have a lean, dependency-free system. If it works, great; if not, I'll just remove the search, no problem.

EDIT: the new version with raw content is working great, thank you so much!