Closed. ErisDS closed this issue 9 years ago.
Do you have any details as to what schema changes you need yet? If you do, perhaps push up some stuff to a fork so I can take a look? I am doing a bit of a schema audit is all.
@julesbravo Any further updates to this?
Hannah, I've been slacking. I'll try to wrap it up Tuesday.
No probs, I've just been wondering whether to put this into the current schema migration or not. I think it might be worth waiting, purely because there are so many other changes ongoing.
Hannah,
I've got what I think should work done, but I'm having issues around Knex. I'm going to try to hop into IRC tomorrow to hopefully get some pointers. I've just been having the damnedest time with this.
This work was started in https://github.com/TryGhost/Ghost/pull/489, but now needs to be picked up by a developer willing to give it some serious love - including considering how it might be written as a BookShelf or Knex plugin.
@ErisDS Can you assign me to this one?
FYI: This is for 0.5, and there is a search branch to submit PRs to. I recommend looking at what was done so far: https://github.com/julesbravo/Ghost/commit/d9944de9916b81c3c3c84419ecffbe2e3c65ecd1 (not merged).
The big questions I have are:
Caught a bit of the discussion on IRC. I think the search should be handled by something like https://github.com/olivernn/lunr.js. This would allow plugins to be written against an API, basically solving part of the puzzle. Lunr.js provides a tf-idf algorithm that allows documents to be ranked as well, so it does not simply list the posts that contain the word but also sorts them by relevance.
For reference, the search for the nodejitsu handbook is done with lunr.js. It's just a matter of pushing text/titles in at startup and letting lunr.js do its magic.
A bit like Solr, but much smaller and not as bright http://lunrjs.com
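To make the tf-idf ranking idea concrete, here is a minimal sketch in plain JavaScript. It is not lunr.js's API and not how lunr.js is implemented internally; `tokenize`, `buildIndex`, and `search` are hypothetical names, and the post shape `{ id, title, content }` is an assumption for illustration only.

```javascript
// Minimal tf-idf ranking sketch. NOT lunr.js's API; all names are hypothetical.

// Split text into lowercase alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build an in-memory index from posts shaped like { id, title, content } (assumed shape).
function buildIndex(posts) {
  const docs = posts.map((post) => {
    const counts = new Map();
    const tokens = tokenize(post.title + " " + post.content);
    for (const token of tokens) {
      counts.set(token, (counts.get(token) || 0) + 1);
    }
    return { id: post.id, counts, length: tokens.length };
  });

  // Document frequency: number of posts that contain each term.
  const docFreq = new Map();
  for (const doc of docs) {
    for (const term of doc.counts.keys()) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  return { docs, docFreq };
}

// Return the ids of matching posts, most relevant first.
function search(index, query) {
  const terms = tokenize(query);
  const scored = [];
  for (const doc of index.docs) {
    let score = 0;
    for (const term of terms) {
      // Term frequency, normalised by document length.
      const tf = (doc.counts.get(term) || 0) / (doc.length || 1);
      // Inverse document frequency: rarer terms weigh more.
      const idf = Math.log(1 + index.docs.length / (index.docFreq.get(term) || 1));
      score += tf * idf;
    }
    if (score > 0) scored.push({ id: doc.id, score });
  }
  scored.sort((a, b) => b.score - a.score);
  return scored.map((entry) => entry.id);
}

const index = buildIndex([
  { id: 1, title: "Hello world", content: "A first post about blogging." },
  { id: 2, title: "Search in Ghost", content: "Full text search ranks posts by relevance." },
  { id: 3, title: "Themes", content: "A post that mentions search once among many other words about themes and layout." },
]);
const results = search(index, "search");
console.log(results); // post 2 ranks above post 3; post 1 does not match
```

Note that the whole index lives in memory, which is the trade-off raised later in this thread.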
:dancers:
Excellent thanks :+1: Think this is probably well worth having a bit of a play with?
The issue with lunrjs seems to be that all items have to be indexed on app startup and be kept in-memory during the lifetime of the application. This results in an increased memory footprint and scalability issues at some point (imagine indexing 5000 posts at app startup).
Agree with @halfdan there; we discussed this a bit over IRC. Using in-database FTS would be best, but not every database is as capable. For smaller blogs lunr.js will do just fine; however, the upfront indexing will indeed increase the memory footprint and reduce scalability. Perhaps lunr.js could be provided as a plugin, as an intermediate API for databases which do not support FTS.
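One way to picture the "intermediate API" idea is a small backend switch: use the database's own full text search when it is available, and fall back to an in-memory index otherwise. The sketch below is only an illustration under that assumption. None of the names (`createSearchBackend`, `supportsFts`, `runFtsQuery`) come from Ghost, Knex, or lunr.js, and the in-memory fallback uses a plain substring match as a stand-in for a real index.

```javascript
// Hypothetical sketch: choose a search backend based on database capabilities.
// None of these names come from Ghost, Knex or lunr.js.

// Fallback backend: keeps every post in memory and does a substring match.
function createInMemoryBackend(posts) {
  const entries = posts.map((post) => ({
    id: post.id,
    text: (post.title + " " + post.content).toLowerCase(),
  }));
  return {
    name: "in-memory",
    search(query) {
      const needle = query.toLowerCase();
      return entries.filter((e) => e.text.includes(needle)).map((e) => e.id);
    },
  };
}

// Database backend: delegates to a caller-supplied function that would run the FTS query.
function createDatabaseBackend(runFtsQuery) {
  return {
    name: "database-fts",
    search(query) {
      return runFtsQuery(query);
    },
  };
}

// Both backends expose the same search(query) method, so callers need not care which is used.
function createSearchBackend({ supportsFts, runFtsQuery, posts }) {
  return supportsFts
    ? createDatabaseBackend(runFtsQuery)
    : createInMemoryBackend(posts);
}

const fallback = createSearchBackend({
  supportsFts: false,
  posts: [
    { id: 1, title: "Welcome", content: "Getting started with your blog." },
    { id: 2, title: "Search", content: "Notes on full text search." },
  ],
});
const fallbackHits = fallback.search("search");

const viaDatabase = createSearchBackend({
  supportsFts: true,
  runFtsQuery: (query) => [42], // stand-in for a real database call
});
```

Only the fallback pays the startup indexing and memory cost; the database path keeps no posts in the application's memory.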
This issue is pretty old and floundering. We're looking for someone to take the lead! See: https://ghost.org/contribute/search-lead/
You might want to check out my new project that adds full text search to the Ghost platform. It is rather simple and only supports SQLite, but it works :) https://github.com/seesharper/GhostSearch
I think having a standalone db for the search index would be best. This way you don't have to deal with the fuzzy text searching inconsistencies (or nonexistence) of the various DBMSs out there.
A possible option would be using a LevelDB backed index (file based key/value store) using the search-index module.
https://github.com/fergiemcdowall/search-index
Yes it adds another file to the mix that could grow quite large, but I believe it would have a lower memory footprint than Lunr.js.
@dwstevens Tell me if I'm wrong, but I really don't think levelDB is an option. As far as I am aware it'll add a new dependency that is far more complicated even than sqlite3 to get installed (and that has the wonderful node-pre-gyp feature), meaning it's not suitable for our user base.
@ErisDS Ah, my bad. You are correct.
Closing this issue in favour of #5321 which has plenty of discussion & traction. Having 2 issues is just confusing at this point.
Our API for browse posts should accept a query option which will perform a full text search on the post model. This should use the title, content, and possibly meta_* fields of a post.
At the moment, posts are self contained, but in the near future we will be adding additional tables such as tags/categories which will also need to be searchable.
Search should return a paginated set of matching posts, using the existing settings for limit, offset etc.
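As a rough illustration of the behaviour described above, the following sketch filters posts on `title`, `content`, and any `meta_*` field, then applies `limit` and `offset`. It works on a plain array rather than the real post model. `browsePosts`, the default `limit` of 15, and the returned object shape are assumptions made for this example, not Ghost's actual API.

```javascript
// Hypothetical sketch of a browse call with a full text `query` option.
// `browsePosts`, the defaults and the return shape are assumptions, not Ghost's API.

// True if the needle occurs in title, content, or any string field named meta_*.
function matches(post, needle) {
  return Object.keys(post).some((key) => {
    const searchable = key === "title" || key === "content" || key.startsWith("meta_");
    return searchable && typeof post[key] === "string" && post[key].toLowerCase().includes(needle);
  });
}

// Filter by query (if given), then paginate with limit and offset.
function browsePosts(posts, options = {}) {
  const { query = "", limit = 15, offset = 0 } = options;
  const needle = query.trim().toLowerCase();
  const matched = needle ? posts.filter((post) => matches(post, needle)) : posts;
  return {
    posts: matched.slice(offset, offset + limit),
    total: matched.length,
    limit,
    offset,
  };
}

const allPosts = [
  { id: 1, title: "Ghost search", content: "Plans for search.", meta_description: "" },
  { id: 2, title: "Themes", content: "Styling a blog.", meta_description: "Not about Search... or is it?" },
  { id: 3, title: "Markdown", content: "Writing tips.", meta_description: null },
  { id: 4, title: "Indexing", content: "How search indexes are built.", meta_description: "" },
];

// Three posts match "search"; skipping one and taking two yields posts 2 and 4.
const page = browsePosts(allPosts, { query: "search", limit: 2, offset: 1 });
```

Returning `total` alongside the page is one possible design choice, so that a client can render pagination controls without issuing a second query.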