TryGhost / Ghost

Independent technology for modern publishing, memberships, subscriptions and newsletters.
https://ghost.org
MIT License

Search Posts API #306

Closed ErisDS closed 9 years ago

ErisDS commented 11 years ago

Our API for browse posts should accept a query option which will perform a full text search on the post model. This should use the title, content, and possibly meta_* fields of a post.

At the moment, posts are self contained, but in the near future we will be adding additional tables such as tags/categories which will also need to be searchable.

Search should return a paginated set of matching posts, using the existing settings for limit, offset etc.
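To make the request concrete, here is a minimal sketch of a browse call that accepts a query option and returns a paginated result. Everything in it is an assumption for illustration: `searchPosts`, the `SEARCH_FIELDS` list, and the `meta_title`/`meta_description` field names are hypothetical, not Ghost's actual API or schema. It also uses a plain case-insensitive substring match as a stand-in for real full text search.

```javascript
// Hypothetical sketch: not Ghost's actual API.
// The meta_* field names below are assumptions made for illustration.
const SEARCH_FIELDS = ['title', 'content', 'meta_title', 'meta_description'];

function searchPosts(posts, options = {}) {
  const { query = '', limit = 10, offset = 0 } = options;
  const needle = query.trim().toLowerCase();

  // An empty query behaves like a plain browse: every post matches.
  const matches = posts.filter((post) =>
    needle === '' ||
    SEARCH_FIELDS.some((field) =>
      String(post[field] || '').toLowerCase().includes(needle)
    )
  );

  // Reuse the same limit/offset pagination as a normal browse request.
  return {
    posts: matches.slice(offset, offset + limit),
    total: matches.length,
    limit,
    offset,
  };
}

const examplePosts = [
  { id: 1, title: 'Welcome to Ghost', content: 'Just a blogging platform.' },
  { id: 2, title: 'Search ideas', content: 'Full text search for Ghost posts.' },
  { id: 3, title: 'Unrelated', content: 'Nothing to see here.', meta_description: 'ghost story' },
];

const page = searchPosts(examplePosts, { query: 'ghost', limit: 2, offset: 0 });
console.log(page.total, page.posts.map((p) => p.id));
```

Keeping `limit` and `offset` in the returned object mirrors the idea that search is just a filtered browse, so callers can page through matches the same way they page through all posts.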

ErisDS commented 11 years ago

Do you have any details as to what schema changes you need yet? If you do, perhaps push up some stuff to a fork so I can take a look? I am doing a bit of a schema audit is all.

ErisDS commented 11 years ago

@julesbravo Any further updates to this?

julesbravo commented 11 years ago

Hannah, I've been slacking. I'll try to wrap it up Tuesday.

ErisDS commented 11 years ago

No probs, I've just been wondering whether to put this into the current schema migration or not. I think it might be worth waiting, purely because there are so many other changes ongoing.

julesbravo commented 11 years ago

Hannah,

I've got what I think should work done, but I'm having issues around Knex. I'm going to try to hop into IRC tomorrow to hopefully get some pointers. I've just been having the damnedest time with this.

ErisDS commented 11 years ago

This work was started in https://github.com/TryGhost/Ghost/pull/489, but now needs to be picked up by a developer willing to give it some serious love, including considering how it might be written as a Bookshelf or Knex plugin.

halfdan commented 11 years ago

@ErisDS Can you assign me to this one?

ErisDS commented 11 years ago

FYI: This is for 0.5, and there is a search branch to submit PRs to. I recommend looking at what was done so far: https://github.com/julesbravo/Ghost/commit/d9944de9916b81c3c3c84419ecffbe2e3c65ecd1 (not merged).

The big questions I have are:

Swaagie commented 11 years ago

Caught a bit of the discussion on IRC. I think the search should be handled by something like https://github.com/olivernn/lunr.js; this would allow plugins to be written towards an API, basically solving part of the puzzle. Lunr.js provides a tf-idf algorithm that allows documents to be ranked as well, so it does not simply list the posts that contain the word but also sorts them by relevance.

For reference, the search for the nodejitsu handbook is done with lunr.js. It's just a matter of pushing text/titles in at startup and letting lunr.js do its magic.
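For anyone unfamiliar with tf-idf ranking, the idea can be sketched in a few lines of plain JavaScript. This is only an illustration of the technique: it does not use lunr.js, it is not lunr.js's API or its exact scoring formula, and `tokenize`, `buildIndex` and `search` are hypothetical names. The whole index is built up front and held in memory, which is the same trade-off an in-process index implies.

```javascript
// Illustrative tf-idf ranking. NOT lunr.js's API or exact scoring;
// all function names here are hypothetical.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

function buildIndex(docs) {
  const termFreqs = new Map(); // doc id -> Map(term -> count in that doc)
  const docFreq = new Map();   // term -> number of docs containing it
  for (const doc of docs) {
    const counts = new Map();
    for (const term of tokenize(doc.text)) {
      counts.set(term, (counts.get(term) || 0) + 1);
    }
    termFreqs.set(doc.id, counts);
    for (const term of counts.keys()) {
      docFreq.set(term, (docFreq.get(term) || 0) + 1);
    }
  }
  return { termFreqs, docFreq, size: docs.length };
}

function search(index, query) {
  const terms = tokenize(query);
  const results = [];
  for (const [id, counts] of index.termFreqs) {
    let score = 0;
    for (const term of terms) {
      const tf = counts.get(term) || 0;
      if (tf === 0) continue;
      // Smoothed idf: rarer terms weigh more, and a term found in every
      // document still contributes a small positive score.
      const idf = Math.log(1 + index.size / index.docFreq.get(term));
      score += tf * idf;
    }
    if (score > 0) results.push({ id, score });
  }
  // Highest score first, so the most relevant documents lead the list.
  return results.sort((a, b) => b.score - a.score);
}

const index = buildIndex([
  { id: 'a', text: 'Ghost themes and Ghost apps' },
  { id: 'b', text: 'Search in Ghost' },
  { id: 'c', text: 'Cooking with cast iron' },
]);
const ranked = search(index, 'ghost search');
console.log(ranked.map((r) => r.id));
```

Documents that match none of the query terms are dropped, and the rest are ordered by score rather than by insertion order.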

ErisDS commented 11 years ago

A bit like Solr, but much smaller and not as bright http://lunrjs.com

:dancers:

Excellent thanks :+1: Think this is probably well worth having a bit of a play with?

halfdan commented 11 years ago

The issue with lunr.js seems to be that all items have to be indexed on app startup and kept in memory for the lifetime of the application. This results in an increased memory footprint and, at some point, scalability issues (imagine indexing 5000 posts at app startup).

Swaagie commented 11 years ago

Agree with @halfdan there, we discussed this a bit over IRC. Using in-database FTS would be best, but not every database is as capable. For smaller blogs lunr.js will do just fine; however, the upfront indexing will indeed increase the memory footprint, reducing scalability. Perhaps lunr.js could be provided as a plugin, acting as an intermediate API for databases which do not support FTS.
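One way to picture that fallback is a small pluggable backend, sketched below under stated assumptions. None of these names exist in Ghost: `InMemoryBackend`, `DatabaseFtsBackend` and `createSearchBackend` are hypothetical, the in-memory backend uses a plain substring match rather than lunr.js, and whether the database supports FTS is passed in by the caller instead of being detected.

```javascript
// Hypothetical sketch of a pluggable search backend; none of these names
// exist in Ghost. FTS support is supplied by the caller, not detected.
class InMemoryBackend {
  constructor() {
    // id -> lowercased text, held in memory for the app's lifetime.
    this.docs = new Map();
  }
  add(id, text) {
    this.docs.set(id, text.toLowerCase());
  }
  search(query) {
    const needle = query.toLowerCase();
    return [...this.docs]
      .filter(([, text]) => text.includes(needle))
      .map(([id]) => id);
  }
}

class DatabaseFtsBackend {
  constructor(runQuery) {
    // Caller-supplied function that sends the query to the database.
    this.runQuery = runQuery;
  }
  add() {
    // No-op: the database is assumed to maintain its own full text index.
  }
  search(query) {
    return this.runQuery(query);
  }
}

function createSearchBackend({ supportsFts, runQuery }) {
  return supportsFts ? new DatabaseFtsBackend(runQuery) : new InMemoryBackend();
}

// A database without FTS falls back to the in-memory index.
const backend = createSearchBackend({ supportsFts: false });
backend.add(1, 'Welcome to Ghost');
backend.add(2, 'Cooking with cast iron');
console.log(backend.search('ghost'));
```

Because both backends expose the same `add` and `search` methods, the rest of the application would not need to know which one is in use.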

ErisDS commented 9 years ago

This issue is pretty old and floundering. We're looking for someone to take the lead! See: https://ghost.org/contribute/search-lead/

seesharper commented 9 years ago

You might want to check out my new project that adds full text search to the Ghost platform. It is rather simple and only supports SQLite, but it works :) https://github.com/seesharper/GhostSearch

dwstevens commented 9 years ago

I think having a standalone DB for the search index would be best. This way you don't have to deal with the fuzzy text search inconsistencies (or nonexistence) across the various DBMSs out there.

A possible option would be using a LevelDB-backed index (a file-based key/value store) via the search-index module.

https://github.com/fergiemcdowall/search-index

Yes it adds another file to the mix that could grow quite large, but I believe it would have a lower memory footprint than Lunr.js.

ErisDS commented 9 years ago

@dwstevens Tell me if I'm wrong, but I really don't think LevelDB is an option. As far as I am aware it'll add a new dependency that is far more complicated to get installed than even sqlite3 (and sqlite3 at least has the wonderful node-pre-gyp feature), meaning it's not suitable for our user base.

dwstevens commented 9 years ago

@ErisDS Ah, my bad. You are correct.

ErisDS commented 9 years ago

Closing this issue in favour of #5321 which has plenty of discussion & traction. Having 2 issues is just confusing at this point.