floere / picky

Picky is an easy to use and fast Ruby semantic search engine that helps your users find what they are looking for.
http://pickyrb.com

Retrieving categories from index by ID? #124

Closed andyl closed 10 years ago

andyl commented 10 years ago

I've got a directory of text files, and I'd like to use Picky for a little command-line search app. I've progressed to the point where I can generate an index, do a search and get a list of IDs.

Once I have an ID, is there any way I can retrieve the categories for that object out of the index? I'd like to use the category info to populate the results list.

floere commented 10 years ago

Not sure what you mean exactly. I'm going to assume you'd like a representation of the object instead of just the id.

One simple idea is to store the representation you'd like in a hash:

representations = things.each_with_object({}) do |thing, result|
  result[thing.id] = thing
end

# Set up index and search here etc.

results = picky_search_interface.search ...

results.ids.map { |id| representations[id] } # => a list of things
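
As a self-contained illustration of the same idea, here is a minimal sketch that runs without Picky. The `Thing` struct, the sample data, and the hard-coded `ids` array (standing in for `results.ids`) are assumptions made for the example, not part of Picky's API.

```ruby
# Hypothetical record type; any object that responds to #id works the same way.
Thing = Struct.new(:id, :title)

things = [
  Thing.new(1, "first note"),
  Thing.new(2, "second note"),
  Thing.new(3, "third note")
]

# Build the id => object lookup table once, before searching.
representations = things.to_h { |thing| [thing.id, thing] }

# Stand-in for results.ids; a real search would supply these.
ids = [3, 1]

# Look the objects up in the order the search returned them.
found = representations.values_at(*ids)
found.each { |thing| puts "#{thing.id}: #{thing.title}" }
```

The lookup preserves the order of the IDs, so the result ranking from the search carries over to the list of objects.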

Does that help?

floere commented 10 years ago

P.S.: Or if you like Ruby funkiness, replace the last line with:

representations.values_at(*results.ids)

andyl commented 10 years ago

OK I'll store the object data in the ID - I think that will work - thanks for your reply.

floere commented 10 years ago

@andyl Good luck! Was it hard getting to the point where you are now? Any suggestions for improvements – where did you get stuck?

andyl commented 10 years ago

Hi @floere - I built a working program using Picky, and yes, I did get stuck! Here are some of my experiences and ideas for improvement.

My App

I've got a directory with a few hundred text files. Within each text file, there are HTML-like start/end tags that delimit text snippets organized in a week > day > timestamp hierarchy.

I want to be able to search for text snippets from within vim - using a search interface similar to that provided by ack.vim. My starting point was to write simple index & search programs that work in the command line.

I wrote a parser (using parslet) to extract the snippets, allowing me to generate a record for each snippet which contained the following categories: file_name, week_start, day, timestamp, title, text, start_line. For the ID, I constructed a text string "#{file_name}/#{start_line}/#{title}". These records were fed into Picky to generate the index. Then I wrote a command-line script that performed a search and returned a list of IDs.
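
A minimal sketch of how such an ID could be built and then split back into its parts, for example to print ack-style `file:line` results from the IDs a search returns. The `Snippet` struct, the `parse_id` helper, and the sample values are hypothetical, and the parsing assumes that file names contain no "/".

```ruby
# Hypothetical record with the categories listed above.
Snippet = Struct.new(:file_name, :week_start, :day, :timestamp,
                     :title, :text, :start_line) do
  # Same ID format as described in the post: file_name/start_line/title.
  def id
    "#{file_name}/#{start_line}/#{title}"
  end
end

# Split an ID back into its parts. The limit of 3 keeps any "/" inside
# the title intact; this assumes file_name itself contains no "/".
def parse_id(id)
  file_name, start_line, title = id.split("/", 3)
  { file_name: file_name, start_line: Integer(start_line), title: title }
end

snippet = Snippet.new("2014-03.txt", "2014-03-03", "Mon", "09:15",
                      "Read/write notes", "Some text", 42)

parts = parse_id(snippet.id)
puts "#{parts[:file_name]}:#{parts[:start_line]}  #{parts[:title]}"
```

Packing data into the ID keeps the search results self-describing, at the cost of a format that breaks if a file name ever contains the separator.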

My Frustrations

All I wanted was a little command-line script, but the docco has stuff about sinatra servers, web clients and javascript front ends mixed throughout.

Many of the critical docs were empty. For example, I wanted to save the index to a file, and the docco just says "TODO" (or some such). The only way I was able to get the app working was to read the source and introspect with Pry.

After a lot of head-banging, I looked for alternative Ruby search engines, and could find none. There's stuff based on Lucene, but that is overkill. IMHO Ruby needs a small / lightweight / simple search engine. I wanted a Ruby-embeddable search engine that was as simple to use as 'ack' or 'the Silver Searcher'.

It looks like you put a ton of work into building a great search engine, but for me, Picky could use a lot of simplification.

Ideas for Simplification

1) Remove all the sinatra server/client stuff - extract into a separate app.

2) Finish the docco.

3) Provide simple out-of-the-box command-line example apps for common scenarios: searching CSV, JSON and XML files.

4) Do tests with new users. Benchmark against ack and the Silver Searcher. The new user should be able to go from standing start to working search in 3-5 minutes.

Fini

Well, that is probably more than you wanted. :-) I hope my notes were useful. Thanks for Picky !!

floere commented 10 years ago

@andyl Wow! Thanks so much for your description – this is very, very helpful indeed :)

For now, just a quick question re 4) – I guess the simple example on the web page http://pickyrb.com/ did not help very much?

andyl commented 10 years ago

Hi @floere - re 4) - I was confused by the 'Got 5 minutes' and 'Got 2 minutes' paths. I chose incorrectly, and wasted a ton of time on the sinatra example. Suggestion: remove the 5 minute example.

The 2 minute code example was very helpful. Here's a gist which I think would be even clearer:

https://gist.github.com/andyl/4660c0fa2ff42f269e0f

floere commented 10 years ago

Thanks a lot – I'll have to be even clearer about the choice there, perhaps a tab with Just Ruby/Sinatra and two examples. But for now I used your example: http://pickyrb.com/. Thanks a lot!

andyl commented 10 years ago

Very nice. I'll try to contribute another example script next week. (CSV search)