MITRECND / snugglefish

Simple NGram Fast Indexer & Searcher

Create Python module which wraps snugglefish #1

Closed imjonsnooow closed 10 years ago

imjonsnooow commented 11 years ago

Core changes:

  1. Create new Python module. Relevant changes are in new subdirectory: 'python'. Includes module code (pysnugglefish.cpp) and setup file.
  2. Create header file for snugglefish.
  3. Rename snugglefish's search and index functions to do_search and make_index. The previous names caused conflicts with other libraries.
  4. Add verbose flag to search method which enables/disables printing matched files during search.

Mraoul commented 11 years ago

Looks good, but some comments/changes:

Ngram size is currently required to be either 2 or 3. Anything higher would take an obscene amount of memory, and the alternative method/algorithm was never committed (it was extremely inefficient).

max_buffer and max_files have default values defined in a header file: 4GB and 2000 respectively, if I recall correctly. Zero means use the defaults. If zero meant no maximum for both, the process would never flush to disk.
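A minimal sketch of the "zero means use the defaults" convention, written in Python for the wrapper side. The constant names and the `resolve_limits` helper are hypothetical, and the values are only the ones recalled above (4GB and 2000), not confirmed against the header file.

```python
# Hypothetical constants: the values are the ones recalled in this thread,
# not verified against the snugglefish header file.
DEFAULT_MAX_BUFFER = 4 * 1024 ** 3  # 4GB, in bytes
DEFAULT_MAX_FILES = 2000


def resolve_limits(max_buffer=0, max_files=0):
    """Return (max_buffer, max_files), treating 0 as 'use the default'."""
    if max_buffer < 0 or max_files < 0:
        raise ValueError("limits must be non-negative")
    # 0 is a sentinel for "default", never for "unlimited"; an unlimited
    # buffer would mean the indexer never flushes to disk.
    return (max_buffer or DEFAULT_MAX_BUFFER, max_files or DEFAULT_MAX_FILES)
```

With this shape, a caller who passes nothing, or passes zero for either value, still ends up with a finite limit.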

Why is the file list semicolon separated? Why can't I just pass in an array?
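For illustration, one way the wrapper could accept an array while still tolerating the semicolon-separated form. `normalize_file_list` is a hypothetical helper, not part of the module in this pull request.

```python
def normalize_file_list(files):
    """Return a list of file paths from a sequence or a legacy string.

    Hypothetical helper: accepts either a sequence of paths or the
    semicolon-separated string form questioned above.
    """
    if isinstance(files, str):
        # Legacy form: "a.bin;b.bin". Empty segments are dropped.
        return [path for path in files.split(";") if path]
    return [str(path) for path in files]
```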

I'd prefer if the index file path be a required field while init'ing and not changeable afterwards. Creating a new instance for a different file path should be trivial.
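A rough sketch of what a required, read-only index path might look like from the Python side. The class name `SnugglefishIndex` and its attribute are assumptions made for the example; the actual module is implemented in pysnugglefish.cpp and may look quite different.

```python
class SnugglefishIndex:
    """Hypothetical wrapper whose index path is fixed at construction."""

    def __init__(self, index_path):
        if not index_path:
            raise ValueError("index_path is required")
        self._index_path = str(index_path)

    @property
    def index_path(self):
        # No setter is defined, so assigning to index_path raises
        # AttributeError. A different path means a new instance.
        return self._index_path
```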

Don't add verbose to the search function. Instead, return the vector of matched files from that function and print the results in main after line 247 [end of the inner else].

Also related to the above, copy out the ngram size check (line 304 in search) to main (238) and replace line 305 (the cout) with a return of the empty vector. This way all cout-ing is done outside of the search function.
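The separation requested here (search returns its matches, the caller does all the printing) can be sketched as follows. This is a toy Python model, not the C++ code: the dict-of-sets index, the function bodies, and `SUPPORTED_NGRAM_SIZES` are stand-ins invented for the example.

```python
SUPPORTED_NGRAM_SIZES = (2, 3)  # sizes named in this review


def do_search(index, query, ngram_size):
    """Return the sorted names of files containing every ngram of query.

    `index` is a toy stand-in: a dict mapping file name to a set of ngrams.
    The function never prints; an unsupported size yields an empty list.
    """
    if ngram_size not in SUPPORTED_NGRAM_SIZES:
        return []
    ngrams = {query[i:i + ngram_size]
              for i in range(len(query) - ngram_size + 1)}
    if not ngrams:
        return []
    return sorted(name for name, file_ngrams in index.items()
                  if ngrams <= file_ngrams)


def main(index, query, ngram_size):
    # All output happens in the caller, including the size check message.
    if ngram_size not in SUPPORTED_NGRAM_SIZES:
        print("unsupported ngram size:", ngram_size)
        return
    for name in do_search(index, query, ngram_size):
        print(name)
```

Keeping the output in `main` means the same search function can back both the command-line tool and the Python module without a verbose flag.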

imjonsnooow commented 11 years ago

@Mraoul I believe these are all set now. With max_buffer and max_files in this module, I believe the calls to the snugglefish code automatically use the defaults unless the user supplies a non-zero value.

wxsBSD commented 10 years ago

I took all of these changes and cleaned up the code so that it builds without warnings. It also now builds under 10.9. I've pushed the combined work to the py branch and will merge that in. As such, this can be closed. Thanks Shayne!