apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Add a demo search server [LUCENE-5376] #6440

Closed asfimport closed 8 years ago

asfimport commented 10 years ago

I think it'd be useful to have a "demo" search server for Lucene.

Rather than being fully featured, like Solr, it would be minimal, just wrapping the existing Lucene modules to show how you can make use of these features in a server setting.

The purpose is to demonstrate how one can build a minimal search server on top of APIs like SearcherManager, SearcherLifetimeManager, etc.
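To make the pattern concrete, here is a minimal Python sketch of the acquire/release idea behind SearcherManager: each request borrows the current searcher, and a refresh swaps in a new one without closing a searcher that a request is still using. The names below (`RefCountedSearcher`, `SimpleSearcherManager`, `maybe_refresh`) are hypothetical and written only for this sketch; they are not Lucene's Java API.

```python
import threading


class RefCountedSearcher:
    """Hypothetical stand-in for a point-in-time searcher over an index."""

    def __init__(self, version):
        self.version = version
        self.closed = False
        self._refs = 1  # the manager holds the initial reference
        self._lock = threading.Lock()

    def inc_ref(self):
        with self._lock:
            if self._refs <= 0:
                raise RuntimeError("searcher is already closed")
            self._refs += 1

    def dec_ref(self):
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                self.closed = True  # last reference gone: release resources


class SimpleSearcherManager:
    """Hands out the current searcher and swaps in a new one on refresh."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = RefCountedSearcher(version=0)

    def acquire(self):
        # Borrow the current searcher; the caller must call release().
        with self._lock:
            self._current.inc_ref()
            return self._current

    def release(self, searcher):
        searcher.dec_ref()

    def maybe_refresh(self, latest_version):
        # Swap in a new searcher only if the index has changed.
        with self._lock:
            if latest_version == self._current.version:
                return False
            old = self._current
            self._current = RefCountedSearcher(latest_version)
        old.dec_ref()  # drop the manager's reference to the old searcher
        return True
```

A request handler would call `acquire()`, run its search inside `try`, and call `release()` in `finally`, so an old searcher is closed only after the last in-flight request finishes with it.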

This is also useful for finding rough edges / issues in Lucene's APIs that make building a server unnecessarily hard.

I don't think it should have back compatibility promises (except Lucene's index back compatibility), so it's free to improve as Lucene's APIs change.

As a starting point, I'll post what I built for the "eating your own dog food" search app for Lucene's & Solr's jira issues http://jirasearch.mikemccandless.com (blog: http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It uses Netty to expose basic indexing & searching APIs via JSON, but it's very rough (lots of nocommits).


Migrated from LUCENE-5376 by Michael McCandless (@mikemccand), 1 vote, resolved Jul 29 2016 Attachments: lucene-demo-server.tgz

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

I'm attaching the current sources (tgz archive)... they are standalone now but to add it into Lucene I think we should put it under lucene/demo or lucene/server or something.

It uses "custom" (Python) build scripts, because I became frustrated with ant; after extracting, python3 build.py test should run the tests.

These are just the sources for the server side of the http://jirasearch.mikemccandless.com app.

There are many issues to fix, e.g. cut back to ant (there are some old ant build scripts there), use only one JSON parser (it uses two now), but it does support a number of basic indexing/search APIs: add/update document/s, bulk add/update documents, suggest, search/After, block joins, highlighting, live field values, snapshots, basic index statistics (for diagnostics).

It has limited support for "plugins", but I'm tempted to remove that before committing. The only plugin it has now is Tika, to crack binary documents into text for indexing.

asfimport commented 10 years ago

Han Jiang (@sleepsort) (migrated from JIRA)

+1, it will be great to have an 'active' demo to show the features :)

I think we should remove those hardcoded classpaths, e.g. in post.py:30?

And will this demo be expected to be the same as jirasearch? Will we need further configuration to get the demo website working? For example, I cannot find search.py in the sources.

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

Thanks Han.

I think we should remove those hardcoded classpaths, e.g. in post.py:30?

Good catch, I'll fix that ... that's a minimal example of how to issue commands to the server to create an index and register a few fields, from a Python client.
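For readers without the attachment, a client of this kind might look roughly like the sketch below. It only builds the JSON requests and does not send them. The command names (`createIndex`, `registerFields`), the URL layout, port 4000, and the field schema are all assumptions made for illustration; the actual contents of post.py are not shown in this thread.

```python
import json
from urllib import request


def build_request(host, port, command, params):
    """Build (but do not send) a JSON POST for one server command."""
    body = json.dumps(params).encode("utf-8")
    return request.Request(
        url=f"http://{host}:{port}/{command}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical commands: create an index, then register two fields on it.
create = build_request("localhost", 4000, "createIndex", {"indexName": "demo"})
register = build_request(
    "localhost",
    4000,
    "registerFields",
    {
        "indexName": "demo",
        "fields": {"id": {"type": "atom"}, "title": {"type": "text"}},
    },
)

# Sending requires a running server, e.g.: request.urlopen(create)
```

Keeping the host and port as parameters, rather than hardcoding them or any classpath in the script, is the kind of cleanup the comment above asks for.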

And will this demo be expected to be the same as jirasearch? Will we need further configuration to get the demo website working? For example, I cannot find search.py in the sources.

These sources are just for the server side; I didn't include the jirasearch UI/indexing sources. But I agree it would be useful to have that too, i.e. an example search app/UI that runs against this server. I'll think about how to fold it in ...

asfimport commented 10 years ago

Yonik Seeley (@yonik) (migrated from JIRA)

I think there are plenty of lucene-based search servers already in existence... We shouldn't bloat lucene/solr even further by adding yet another. Something like this belongs as a separate project (collaborate on github with whoever else wants to build/maintain this).

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553272 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553272

LUCENE-5376: mike's tgz as-is to branch

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553275 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553275

LUCENE-5376: fixup the ant/ivy a bit

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553282 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553282

LUCENE-5376: fix compile

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553283 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553283

LUCENE-5376: add svn:ignore

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553287 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553287

LUCENE-5376: get tests passing

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553288 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553288

LUCENE-5376: use consistent dependencies versions, dont download the internet

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553289 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553289

LUCENE-5376: clean up some forbidden apis

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553291 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553291

LUCENE-5376: get ant javadocs passing

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553294 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553294

LUCENE-5376: fix documentation-lint, rat-sources, remove python build, remove plugin (it can be a separate module?)

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553297 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553297

LUCENE-5376: minimize jdocs deps, nuke unused imports

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553300 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553300

LUCENE-5376: clean up forbidden system outs / logging

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553301 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553301

LUCENE-5376: remove code duplication

asfimport commented 10 years ago

Shai Erera (@shaie) (migrated from JIRA)

I think we should put it under lucene/demo or lucene/server or something.

I think we should put it under lucene/server. We can have the JIRA search example (or a simpler one) under demo if we really want to, but I think a LuceneServer is a good component by itself. First, we never know what it will turn out into, what APIs will be developed etc. Also, it would be nice to see clean examples of distributed search (simple, facet, spatial) as well as suggest (maybe even distributed suggest). So +1 for adding it as a new module.

asfimport commented 10 years ago

Areek Zillur (@areek) (migrated from JIRA)

This is a good idea. I was thinking of having a suggest demo in the lucene demo package! But this would be a good place to put it. I will look into the suggest stuff after a week or so.

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553379 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553379

LUCENE-5376: try to address nocommit...

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553384 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553384

LUCENE-5376: remove crazy use of jackson for pretty-printing. jackson only used in the indexing api now

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I created 'lucene5376' branch (https://svn.apache.org/repos/asf/lucene/dev/branches/lucene5376) with the patch and cleaned up the build and so on.

notes/questions:

asfimport commented 10 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

i removed the tika plugin (for now). is this ok? we can make it a separate module?

That's fine ... let's leave it for later?

which json parser to keep? jackson is only used rarely, i assume this is the one to remove? We just have to change the indexing code now to remove those jars.

I had to add jackson for streaming / incremental parsing for the bulk APIs ... I think (not sure!) that we should just move to jackson for everything, but I haven't started that yet. I'll look into it.
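As an illustration of why a streaming parser matters for bulk requests, here is a small Python sketch that yields documents one at a time as data arrives, instead of loading the whole request body first. It is an analogue of the idea, not the server's Jackson code, and it assumes (hypothetically) that the bulk payload is a sequence of top-level JSON objects separated by whitespace.

```python
import json


def iter_documents(chunks):
    """Yield JSON objects one at a time from an iterable of text chunks."""
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                doc, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # object is incomplete so far; wait for more input
            yield doc
            buffer = buffer[end:]  # keep only the unparsed remainder
    if buffer.strip():
        raise ValueError("input ended inside an incomplete JSON document")
```

Each document can be indexed as soon as it is yielded, so memory use depends on the largest single document rather than on the size of the whole bulk request.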

forbidden-apis was angry about the SOPs. i tried to figure out which ones were intentional and which ones were not. you can see the list of files exempted in build.xml.

Thanks for fixing!

javadocs-lint is working but angry because mostly all javadocs (package.html's, class docs) are missing. this seems to need some help.

I'll work on this.

TestPlugins is disabled, but i think we can fix it if we build a .zip on-the-fly, pulling in MockPlugin.class and other files with .getResourceAsStream? Uwe will love this.

That sounds great!

Thanks Rob.

asfimport commented 10 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I had to add jackson for streaming / incremental parsing for the bulk APIs ... I think (not sure!) that we should just move to jackson for everything, but I haven't started that yet. I'll look into it.

OK, I may have sent this in the wrong direction with my latest commit, so just back it out if you want to move to jackson. I assumed the opposite, since jackson was only used in a few places. I will say I do prefer the simpler underengineered api of the json-smart vs the ... no comment... jackson, even if it has less features. Reminds me of HttpURLConnection vs httpclient.

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553403 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553403

LUCENE-5376: fix and re-enable test

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553477 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553477

LUCENE-5376: javadocs

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553486 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553486

LUCENE-5376: javadocs fixes

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553499 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553499

LUCENE-5376: move the hack wtf into one place

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553500 from @rmuir in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553500

LUCENE-5376: don't create and destroy dirs in the CWD

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553670 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553670

LUCENE-5376: javadocs

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1553821 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1553821

LUCENE-5376: javadocs

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1554012 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1554012

LUCENE-5376: documentation-lint finally passes (method level)

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1554013 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1554013

LUCENE-5376: simplify sync'd code

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1554020 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1554020

LUCENE-5376: clean up / document addDocument code

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1554022 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1554022

LUCENE-5376: rename singleValued -> multiValued

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1554207 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1554207

LUCENE-5376, #6315: expose DocumentDictionary (to build suggestor from stored documents) in demo server

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1554409 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1554409

#6358, LUCENE-5376: in Lucene demo server, support building suggester where weight is an expression

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1555341 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1555341

LUCENE-5376: test sugar: insert the current index name to outgoing requests to make writing tests easier

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1555629 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1555629

LUCENE-5376: sync trunk changes, cutover to new facets APIs, simplify search handler

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1555726 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1555726

LUCENE-5376: consolidate logic to pull the searcher from indexGen, version, snapshot or current

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1556045 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1556045

#5860, LUCENE-5376: expose SortedSetDocValuesFacets in lucene server

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1556508 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1556508

#6271, LUCENE-5376: add expressions support to lucene server, so you can define a virtual field from any JS expression and then sort by that field or retrieve its values for all hits
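The idea of a virtual field can be sketched as follows: an arithmetic expression over per-hit values is evaluated for every hit, and the result is used as the sort key. This is a simplified Python illustration, not the server's code; the real server compiles JavaScript expressions through Lucene's expressions module, while this sketch parses the expression with Python's `ast` module and supports only `+ - * /`. The function names and the `_expr` key are hypothetical.

```python
import ast
import operator

# Supported binary operators for the sketch.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression, variables):
    """Evaluate an arithmetic expression over named numeric variables."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return walk(ast.parse(expression, mode="eval"))


def sort_by_expression(hits, expression):
    """Attach the computed value to each hit and sort by it, highest first."""
    scored = [dict(hit, _expr=evaluate(expression, hit)) for hit in hits]
    return sorted(scored, key=lambda hit: hit["_expr"], reverse=True)


hits = [
    {"id": "a", "_score": 1.0, "votes": 10},
    {"id": "b", "_score": 2.0, "votes": 1},
]
ranked = sort_by_expression(hits, "_score + votes / 10")
```

Because the computed value is stored on each hit, the same mechanism covers both uses named in the commit message: sorting by the virtual field and returning its value with every hit.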

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1556546 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1556546

LUCENE-5376: turn on scoring when sorting by field if any of the sort fields or retrieved fields require scores, e.g. when they are an expression field that uses _score
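The check described in this commit can be sketched as: collect the variable names each expression references, and enable scoring if any sorted or retrieved field is `_score` itself or an expression that mentions `_score`. This is a hypothetical Python illustration (the function names and the `expressions` mapping are assumptions), and it does not follow expressions that reference other expression fields.

```python
import ast


def referenced_names(expression):
    """Return the set of variable names used in an expression."""
    tree = ast.parse(expression, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def needs_scores(sort_fields, retrieve_fields, expressions):
    """Decide whether scoring must be enabled for this request.

    expressions maps a virtual field name to its expression text.
    """
    for field in list(sort_fields) + list(retrieve_fields):
        if field == "_score":
            return True
        expr = expressions.get(field)
        if expr is not None and "_score" in referenced_names(expr):
            return True
    return False


expressions = {"blended": "_score + votes / 10", "popularity": "votes * 2"}
```

With these definitions, sorting by `blended` requires scores while sorting by `popularity` does not, so the cost of scoring is paid only when a requested field depends on it.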

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1556555 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1556555

LUCENE-5376: don't need to make ScoreValueSource public

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1556564 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1556564

LUCENE-5376: add another expression test case; add nocommit for bcp47 cutover

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1556620 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1556620

LUCENE-5376: also allow dynamic expression per-request

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1556627 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1556627

LUCENE-5376: remove recency blending hack: just use expressions instead

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1556775 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1556775

LUCENE-5376: allow setting norms format, including compressed norms

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1556786 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1556786

#5971, LUCENE-5376: using the expert 'render to Object' APIs in PostingsHighlighter to render directly to JSONArray in lucene server

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1557073 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1557073

#6400, LUCENE-5376: expose SimpleQueryParser in lucene server

asfimport commented 10 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1557517 from @mikemccand in branch 'dev/branches/lucene5376' https://svn.apache.org/r1557517

LUCENE-5376: expose control over underlying facet index field, fix some nocommits