apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

consolidate all api modules in one place and un!@$# packaging for 4.0 [LUCENE-3965] #5038

Closed asfimport closed 12 years ago

asfimport commented 12 years ago

I think users get confused about how svn/source is structured, when in fact we are just producing a modular build.

I think it would be clearer if the Lucene stuff were underneath modules/; that's where our modular API is.

We could still package this up as lucene.tar.gz if we want, and even name modules/core lucene-core.jar, but I think this would be a lot better organized than the current:

confusion.


Migrated from LUCENE-3965 by Robert Muir (@rmuir), resolved Apr 19 2012. Attachments: LUCENE-3965_module_build_pname.patch, LUCENE-3965_module_build.patch, LUCENE-3965.patch (versions: 4)

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I think from a release-artifacts perspective this would make a lot of sense: you would unzip and see:

So people wouldn't be confused about where to go find stuff.

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

So top-level lucene/ directory would vanish?

Solr would not be affected?

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

So top-level lucene/ directory would vanish?

In my opinion, yes. And contrib/highlighter would sit under there too.

So instead of what you have today (which we don't even know how to package!), when you unzip lucene.zip you would see:

(I just combined the modules across lucene/, lucene/contrib, and modules/, and sorted them alphabetically so you have an idea of what it looks like.)

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

And what about Lucene contribs? All promoted to be modules?

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Right. I guess if there is something funky about them and we don't think they belong as a top-level module, then stuff can always go in the sandbox?

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

BTW: I'm just bringing this up as an idea for addressing the 4.0 packaging; in my opinion it makes sense and is simple. There might be other solutions too, though.

But truth be told, now is a GREAT time to figure this out as we look at putting 3.x in bugfix mode, because we can fix this layout to be organized the way we want and not pay the price of difficult svn merging.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Some inspiration from ICU: http://source.icu-project.org/repos/icu/icu4j/trunk/main/classes/

They actually still combine these all into one mega-jar as they work towards modularization, but internally the layout is similar to this.

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

BTW: I'm just bringing this up as an idea for addressing the 4.0 packaging; in my opinion it makes sense and is simple. There might be other solutions too, though.

I guess it's simpler because instead of lucene/ and its denizens (which we already know and love), as well as modules/ (no packaging clue, thank you very much), the problem is reduced to one single great unknown.

But truth be told, now is a GREAT time to figure this out as we look at putting 3.x in bugfix mode, because we can fix this layout to be organized the way we want and not pay the price of difficult svn merging.

Yes, if we are going to restructure, we should do it now.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

An alternative that's still the same basic proposal is to move the current modules/ underneath lucene/ (maybe that's less confusing? Then you see our two "products", lucene/ and solr/, in the svn tree).

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I guess it's simpler because instead of lucene/ and its denizens (which we already know and love), as well as modules/ (no packaging clue, thank you very much), the problem is reduced to one single great unknown.

Well, I started thinking about this when you restructured lucene/ to have "modules" underneath it like "core", "test-framework", and "tools"... it makes it painfully obvious that we should combine this stuff into some simple, flattened structure that makes sense.

As far as SVN goes, call it modules/ or call it lucene/, I don't care. It's our search API product.

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

An alternative that's still the same basic proposal is to move the current modules/ underneath lucene/ (maybe that's less confusing? Then you see our two "products", lucene/ and solr/, in the svn tree).

Like this? (i.e. everything under modules/, but modules/ under lucene/):

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Another idea: instead of having analysis/ with "submodules" underneath it, we could flatten that too (like solr-dataimporthandler and dataimporthandler-extras).

So we would have analysis-common, analysis-kuromoji, analysis-phonetic, etc.

Not sure if this really makes things simpler, but it's flat. We don't have to do it, but maybe this easy flat structure could simplify the build and such.
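The flattening idea boils down to a mechanical renaming rule: the nested path analysis/<name> becomes the flat name analysis-<name>. A minimal bash sketch of that rule, assuming module names are derived purely from their directory path; flat_name is a hypothetical helper, not something in the Lucene build.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a nested module path such as "analysis/kuromoji"
# to the flat name "analysis-kuromoji".
flat_name() {
  local path="${1%/}"            # drop one trailing slash, if present
  printf '%s\n' "${path//\//-}"  # replace every "/" with "-"
}

for m in analysis/common analysis/kuromoji analysis/phonetic; do
  flat_name "$m"
done
```

Because the rule is purely mechanical, either layout could be generated from the other, which is part of why the nested-vs-flat question can be settled separately.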

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Like this? (i.e. everything under modules/, but modules/ under lucene/):

If we put it under lucene/, I would propose we don't move core at all.

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

Like this? (i.e. everything under modules/, but modules/ under lucene/):

If we put it under lucene/, I would propose we don't move core at all.

  • lucene/
    • core/
    • demo/
    • highlighter/
    • analyzers/
    • grouping/
    • test-framework ...

+1

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I like this better too... quick iteration :)

As far as the analyzers being 'nested' or 'flat', we could address that separately; I could go either way. But I think it's much simpler to have at least our high-level modules all in one place... that's really the point of this issue (the title is misleading now).

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

So here's the current iteration:

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

Another idea: instead of having analysis/ with "submodules" underneath it, we could flatten that too (like solr-dataimporthandler and dataimporthandler-extras).

So we would have analysis-common, analysis-kuromoji, analysis-phonetic, etc.

Not sure if this really makes things simpler, but it's flat. We don't have to do it, but maybe this easy flat structure could simplify the build and such.

+0. While the current analysis sub-module structure only serves to group them conceptually, rather than to provide any technical benefit, I think we may want sub-modules in the future, perhaps for technical reasons, but also to get a handle on the human chunking limit: put more than 5-9 or so "things" in one "place" and people's eyes glaze over...

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

That was my concern too. Currently I'm not sure this harms anything, and it's well organized.

Additionally, we have quite a few modules underneath analysis now, and the number is growing fast. So flattening could cause a mess in the future, and I'm not sure any simplification of the build would actually be worth it.

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

Do you think we should keep one build/ directory per new-style module? I rather like the current ant clean under lucene/ - boom, one directory, done.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Well, we never had one build/ directory, right?

At least the contrib modules build underneath lucene/'s build directory.

The only reason modules have their own build/ directories is that they go out of their way to do this! So I agree with you, let's just nuke that and have one!

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

That would also simplify packaging, as modules currently go out of their way to make their own dist/ directories too, e.g. under analyzers/common:

  <property name="build.dir" location="../build/common" />
  <property name="dist.dir" location="../dist/common" />

The same goes for licensing (they have their own LICENSE.txt/NOTICE.txt files). If the products are still going to be lucene/ and solr/ (and I think, for simplicity, for 4.0 that's really what it should be), then we don't need this.
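A quick way to find every module that goes out of its way like this is to search the build files for their own build.dir or dist.dir properties. A minimal bash sketch, assuming a grep that supports --include (GNU and BSD grep both do); find_dir_overrides is a hypothetical helper, not part of the Lucene build.

```bash
#!/usr/bin/env bash
# Hypothetical audit: list build.xml files that define their own build.dir
# or dist.dir instead of inheriting the shared output directories.
# Usage: find_dir_overrides <module-root>
find_dir_overrides() {
  local root="${1:-.}"
  # -r: recurse, -l: print file names only, -E: extended regex,
  # --include: only look at files named build.xml
  grep -rlE --include='build.xml' \
    '<property[[:space:]]+name="(build|dist)\.dir"' "$root" | sort
}
```

Each file it prints is a candidate for dropping the override so that output lands in one shared place.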

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

If the products are still going to be lucene/ and solr/ (and I think, for simplicity, for 4.0 that's really what it should be)

+1

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Editing title to be more general...

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I don't like the separation between Solr and Lucene; in my opinion, Solr should also be a module and the lucene dir should vanish. Solr contribs should also be modules.

But, the current solution is also fine, so +1

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

I don't like the separation between Solr and Lucene; in my opinion, Solr should also be a module and the lucene dir should vanish. Solr contribs should also be modules.

I agree with Robert: one top-level dir per "product" makes sense to me.

asfimport commented 12 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

I would be happier if, e.g., the Solr tokenizer factories were part of the analysis modules... so that lucene-core and Solr would be on an equal footing.

But on the other hand, factoring the factories out of Solr completely might be a good step towards completely dynamic analyzer definitions like in Solr (see Hibernate Search, where you can define your Analyzer using Java annotations; internally that uses Solr's factories and needlessly imports solr.jar). That's just a side comment; I just wanted to mention it. So the current solution is fine, provided that we remove the factories from Solr and move them to the analysis modules (and add the abstract interface to Lucene core).

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I would be happier if, e.g., the Solr tokenizer factories were part of the analysis modules...

But how is that related to this proposal?

Here I am just talking about consolidating what we currently have on the filesystem so it's less confusing.

Separately, I happen to agree with you, but I can assure you nothing will happen with regard to that on this issue. Why don't you assign yourself or work on #3584?

asfimport commented 12 years ago

Chris Male (migrated from JIRA)

+1 to this consolidation effort. I like the latest iteration layout.

I also agree with Steve that we should continue to support sub-modules. The new layout already has a lot of modules under lucene/, so I think it's good to keep the analysis submodules under analysis/.

This whole process means we can improve the demo module more, so that it actually demos all the other modules in some way.

Right. I guess if there is something funky about them and we don't think they belong as a top-level module, then stuff can always go in the sandbox?

+1. We should go over the remaining contribs as we did in the past and make decisions about whether they're module worthy.

asfimport commented 12 years ago

Michael McCandless (@mikemccand) (migrated from JIRA)

+1 to moving/merging modules/* and lucene/contrib/* under lucene/. This is much cleaner.

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

A good first step would be to bring all Lucene contribs down to the same level as core/ and test-framework/; make a new module-build.xml that's basically a copy of contrib-build.xml, and then make all the "internal" modules switch to module-build.xml.

Moving modules/* and getting rid of contrib-build.xml could come later.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

We could also just rename contrib-build.xml to that, but keep its 'project name' the same, so that it's just a filesystem change and all tasks still work.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Here's the patch.

First you must run 'svn move lucene/contrib/contrib-build.xml lucene/module-build.xml'.

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

We could also just rename contrib-build.xml to that, but keep its 'project name' the same, so that it's just a filesystem change and all tasks still work.

Here's the patch.

First you must run 'svn move lucene/contrib/contrib-build.xml lucene/module-build.xml'.

+1 - ant test from the top level works, as does ant dist from both lucene and solr.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Just to clean up: this renames the project to 'module-build' to match, and fixes references.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I'm gonna work on an svn move + script today.

We need to get this issue resolved soon so that packaging works; then I think we have a lot of options as far as improving the nightly builds and things like that.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

First take at a patch (works with r1326364).

Don't fear the size of the patch; it's mostly noise from svn moves (even though I used --no-diff-deleted, it still lists every file that was moved).

First you run this:

svn move lucene/contrib/demo lucene/demo
svn move lucene/contrib/highlighter lucene/highlighter
svn move lucene/contrib/memory lucene/memory
svn move lucene/contrib/misc lucene/misc
svn move lucene/contrib/sandbox lucene/sandbox
svn move modules/analysis lucene/analysis
svn move modules/benchmark lucene/benchmark
svn move modules/facet lucene/facet
svn move modules/grouping lucene/grouping
svn move modules/join lucene/join
svn move modules/queries lucene/queries
svn move modules/queryparser lucene/queryparser
svn move modules/spatial lucene/spatial
svn move modules/suggest lucene/suggest
svn delete modules
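The same moves can be driven from two lists instead of fourteen hand-typed commands, which makes it harder to miss a module. A minimal bash sketch, assuming the working-copy layout described above; move_all and run are hypothetical helpers. By default it only prints the commands (a dry run); with RUN=1 it executes them and stops at the first failure.

```bash
#!/usr/bin/env bash
# Module names copied from the svn moves above.
contrib_modules=(demo highlighter memory misc sandbox)
top_modules=(analysis benchmark facet grouping join queries queryparser spatial suggest)

# Print the command (dry run) or execute it when RUN=1, aborting on failure.
run() {
  if [ "${RUN:-0}" = 1 ]; then
    "$@" || { echo "failed: $*" >&2; exit 1; }
  else
    echo "$*"
  fi
}

move_all() {
  local m
  for m in "${contrib_modules[@]}"; do run svn move "lucene/contrib/$m" "lucene/$m"; done
  for m in "${top_modules[@]}"; do run svn move "modules/$m" "lucene/$m"; done
  run svn delete modules
}

move_all
```

Running it once without RUN=1 makes it easy to diff the generated plan against the hand-written list before touching the working copy.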

'ant test', 'ant javadocs' and such work, but prepare-release etc. need some help. Though they didn't work before either :)

There are also some nocommits.

Still, I'd like to get us into releasable shape with this issue... so I'm going to keep iterating. But it's a fairly easy change...

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Basically, the overall design here was to rename 'contrib-crawl' to 'modules-crawl'. This just excludes the 'core' modules so the build works the same as before:

<fileset dir="." includes="*/build.xml" excludes="build/**,core/**,test-framework/**,tools/**"/>

Then, after fixing the jar-XXX/contrib-uptodate stuff, it was up and running fast.
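To see which directories the modules-crawl fileset picks up, here is a rough bash equivalent, assuming it is run against the lucene/ directory. crawl_targets is a hypothetical helper and only approximates Ant's pattern matching for this one case: includes="*/build.xml" matches a single directory level, and the excludes drop build, core, test-framework and tools.

```bash
#!/usr/bin/env bash
# Approximates <fileset dir="." includes="*/build.xml"
#   excludes="build/**,core/**,test-framework/**,tools/**"/>
# Usage: crawl_targets <lucene-dir>
crawl_targets() {
  local root="${1:-.}" f dir
  for f in "$root"/*/build.xml; do
    [ -e "$f" ] || continue              # glob matched nothing
    dir=$(basename "$(dirname "$f")")    # first-level directory name
    case "$dir" in
      build|core|test-framework|tools) ;;  # excluded "core" modules
      *) echo "$dir" ;;
    esac
  done
}
```

Nested submodules such as analysis/common are not matched directly; they are reached through analysis/build.xml, which is consistent with keeping the analysis submodules grouped.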

Other things to fix still:

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Updated patch for r1326380, cleaning up the javadocs-all task (the package-assigning stuff) and removing some nocommits: nuking contrib-uptodate and using module-uptodate everywhere.

Next up: nuke the custom build directories so everything is organized under lucene/build/<XYZ>
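One way to confirm that goal after an ant run is to look for build/ or dist/ directories that still sit inside individual modules rather than under the shared ones. A minimal bash sketch, assuming it is pointed at lucene/; stray_build_dirs is a hypothetical helper, and it is only a heuristic (it would also flag a source directory that happens to be named build or dist).

```bash
#!/usr/bin/env bash
# Report build/ or dist/ directories below a module instead of under the
# shared <root>/build and <root>/dist.
# Usage: stray_build_dirs <lucene-dir>
stray_build_dirs() {
  local root="${1:-lucene}"
  # -prune skips the shared output directories entirely; everything else
  # named build or dist is printed.
  find "$root" \( -path "$root/build" -o -path "$root/dist" \) -prune -o \
    -type d \( -name build -o -name dist \) -print | sort
}
```

An empty result means every module is writing its output to the single top-level location.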

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Updated patch for r1326433:

Another TODO:

There is now a new script, since it's important to nuke the svn:ignores for the previous bogus build directories:

# svn moves
svn move lucene/contrib/demo lucene/demo
svn move lucene/contrib/highlighter lucene/highlighter
svn move lucene/contrib/memory lucene/memory
svn move lucene/contrib/misc lucene/misc
svn move lucene/contrib/sandbox lucene/sandbox
svn move modules/analysis lucene/analysis
svn move modules/benchmark lucene/benchmark
svn move modules/facet lucene/facet
svn move modules/grouping lucene/grouping
svn move modules/join lucene/join
svn move modules/queries lucene/queries
svn move modules/queryparser lucene/queryparser
svn move modules/spatial lucene/spatial
svn move modules/suggest lucene/suggest
# nuke modules dir
svn delete modules
# clean up svn:ignore's, all modules should be consistent 
# under lucene/build now... so nuke this
svn pset svn:ignore pom.xml lucene/analysis
svn pset svn:ignore -F - \
lucene/facet \
lucene/grouping \
lucene/join \
lucene/queries \
lucene/queryparser \
lucene/spatial \
lucene/suggest << EOF
*.iml
pom.xml
EOF
svn pset svn:ignore -F - lucene/benchmark << EOF
temp
work
*.iml
pom.xml
EOF
# now apply patch
patch -p0 < LUCENE-3965.patch
asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Attached patch fixing Maven too (still r1326433).

I can now do a full 'ant -Dversion=4.0.0 -Dgpg.key=322D7ECA prepare-release' which produces reasonable artifacts.

I think this is ready to commit; it makes our trunk theoretically releasable, which it wasn't before.

I would keep the issue open to address more of the TODOs:

Here is the script you need:

# svn moves
svn move lucene/contrib/demo lucene/demo
svn move lucene/contrib/highlighter lucene/highlighter
svn move lucene/contrib/memory lucene/memory
svn move lucene/contrib/misc lucene/misc
svn move lucene/contrib/sandbox lucene/sandbox
svn move modules/analysis lucene/analysis
svn move modules/benchmark lucene/benchmark
svn move modules/facet lucene/facet
svn move modules/grouping lucene/grouping
svn move modules/join lucene/join
svn move modules/queries lucene/queries
svn move modules/queryparser lucene/queryparser
svn move modules/spatial lucene/spatial
svn move modules/suggest lucene/suggest
# nuke modules dir
svn delete modules
# clean up svn:ignore's, all modules should be consistent 
# under lucene/build now... so nuke this
svn pset svn:ignore pom.xml lucene/analysis
svn pset svn:ignore -F - \
lucene/facet \
lucene/grouping \
lucene/join \
lucene/queries \
lucene/queryparser \
lucene/spatial \
lucene/suggest << EOF
*.iml
pom.xml
EOF
svn pset svn:ignore -F - lucene/benchmark << EOF
temp
work
*.iml
pom.xml
EOF
# maven configurations
svn move dev-tools/maven/modules/analysis dev-tools/maven/lucene
svn move dev-tools/maven/modules/benchmark dev-tools/maven/lucene
svn move dev-tools/maven/modules/facet dev-tools/maven/lucene
svn move dev-tools/maven/modules/grouping dev-tools/maven/lucene
svn move dev-tools/maven/modules/join dev-tools/maven/lucene
svn move dev-tools/maven/modules/queries dev-tools/maven/lucene
svn move dev-tools/maven/modules/queryparser dev-tools/maven/lucene
svn move dev-tools/maven/modules/spatial dev-tools/maven/lucene
svn move dev-tools/maven/modules/suggest dev-tools/maven/lucene
svn delete dev-tools/maven/modules
svn move dev-tools/maven/lucene/contrib/demo dev-tools/maven/lucene
svn move dev-tools/maven/lucene/contrib/highlighter dev-tools/maven/lucene
svn move dev-tools/maven/lucene/contrib/memory dev-tools/maven/lucene
svn move dev-tools/maven/lucene/contrib/misc dev-tools/maven/lucene
svn move dev-tools/maven/lucene/contrib/sandbox dev-tools/maven/lucene
svn delete dev-tools/maven/lucene/contrib

# now apply patch
patch -p0 < LUCENE-3965.patch
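
After running the script, a quick check can confirm that each module landed in both places it was moved to: the source tree under lucene/ and its Maven template under dev-tools/maven/lucene/. A minimal bash sketch, assuming it is run against a checkout root; check_moved is a hypothetical helper, and the module list is copied from the moves above.

```bash
#!/usr/bin/env bash
# Print every module that is missing either its source directory or its
# Maven template directory after the moves.
# Usage: check_moved <checkout-root> <module>...
check_moved() {
  local repo="$1"; shift
  local m
  for m in "$@"; do
    if [ ! -d "$repo/lucene/$m" ] || [ ! -d "$repo/dev-tools/maven/lucene/$m" ]; then
      echo "$m"
    fi
  done
}

# Modules moved by the script above.
modules=(analysis benchmark facet grouping join queries queryparser spatial suggest
         demo highlighter memory misc sandbox)
```

For example, check_moved . "${modules[@]}" should print nothing when every move succeeded.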
asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I plan to commit this later today.

asfimport commented 12 years ago

Steven Rowe (@sarowe) (migrated from JIRA)

+1 to commit - good progress!

Tests work from the top level, and I tried ant test in a couple of modules' directories, which also worked. Build output all seems to be going to the right place (under lucene/build/).

I scanned the changed build files, and I didn't see any problems.

I searched *build.xml for "modules/" and "contrib". "modules/" seems to be gone, but there are several names that still have "contrib" in them (e.g. test-contrib) in lucene/build.xml and lucene/common-build.xml. These names can be fixed later.
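That search can be reproduced with a single recursive grep. A minimal bash sketch, assuming a grep that supports --include (GNU and BSD grep both do); leftover_names is a hypothetical helper, and it prints file:line:text for every hit.

```bash
#!/usr/bin/env bash
# List every line in a *build.xml file that still mentions "modules/" or
# "contrib" (e.g. leftover target names such as test-contrib).
# Usage: leftover_names <dir>
leftover_names() {
  local root="${1:-.}"
  # -r: recurse, -n: show line numbers, -E: extended regex for the alternation
  grep -rnE --include='*build.xml' 'modules/|contrib' "$root"
}
```

Running it against lucene/ before and after renaming the remaining targets gives a simple checklist of what is left.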

I didn't look at javadocs or packaging - I assume anything you've done there will be better than it was :).

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I searched *build.xml for "modules/" and "contrib". "modules/" seems to be gone, but there are several names that still have "contrib" in them (e.g. test-contrib) in lucene/build.xml and lucene/common-build.xml. These names can be fixed later.

Yeah I can rename these (at least temporarily) to something more appropriate after committing.

Separately, another TODO: it's slightly confusing that core and friends aren't just another module, and it's confusing that common-build.xml, with its "module tasks", is included by lucene/build.xml, which is not a module at all (try ant -projecthelp from lucene/ and experiment with some of the common.XXXX tasks that show up there).

But this is the way trunk is today :) The tradeoff of this approach is that we keep all the same logic and it's really not a drastic change...

I resisted doing these kinds of cleanups because they have a chance of breaking something, and I think they should be done separately... but we should still look at them afterwards.

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Running a final test first. I committed the fixes to nightly/.

However, we could encounter a failed build from the svn check, due to the removal of the bogus build directories and their svn:ignores (these would just be leftover relics).

Once I commit, I will ask Uwe to clean the workspaces to (hopefully) prevent that, but it's possible one build could slip through... (and of course there's the possibility that I have some other bugs)

asfimport commented 12 years ago

Robert Muir (@rmuir) (migrated from JIRA)

I think this issue is ok!

We can build real releases now, so trunk is back in shape.