apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Fully embrace the java module system [LUCENE-10255] #11291

Open asfimport opened 2 years ago

asfimport commented 2 years ago

I've experimented a bit trying to move the code to the JMS. It is surprisingly difficult... A PoC that almost passes all checks is here: https://github.com/dweiss/lucene/tree/jms2 (previously https://github.com/dweiss/lucene/tree/jms)

Here are my conclusions so far:


Migrated from LUCENE-10255 by Dawid Weiss (@dweiss), updated Feb 01 2022. Attachments: screenshot-1.png, screenshot-2.png


asfimport commented 2 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

The list posted before also contains an example made by you: "com.carrotsearch.hppc" is the module name of the Maven artifact "com.carrotsearch:hppc". This is exactly what I proposed for Lucene:

https://github.com/carrotsearch/hppc/blob/29ab369adac23a76acae1d08529654b2c2dc59e5/gradle/java/compiler.gradle#L24

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Mistakes of the youth... I remain unconvinced.

asfimport commented 2 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Apache TIKA also uses module names according to the spec: https://github.com/apache/tika/blob/9d29536228860860549d89a052673d47c2af75ca/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-xmp-commons/pom.xml#L48

I just figured out that you already added Automatic Module names to the 9.0 release, which are not even hardcoded, but derived through regular expressions/search-replace from the internal gradle project path. This was done completely without any announcement, so we have a Lucene release with broken names going out soon. I am glad that nobody cares about modules at the moment...

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Uwe... This was done with an announcement on the pull request and the issue. And it's also literally everywhere in the scripts you've reviewed ("-m lucene.luke").

If you really care so much about it and wish to change it to a full prefix we can still do it - 9.0 is not out yet.

asfimport commented 2 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

> I just figured out that you already added Automatic Module names to the 9.0 release, which are not even hardcoded, but derived through regular expressions/search-replace from the internal gradle project path.

This is even riskier: if we decide to remove the ":lucene" top-level Gradle folder, the module name changes and nobody will notice!

Everything that's relevant to source code of downstream users should be explicitly declared (either in module-info.java or in the manifest).
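A minimal sketch of what "explicitly declared in the manifest" looks like from the consuming side: the name is a plain `Automatic-Module-Name` main attribute that can be read back with the standard `java.util.jar.Manifest` API. The manifest text and the class name `ManifestModuleName` are hypothetical and only illustrate the idea; this is not taken from the Lucene build.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

class ManifestModuleName {
  // Returns the Automatic-Module-Name main attribute, or null if the manifest has none.
  static String automaticModuleName(String manifestText) {
    byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
    try {
      Manifest manifest = new Manifest(new ByteArrayInputStream(bytes));
      return manifest.getMainAttributes().getValue("Automatic-Module-Name");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical manifest content: the module name is written out, not derived.
    String manifestText =
        "Manifest-Version: 1.0\r\n"
            + "Automatic-Module-Name: org.apache.lucene.core\r\n"
            + "\r\n";
    System.out.println(automaticModuleName(manifestText));
  }
}
```

Because the attribute is written out literally, renaming a Gradle folder would not silently change it.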

asfimport commented 2 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

> This was done with an announcement on the pull request

These are so important changes that it should have been a post on mailing list!

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

#11270

> These are so important changes that it should have been a post on mailing list!

Sure. It wasn't a change though - it was an introduction of what wasn't there before at all.

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

> Sorry but I remain unconvinced that typing a million times "org.apache." in various contexts wins you or me anything.

Sorry for the silly question, but I'm trying to understand why you'd need to type it a million times. I too dislike the verbosity of java, but it is my understanding that you might only add it to a module-info.java, like once?

I think as far as specifying stuff on the command line goes, it's not a problem, as Lucene isn't a command-line application but an API. The one app we really ship (luke) has a sh/bat to make it easy.

But because it is an API, I do care that it's easy for users to consume it with the module system. And that also includes making it easy to consume things like analyzers via SPI providers if they are using the module system. I just don't know what that looks like yet (due to my unfamiliarity with the module system), but I'd love to visually see the tradeoffs between say 'lucene.analysis.common' and 'org.apache.lucene.analysis.common' from an "API user" perspective.

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I already typed "-m lucene.luke" what seems like half a million times while debugging stuff around the jms and gradle bugs. So I'm almost there.

Listen... I really don't like the full prefix but I really could care less about it if you all want to stick with the full domain name - let's just fix it, respin the release candidate and be done with it. I did announce the shorthand version on #11270, perhaps I should have written an all-caps announcement but I didn't, sorry. Let's do it the way you like it, I really don't care THAT MUCH. I only care a little.

asfimport commented 2 years ago

Robert Muir (@rmuir) (migrated from JIRA)

My comment was a genuine question, as I don't yet understand how annoying this name will be to API users. I don't yet have any opinion on the color of the bikeshed :)

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

It'll be a different prefix in module-info.java "requires xyz" statements and in command-line invocations of Luke. Also, it'll list the Lucene module as "lucene.core@version" instead of "org.apache.lucene.core@version". I'll provide a PR to go back to the full prefix - Uwe seems to be really determined that this is the right way (tm) of doing it.
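To make the difference visible from an API user's side, here is a small sketch using the JDK's `java.lang.module.ModuleDescriptor` builder to model a consumer's descriptor under both naming schemes. The application module name `com.example.searchapp` and the class `RequiresNamingSketch` are hypothetical; only the two Lucene module names come from the discussion.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;
import java.util.TreeSet;

class RequiresNamingSketch {
  // Models "module com.example.searchapp { requires <luceneModule>; }".
  static ModuleDescriptor appRequiring(String luceneModule) {
    return ModuleDescriptor.newModule("com.example.searchapp")
        .requires(luceneModule)
        .build();
  }

  // Collects the names of all modules the descriptor requires, sorted.
  static Set<String> requiredNames(ModuleDescriptor descriptor) {
    Set<String> names = new TreeSet<>();
    for (ModuleDescriptor.Requires r : descriptor.requires()) {
      names.add(r.name());
    }
    return names;
  }

  public static void main(String[] args) {
    // Short prefix: differs from the package prefix used in imports.
    System.out.println(requiredNames(appRequiring("lucene.core")));
    // Full prefix: matches "import org.apache.lucene...." in source files.
    System.out.println(requiredNames(appRequiring("org.apache.lucene.core")));
  }
}
```

Either way the user types the name once per `requires` directive; the full prefix just keeps it aligned with the package names.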

asfimport commented 2 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

> Sorry for the silly question, but I'm trying to understand why you'd need to type it a million times. I too dislike the verbosity of java, but it is my understanding that you might only add it to a module-info.java, like once?

Exactly. And our API users will not understand why they must write "requires lucene.core;" in module-info.java but "import org.apache.lucene.xyz.*;" in all Java files. This is inconsistent! There is also the risk of clashes (Lucene is very special, but we will see third-party modules naming themselves something like "lucene.foobar.xy" although they have nothing in common with Apache). We are an Apache project, so our package names, module names and Maven artifact names should have the "org.apache.lucene" prefix.

This lets users consume Lucene in the way everybody knows: in imports in Java files, when defining dependencies in Maven, and in the requires directives of Java modules.

> But because it is an API, I do care that it's easy for users to consume it with the module system. And that also includes making it easy to consume things like analyzers via SPI providers if they are using the module system.

Yes, and this will also work with module system. I tested it after adding correct "uses SPIBaseClass" statements to lucene-core's module-info.java. Theoretically, in addition we can hide everything from analyzers-common except the SPI....
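A rough sketch of how the `uses`/`provides` pairing fits together, modelled with the JDK's `ModuleDescriptor` builder rather than real module-info.java files. The service and provider class names appear elsewhere in this thread; which module declares what, and the class `SpiDescriptorSketch` itself, are assumptions made for illustration.

```java
import java.lang.module.ModuleDescriptor;
import java.util.ArrayList;
import java.util.List;

class SpiDescriptorSketch {
  static final String SERVICE = "org.apache.lucene.analysis.TokenFilterFactory";

  // Consumer side: the module that looks providers up declares "uses".
  static ModuleDescriptor consumer() {
    return ModuleDescriptor.newModule("org.apache.lucene.core")
        .exports("org.apache.lucene.analysis")
        .uses(SERVICE)
        .build();
  }

  // Provider side: implementations are listed in "provides ... with ...".
  static ModuleDescriptor provider() {
    return ModuleDescriptor.newModule("org.apache.lucene.analysis.common")
        .requires("org.apache.lucene.core")
        .provides(
            SERVICE,
            List.of("org.apache.lucene.analysis.ar.ArabicNormalizationFilterFactory"))
        .build();
  }

  // Lists the provider class names a descriptor declares for one service.
  static List<String> providersOf(ModuleDescriptor descriptor, String service) {
    List<String> result = new ArrayList<>();
    for (ModuleDescriptor.Provides p : descriptor.provides()) {
      if (p.service().equals(service)) {
        result.addAll(p.providers());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println("uses: " + consumer().uses());
    System.out.println("provides: " + providersOf(provider(), SERVICE));
  }
}
```

Note that a provider's package does not need to be exported for `provides` to work, which is what makes hiding everything except the SPI conceivable.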

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I accept your arguments, even if I disagree with them, Uwe. I provided a PR to change it already.

asfimport commented 2 years ago

Christian Stein (migrated from JIRA)

FWIW, I agree with Uwe on the naming topic and want to add "prior art" samples from other org.apache.* projects that already ship as Java modules with module names starting with org.apache.: Derby, Felix, POI, Tomcat, and Wicket.

https://github.com/sormuras/modules/blob/be524907f29f60c7895b3cde62850a1937969ad7/com.github.sormuras.modules/com/github/sormuras/modules/modules.properties#L2480-L2551

asfimport commented 2 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Thanks Christian Stein for confirmation.

We now have the following names generated from the gradle build:

> Task :showModuleNames
lucene-benchmark-10.0.0-SNAPSHOT.jar               -> org.apache.lucene.benchmark
lucene-backward-codecs-10.0.0-SNAPSHOT.jar         -> org.apache.lucene.backward_codecs
lucene-classification-10.0.0-SNAPSHOT.jar          -> org.apache.lucene.classification
lucene-codecs-10.0.0-SNAPSHOT.jar                  -> org.apache.lucene.codecs
lucene-core-10.0.0-SNAPSHOT.jar                    -> org.apache.lucene.core
lucene-demo-10.0.0-SNAPSHOT.jar                    -> org.apache.lucene.demo
lucene-expressions-10.0.0-SNAPSHOT.jar             -> org.apache.lucene.expressions
lucene-facet-10.0.0-SNAPSHOT.jar                   -> org.apache.lucene.facet
lucene-grouping-10.0.0-SNAPSHOT.jar                -> org.apache.lucene.grouping
lucene-highlighter-10.0.0-SNAPSHOT.jar             -> org.apache.lucene.highlighter
lucene-join-10.0.0-SNAPSHOT.jar                    -> org.apache.lucene.join
lucene-luke-10.0.0-SNAPSHOT.jar                    -> org.apache.lucene.luke
lucene-memory-10.0.0-SNAPSHOT.jar                  -> org.apache.lucene.memory
lucene-misc-10.0.0-SNAPSHOT.jar                    -> org.apache.lucene.misc
lucene-monitor-10.0.0-SNAPSHOT.jar                 -> org.apache.lucene.monitor
lucene-queries-10.0.0-SNAPSHOT.jar                 -> org.apache.lucene.queries
lucene-queryparser-10.0.0-SNAPSHOT.jar             -> org.apache.lucene.queryparser
lucene-replicator-10.0.0-SNAPSHOT.jar              -> org.apache.lucene.replicator
lucene-sandbox-10.0.0-SNAPSHOT.jar                 -> org.apache.lucene.sandbox
lucene-spatial-extras-10.0.0-SNAPSHOT.jar          -> org.apache.lucene.spatial_extras
lucene-spatial3d-10.0.0-SNAPSHOT.jar               -> org.apache.lucene.spatial3d
lucene-suggest-10.0.0-SNAPSHOT.jar                 -> org.apache.lucene.suggest
lucene-test-framework-10.0.0-SNAPSHOT.jar          -> org.apache.lucene.test_framework
lucene-analysis-common-10.0.0-SNAPSHOT.jar         -> org.apache.lucene.analysis.common
lucene-analysis-icu-10.0.0-SNAPSHOT.jar            -> org.apache.lucene.analysis.icu
lucene-analysis-kuromoji-10.0.0-SNAPSHOT.jar       -> org.apache.lucene.analysis.kuromoji
lucene-analysis-morfologik-10.0.0-SNAPSHOT.jar     -> org.apache.lucene.analysis.morfologik
lucene-analysis-nori-10.0.0-SNAPSHOT.jar           -> org.apache.lucene.analysis.nori
lucene-analysis-opennlp-10.0.0-SNAPSHOT.jar        -> org.apache.lucene.analysis.opennlp
lucene-analysis-phonetic-10.0.0-SNAPSHOT.jar       -> org.apache.lucene.analysis.phonetic
lucene-analysis-smartcn-10.0.0-SNAPSHOT.jar        -> org.apache.lucene.analysis.smartcn
lucene-analysis-stempel-10.0.0-SNAPSHOT.jar        -> org.apache.lucene.analysis.stempel

(see https://github.com/apache/lucene/pull/487)

At the moment these are automatic module names, but this issue is about fully modularizing.
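For readers wondering how names such as `backward_codecs` and `analysis.common` can both come out of one rule, the sketch below shows one mapping that would reproduce the listing. It assumes Gradle project paths like `:lucene:backward-codecs` and `:lucene:analysis:common`; the rule and the class `ModuleNameSketch` are illustrative guesses, not the actual build logic.

```java
class ModuleNameSketch {
  // Maps an assumed Gradle project path to a module name:
  // path separators become dots, hyphens become underscores
  // (a hyphen is not legal in a Java module name).
  static String moduleName(String gradlePath) {
    String relative = gradlePath.startsWith(":") ? gradlePath.substring(1) : gradlePath;
    return "org.apache." + relative.replace(':', '.').replace('-', '_');
  }

  public static void main(String[] args) {
    String[] paths = {
      ":lucene:core", ":lucene:backward-codecs", ":lucene:analysis:common"
    };
    for (String path : paths) {
      System.out.println(path + " -> " + moduleName(path));
    }
  }
}
```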

asfimport commented 2 years ago

Robert Scholte (@rfscholte) (migrated from JIRA)

TL;DR: unlike Maven, which has a groupId and artifactId for uniquely identifying a versionless artifact, Java has the package to uniquely identify a class and a module name to uniquely identify a modular jar. Keeping these in sync will implicitly prevent split-package issues. It also prevents situations like https://qz.com/646467/how-one-programmer-broke-the-internet-by-deleting-a-tiny-piece-of-code/ and ensures there will never be any doubt about the module name, so my advice: start with org.apache.lucene

asfimport commented 2 years ago

Uwe Schindler (@uschindler) (migrated from JIRA)

Hi @rfscholte, thanks for the advice!

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I've added an automatic check verifying the consistency between services provided by the modular and classpath layer. It works fine - currently fails with:

org.apache.lucene.distribution.TestModularLayer > test suite's output saved to C:\Work\apache\lucene\lucene\distribution-tests\build\test-results\test\outputs\OUTPUT-org.apache.lucene.distribution.TestModularLayer.txt, copied below:
   >     java.lang.AssertionError: [Modular providers of service org.apache.lucene.analysis.TokenFilterFactory in module: org.apache.lucene.analysis.common]
   >     Expecting TreeSet:
   >       ["org.apache.lucene.analysis.ar.ArabicNormalizationFilterFactory",
...
   >     but could not find the following element(s):
   >       ["org.apache.lucene.analysis.es.SpanishPluralStemFilterFactory"]

which is exactly right.
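The core of such a check can be sketched as a set difference between the providers registered for the classpath (META-INF/services) and those declared in the module descriptor. This is a simplified illustration, not the code of TestModularLayer; the class `ProviderConsistencySketch` and its inputs are hypothetical, with provider names taken from the failure output above.

```java
import java.util.Set;
import java.util.TreeSet;

class ProviderConsistencySketch {
  // Returns providers visible on the classpath but missing from the module descriptor.
  static Set<String> missingFromModule(
      Set<String> classpathProviders, Set<String> modularProviders) {
    Set<String> missing = new TreeSet<>(classpathProviders);
    missing.removeAll(modularProviders);
    return missing;
  }

  public static void main(String[] args) {
    Set<String> classpath =
        Set.of(
            "org.apache.lucene.analysis.ar.ArabicNormalizationFilterFactory",
            "org.apache.lucene.analysis.es.SpanishPluralStemFilterFactory");
    Set<String> modular =
        Set.of("org.apache.lucene.analysis.ar.ArabicNormalizationFilterFactory");

    Set<String> missing = missingFromModule(classpath, modular);
    if (!missing.isEmpty()) {
      System.out.println("Modular layer is missing providers: " + missing);
    }
  }
}
```

A full check would run the same comparison in the other direction as well, so that a provider declared only in module-info.java is also reported.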

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

Finally! I think we have all the pieces that are needed to introduce proper Java module support in Lucene. The change basically boils down to:

All of these things are working and validated in the draft PR at https://github.com/apache/lucene/pull/470 - thanks everyone for their support and persistence.

I will clean up the repetitive parts of the patch, remove the unnecessary or experimental unrelated changes and create a cleaned-up PR soon. I think that, now that everything is working, it should be downhill from here.

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I've reorganized a lot of the work, done the cleanups and discovered new problems along the way. A cleaned-up branch is here:

https://github.com/dweiss/lucene/tree/jms2 https://github.com/apache/lucene/pull/533

I may be reorganizing the commit graph from time to time (force-pushes) so that I can make each commit a reasonably large-scoped thing that fixes one thing. Currently they're fairly separate already.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit e0745c7b24b392f2657e207c45031238e2f5289a in lucene's branch refs/heads/main from Dawid Weiss https://gitbox.apache.org/repos/asf?p=lucene.git;h=e0745c7

LUCENE-10255: re-add utilities for debugging packages and services. These are not included by default to avoid unnecessary compilation overhead.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit d42db56babfe1bb93a5f34b064bfa11056716812 in lucene's branch refs/heads/main from Dawid Weiss https://gitbox.apache.org/repos/asf?p=lucene.git;h=d42db56

LUCENE-10255: initial support for Java Modules.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit c1c27d4ff409ee514f1681207c2fec0dacd54c3c in lucene's branch refs/heads/branch_9x from Dawid Weiss https://gitbox.apache.org/repos/asf?p=lucene.git;h=c1c27d4

LUCENE-10255: initial support for Java Modules (squashed).

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 7e1f3fef699376cc6069a31a5498670080741e98 in lucene's branch refs/heads/branch_9x from Dawid Weiss https://gitbox.apache.org/repos/asf?p=lucene.git;h=7e1f3fe

LUCENE-10255: add unsynced providers to the module.

asfimport commented 2 years ago

Dawid Weiss (@dweiss) (migrated from JIRA)

I filed this simple example in the Gradle issue tracker - it shows how even a simple project can be made to fail with a modular setup there.

https://github.com/gradle/gradle/issues/19376

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit f103cca5652dc3aabcd469fd5f007c6828b3c695 in lucene's branch refs/heads/main from Dawid Weiss https://gitbox.apache.org/repos/asf?p=lucene.git;h=f103cca

LUCENE-10255: Add the required unnamed modules in benchmarks subproject to module-info so that they are explicit.