Open asfimport opened 6 years ago
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
It is better to start small, from the params. other refactors will follow and be part of the bigger Jira issue.
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
https://patch-diff.githubusercontent.com/raw/apache/lucene-solr/pull/369.patch
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
First draft patch attached.
Following tests have been successfully verified :
org.apache.lucene.classification.* org.apache.lucene.queries.mlt.* org.apache.solr.search.mlt.* org.apache.solr.handler.component.DistributedMLTComponentTest org.apache.solr.handler.component.MoreLikeThisComponentTest org.apache.solr.handler.MoreLikeThisHandlerTest
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
It is required to merge 12304 first. This patch is built on top of 12304.
Lucene/Solr QA (migrated from JIRA)
❌ -1 overall |
---|
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
-1 | patch | 0m 5s | SOLR-12299 does not apply to master. Rebase required? Wrong Branch? See https://wiki.apache.org/solr/HowToContribute#Creating_the_patch_file for help. |
Subsystem | Report/Notes |
---|---|
JIRA Issue | SOLR-12299 |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12921792/SOLR-12299.patch |
Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/77/console |
Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
I am attaching the patch again. Apparently the one generated automatically from the github request was partially corrupted. Fixed now.
Lucene/Solr QA (migrated from JIRA)
❌ -1 overall |
---|
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
Prechecks | |||
+1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
master Compile Tests | |||
+1 | compile | 3m 28s | master passed |
Patch Compile Tests | |||
+1 | compile | 2m 52s | the patch passed |
+1 | javac | 2m 52s | the patch passed |
+1 | Release audit (RAT) | 0m 59s | the patch passed |
+1 | Check forbidden APIs | 0m 49s | the patch passed |
+1 | Validate source patterns | 0m 49s | the patch passed |
Other Tests | |||
+1 | unit | 0m 53s | classification in the patch passed. |
+1 | unit | 0m 36s | queries in the patch passed. |
-1 | unit | 72m 0s | core in the patch failed. |
84m 8s |
Reason | Tests |
---|---|
Failed junit tests | solr.cloud.ZkControllerTest |
Subsystem | Report/Notes |
---|---|
JIRA Issue | SOLR-12299 |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12921948/SOLR-12299.patch |
Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
uname | Linux lucene1-us-west 3.13.0-88-generic #135-Ubuntu SMP Wed Jun 8 21:10:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | ant |
Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
git revision | master / 89fc02a |
ant | version: Apache Ant(TM) version 1.9.3 compiled on April 8 2014 |
Default Java | 1.8.0_172 |
unit | https://builds.apache.org/job/PreCommit-SOLR-Build/78/artifact/out/patch-unit-solr_core.txt |
Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/78/testReport/ |
modules | C: lucene/classification lucene/queries solr/core U: . |
Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/78/console |
Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
The jenkins output is a little bit suspicious ... I can not reproduce this test failure and it look completely unrelated to me... Any view on that guys?
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
I am quite keen in moving forward the More Like This refactor, can I help in any way to proceed with a review ? Anyone out there that could help ? I have already split up the big refactor to become more review-friendly, happy to do whatever it needs to push this forward ( I would like to proceed with further developments More Like This side, but first I want the refactor to be there)
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
Lucene/Solr QA (migrated from JIRA)
❌ -1 overall |
---|
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
-1 | patch | 0m 5s | LUCENE-8326 does not apply to master. Rebase required? Wrong Branch? See https://wiki.apache.org/lucene-java/HowToContribute#Contributing_your_work for help. |
Subsystem | Report/Notes |
---|---|
JIRA Issue | LUCENE-8326 |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924536/LUCENE-8326.patch |
Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/13/console |
Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
It is very annoying that the patch automatically generated from the Github Pull Request ( which is green to merge with the master) actually ends up being malformed...
I attach a fixed patch now, just built straight from command line. The Pull Request is still valid from a code review perspective.
Robert Muir (@rmuir) (migrated from JIRA)
Sorry, it may have been my fault. I checked "patch attached" when moving the issue to let the automated checks run
Robert Muir (@rmuir) (migrated from JIRA)
First looking at the API change, it would be good to understand the goals.
This change wraps 8 or 9 existing setters such as setMinTermLen
with a "configuration class". There is also another class related to boosts. But everything is still just as mutable as before, so from my perspective it only adds additional indirection/abstraction which is undesired.
If we want to make MLT immutable or something like that, we should first figure out if that's worth it. From my perspective, I'm not sold on this for MoreLikeThis itself, since its lightweight and stateless, and since I can't see a way for MoreLikeThisQuery to cache efficiently.
On the other hand MoreLikeThisQuery is kind of a mess, but that isn't addressed with the refactoring. Really all queries should be immutable for caching purposes, and should all have correct equals/hashcode: but seems like its a lost cause with MoreLikeThisQuery since it does strange stuff in rewrite: its not really a per-segment thing. Because of how the query works, its not obvious to me if/how we can improve it with immutability...
Also currently MoreLikeThisQuery doesn't accept MoreLikeThis as a parameter or anything, and only uses it internally. So as it stands (also with this patch) it still has a "duplicate" API of all the parameters, which isn't great.
So I think if we want to change the API for this stuff, we should figure out what the goals are? If its just to say, consolidate api between MoreLikeThis and MoreLikeThisQuery, I can buy into that (although I have never used the latter myself, only the former). However the other queries use builders for such purposes so that's probably something to consider.
For the solr changes, my only comment would be that instead of running actual queries, isn't it good enough to just test that XYZ configuration produces a correct MLT object? Otherwise the test seems fragile from my perspective.
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
Hi Robert,
first of all thanks for your feedback, much appreciated.
My initial goal was to make More Like This original 1000 lines class:
So I started identifying the different responsibilities in the More Like This class, to separate them, in the way that if I just need to change the scoring algorithm for the interesting terms I just touch the TermScorer ect ect : You see the overall idea in these slides : https://www.slideshare.net/AlessandroBenedetti/advanced-document-similarity-with-apache-lucene
I tried to modify as less as possible the logic and tests at this stage.
So let me wrap my considerations under different topics :
Parameters Abstraction I see your point for just additional indirection/abstraction ( it is exactly just that with readability in mind). My scope for this was : "We have 600 lines of code of default and parameters for the MLT. How to make them isolated, more readable and extendable ?" And I initially thought to just put them in a separate class to remove that responsibility from the original MLT . So the focus was exclusively better readability and easy maintenance at this stage. Can you elaborate why you think this is undesired ? I don't have any strong feeling regarding this bit, so I am open to suggestions with the forementioned objective in mind.
MLT Immutable I didn't consider it , but I am completely open to do that. In such case it could be worth to add a MoreLikeThis factory that manages the parameters and create the immutable MLT object ?
MoreLikeThisQuery It was not in the scope of this refactor but I am absolutely happy to tackle that immediately as a next step, it could give it a try to see if there is space for improvement there.
Solr Tests
I completely agree, indeed as part of my additional tests which I have ready for the sequent refactors I introducedmuch more tests Lucene side than Solr side. We can elaborate this further at the right moment
Lucene/Solr QA (migrated from JIRA)
✔ +1 overall |
---|
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
Prechecks | |||
+1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
master Compile Tests | |||
+1 | compile | 3m 1s | master passed |
Patch Compile Tests | |||
+1 | compile | 2m 52s | the patch passed |
+1 | javac | 2m 52s | the patch passed |
+1 | Release audit (RAT) | 1m 0s | the patch passed |
+1 | Check forbidden APIs | 0m 50s | the patch passed |
+1 | Validate source patterns | 0m 50s | the patch passed |
Other Tests | |||
+1 | unit | 0m 59s | classification in the patch passed. |
+1 | unit | 0m 36s | queries in the patch passed. |
+1 | unit | 71m 31s | core in the patch passed. |
83m 10s |
Subsystem | Report/Notes |
---|---|
JIRA Issue | LUCENE-8326 |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924770/LUCENE-8326.patch |
Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
uname | Linux lucene1-us-west 3.13.0-88-generic #135-Ubuntu SMP Wed Jun 8 21:10:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | ant |
Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
git revision | master / 71ed5ba |
ant | version: Apache Ant(TM) version 1.9.3 compiled on April 8 2014 |
Default Java | 1.8.0_172 |
Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/16/testReport/ |
modules | C: lucene/classification lucene/queries solr/core U: . |
Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/16/console |
Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
Dawid Weiss (@dweiss) (migrated from JIRA)
Ciao Alessandro. Just FYI: I've committed two (rather small) changes to the MLT class today that may require a rebase/ fixup to this patch.
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
Hi @dweiss, I am fixing the patch! In the meantime , what does this mean ? :
@link
#setMaxDocFreq(int)} internally (both conditions cannotCheers
Dawid Weiss (@dweiss) (migrated from JIRA)
This means the relative cutoff is essentially setting the absolute cutoff value; these are not two separate conditions.
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
https://github.com/apache/lucene-solr/pull/380 has been updated
New patch attached. Thanks @dweiss for letting us know ! If you are interested in this issue any feedback is more than welcome!
Lucene/Solr QA (migrated from JIRA)
❌ -1 overall |
---|
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
Prechecks | |||
+1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
master Compile Tests | |||
+1 | compile | 3m 15s | master passed |
Patch Compile Tests | |||
+1 | compile | 3m 46s | the patch passed |
+1 | javac | 3m 46s | the patch passed |
+1 | Release audit (RAT) | 1m 5s | the patch passed |
+1 | Check forbidden APIs | 0m 52s | the patch passed |
+1 | Validate source patterns | 0m 52s | the patch passed |
Other Tests | |||
+1 | unit | 0m 51s | classification in the patch passed. |
+1 | unit | 0m 36s | queries in the patch passed. |
-1 | unit | 76m 25s | core in the patch failed. |
89m 30s |
Reason | Tests |
---|---|
Failed junit tests | solr.handler.component.SearchHandlerTest |
solr.cloud.TestSolrCloudWithDelegationTokens |
Subsystem | Report/Notes |
---|---|
JIRA Issue | LUCENE-8326 |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12925161/LUCENE-8326.patch |
Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
uname | Linux lucene1-us-west 3.13.0-88-generic #135-Ubuntu SMP Wed Jun 8 21:10:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | ant |
Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
git revision | master / fd929c1 |
ant | version: Apache Ant(TM) version 1.9.3 compiled on April 8 2014 |
Default Java | 1.8.0_172 |
unit | https://builds.apache.org/job/PreCommit-LUCENE-Build/17/artifact/out/patch-unit-solr_core.txt |
Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/17/testReport/ |
modules | C: lucene/classification lucene/queries solr/core U: . |
Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/17/console |
Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
Failed - org.apache.solr.handler.component.SearchHandlerTest.testRequireZkConnectedDistrib It's not marked as BadApple but it doesn't seem related to this patch at all to me
Regression - org.apache.solr.cloud.TestSolrCloudWithDelegationTokens.testDelegationTokenRenew It's not marked as BadApple but it doesn't seem related to this patch at all to me
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
Any consideration on my answer from the 24/05 ? I would like to move this forward, happy to have a constructive discussion and revisiting the approach if necessary :)
Robert Muir (@rmuir) (migrated from JIRA)
I feel the same way as before: we shouldn't split up a class in a user-impacting way just because its 1000 lines of code. To the user it does not matter, they just see one class with getters and setters and its easy.
If it really needs to be split up, can we try to do it in a way that doesn't impact users, e.g. move some code into package-private implementation classes?
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
Ok Robert, I see your point.
I wouldn't say it is a critical part the parameters split up ( while I do believe the interesting terms retrieval and interesting terms scoring is, but this will be a later on discussion).
Let me spend some time looking for a more balanced and conveniente solution that makes a good compromise. I will update this Jira as soon as I have a new patch/ pull request. Thank you for your time again!
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
I just attached a revised patch and Pull Request :
1) impact on user side has been considerably reduced keeping getters and setters in the MLT main class and parameters in the external class 2) Parameters with defaults have their own class, making them easy to maintain and read, it will be easy to pass them to inner MLT modules when the refactor continues 3) Boost logic is still away from users of the MLT, this keeps the responsibility of managing boost, MLT side, open to discuss this 4) the patch is built on top of SOLR-12304 bugfix, that should go IN first anyway
Happy to revise further, if necessary and follow up with the following refactors ( in separate Jira issues)
Lucene/Solr QA (migrated from JIRA)
❌ -1 overall |
---|
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
Prechecks | |||
+1 | test4tests | 0m 0s | The patch appears to include 4 new or modified test files. |
master Compile Tests | |||
+1 | compile | 3m 12s | master passed |
Patch Compile Tests | |||
+1 | compile | 2m 49s | the patch passed |
+1 | javac | 2m 49s | the patch passed |
+1 | Release audit (RAT) | 0m 59s | the patch passed |
+1 | Check forbidden APIs | 0m 49s | the patch passed |
+1 | Validate source patterns | 0m 49s | the patch passed |
Other Tests | |||
+1 | unit | 0m 51s | classification in the patch passed. |
+1 | unit | 0m 39s | queries in the patch passed. |
-1 | unit | 78m 46s | core in the patch failed. |
90m 38s |
Reason | Tests |
---|---|
Failed junit tests | solr.cloud.autoscaling.sim.TestTriggerIntegration |
Subsystem | Report/Notes |
---|---|
JIRA Issue | LUCENE-8326 |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12926768/LUCENE-8326.patch |
Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
uname | Linux lucene1-us-west 3.13.0-88-generic #135-Ubuntu SMP Wed Jun 8 21:10:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Build tool | ant |
Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
git revision | master / a4fa168 |
ant | version: Apache Ant(TM) version 1.9.3 compiled on April 8 2014 |
Default Java | 1.8.0_172 |
unit | https://builds.apache.org/job/PreCommit-LUCENE-Build/29/artifact/out/patch-unit-solr_core.txt |
Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/29/testReport/ |
modules | C: lucene/classification lucene/queries solr/core U: . |
Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/29/console |
Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
Latest Jenkins failure doesn't seem to be related with the latest updates to the patch.
Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)
Anyone interested in helping me in refactoring and improving the MLT step by step? This is the first step, in the perspective of cleaning the code and make it more maintainable and extendable. Happy to help and to support the changes necessary.
Lucene/Solr QA (migrated from JIRA)
❌ -1 overall |
---|
Vote | Subsystem | Runtime | Comment |
---|---|---|---|
-1 | patch | 0m 6s | LUCENE-8326 does not apply to master. Rebase required? Wrong Branch? See https://wiki.apache.org/lucene-java/HowToContribute#Contributing_your_work for help. |
Subsystem | Report/Notes |
---|---|
JIRA Issue | LUCENE-8326 |
JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12926768/LUCENE-8326.patch |
Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/185/console |
Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
This message was automatically generated.
More Like This ca be refactored to improve the code readability, test coverage and maintenance. Scope of this Jira issue is to start the More Like This refactor from the More Like This Params. This Jira will not improve the current More Like This but just keep the same functionality with a refactored code.
Other Jira issues will follow improving the overall code readability, test coverage and maintenance.
Migrated from LUCENE-8326 by Alessandro Benedetti (@alessandrobenedetti), updated May 17 2019 Attachments: LUCENE-8326.patch (versions: 4) Linked issues:
8853
Pull requests: https://github.com/apache/lucene-solr/pull/369