Closed niels closed 7 years ago
Can you try with the new 2.0.2 release, which was recently updated to support Elasticsearch 1.7.4?
If you are interested in using Elasticsearch 2.1, there is now a new 2.1.0 snapshot release that supports it as well.
@niels, is this ticket still relevant or can we close?
I'd like to keep it open because I want to reproduce it.
I was able to reproduce this issue. Using the latest snapshot of norconex-collector-http-2.4.0, I used the Elasticsearch committer 2.0.2 against Elasticsearch server version 1.7.4 and I got the same issue. But using the latest 2.1.0 snapshot of the committer against Elasticsearch 2.1.1 did not yield the issue. Since we did not specifically address this issue in version 2.1.0 of the committer, my guess would be that this is an issue in one of the dependencies. Maybe that dependency was upgraded to a later version in the latest committer, which fixed the issue. I will need to do some tests to confirm (or refute) that theory.
I did some further tests. I never noticed before that the ES Node client had a close() method. If I call it after committing documents, then the crawler properly terminates with the ES committer 2.0.2. So it seems that when using ES 1.7 within a client application, we must call this close method for the client to properly terminate. And it looks like ES 2.1 automatically detects that the client is going away and terminates properly without having to explicitly call close (though relying on that might not be good practice, because the latest documentation still says to call close when we are done with the client).
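To illustrate the behavior described above, here is a minimal sketch in plain Java. It does not use the Elasticsearch API: FakeSearchClient is a hypothetical stand-in, and the assumption that the real ES 1.7 node client keeps the JVM alive through non-daemon threads is a guess that has not been confirmed in this thread. The sketch only shows the general JVM rule that a process cannot exit while non-daemon threads are still running, and that an explicit close() in a finally block releases them.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a client that owns non-daemon worker threads.
// This is NOT the Elasticsearch client; names are for illustration only.
class FakeSearchClient {
    // Executors.newFixedThreadPool creates non-daemon threads by default.
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private int indexed = 0;

    void index(String doc) throws Exception {
        // Run on a worker so that a pool thread is actually started.
        workers.submit(() -> {
            synchronized (this) {
                indexed++;
            }
        }).get();
    }

    synchronized int indexedCount() {
        return indexed;
    }

    boolean isClosed() {
        return workers.isShutdown();
    }

    // Stops the worker threads so the JVM is free to exit.
    void close() throws InterruptedException {
        workers.shutdown();
        workers.awaitTermination(5, TimeUnit.SECONDS);
    }
}

class CloseClientDemo {
    public static void main(String[] args) throws Exception {
        FakeSearchClient client = new FakeSearchClient();
        try {
            client.index("doc-1");
            client.index("doc-2");
        } finally {
            // Without this call the pool threads stay alive and the
            // process keeps running after main() returns.
            client.close();
        }
        System.out.println("indexed=" + client.indexedCount());
    }
}
```

Removing the close() call from the finally block should reproduce the symptom from this ticket in miniature: the work completes, but the process has to be killed manually.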
So it would be best to add a call to this close method when we are done with the client for both ES committer version 2.0.x (using ES 1.7) and 2.1.x (using ES 2.1). The problem is how to know when it's time to call this close method. I have not seen a close method or similar on the ICommitter or AbstractCommitter classes. @essiembre, is there any way for a committer to know when crawling is done so it can do some proper cleanup?
Normally, commit() is only called at the very end by crawlers. The challenge is that most committers extend AbstractBatchCommitter and call commit() once in a while internally during the process. So we would have to change the design a bit so that committer implementations do not invoke commit() directly, but rather a new internal method such as internalCommit(). Then we'll know commit() will only be called at the end. Not sure how feasible it is with the Elasticsearch committer right now without going back and changing committer-core (which I think is what will need to happen).
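The design change proposed above could look roughly like the following sketch. All class and method names here (SketchBatchCommitter, RecordingCommitter, sendBatch, close) are hypothetical and are not the actual committer-core API; only the idea of separating an internal batch flush from the final commit() comes from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class illustrating the proposal; not committer-core code.
abstract class SketchBatchCommitter {
    private final int batchSize;
    private final List<String> queue = new ArrayList<>();

    SketchBatchCommitter(int batchSize) {
        this.batchSize = batchSize;
    }

    // Called by the crawler for every document.
    final void add(String doc) {
        queue.add(doc);
        if (queue.size() >= batchSize) {
            internalCommit(); // intermediate flush, resources stay open
        }
    }

    // Sends whatever is queued; never releases resources.
    private void internalCommit() {
        if (queue.isEmpty()) {
            return;
        }
        sendBatch(new ArrayList<>(queue));
        queue.clear();
    }

    // Called exactly once by the crawler when crawling is done.
    final void commit() {
        internalCommit(); // flush the remainder
        close();          // safe place for cleanup such as client.close()
    }

    protected abstract void sendBatch(List<String> batch);

    protected abstract void close();
}

// Test double that records batch sizes instead of talking to a server.
class RecordingCommitter extends SketchBatchCommitter {
    final List<Integer> batchSizes = new ArrayList<>();
    boolean closed = false;

    RecordingCommitter(int batchSize) {
        super(batchSize);
    }

    @Override
    protected void sendBatch(List<String> batch) {
        batchSizes.add(batch.size());
    }

    @Override
    protected void close() {
        closed = true;
    }
}
```

With this split, an Elasticsearch-specific subclass would only need to close its client inside close(), since commit() is then guaranteed to run only at the end of the crawl.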
The only "hack" I can imagine in the meantime is copying the content of most of the abstract classes the Elasticsearch committer uses and fixing this just for the Elasticsearch committer.
I will create a new ticket under committer-core to address this.
4.0.0 released.
When committing to Elasticsearch (see the config below), the collector-http.sh script never terminates even though the crawler run has already ended. I have to manually kill the process using CTRL+C or kill.
This is using the norconex-collector-http-2.4.0-20151209.033143-7 snapshot and norconex-committer-elasticsearch-2.0.1 against Elasticsearch 1.7.3. I understand that there might be an incompatibility between the committer and ES 1.7, but the documents are committed to ES just fine. Please close this issue if it relates to #2 after all.
Normal output:
The much longer debug output can be found here. Note that everything following INFO [JobSuite] Running test-crawler: END (Wed Dec 09 18:24:12 CET 2015) is only printed to the console but not appended to the log file.