Norconex / committer-elasticsearch

Implementation of Norconex Committer for Elasticsearch.
https://opensource.norconex.com/committers/elasticsearch/
Apache License 2.0

Committer closed without sending any documents #40

Open benzaita opened 4 years ago

benzaita commented 4 years ago

I have configured an Elasticsearch domain in AWS and verified it works by PUTting a document into it using curl.

However, when I run the http-collector configured with the elasticsearch-committer, the committer just closes without sending any documents or reporting any errors:

INFO  [AbstractCrawler] MyWebsite: Crawler finishing: committing documents.
INFO  [ElasticsearchCommitter] Elasticsearch RestClient closed.
INFO  [AbstractCrawler] MyWebsite: 4 reference(s) processed.
INFO  [CrawlerEventManager]          CRAWLER_FINISHED
INFO  [AbstractCrawler] MyWebsite: Crawler completed.
INFO  [AbstractCrawler] MyWebsite: Crawler executed in 12 seconds.
INFO  [SitemapStore] MyWebsite: Closing sitemap store...
INFO  [JobSuite] Running MyWebsite: END (Mon Feb 10 08:58:07 UTC 2020)

This line (INFO [ElasticsearchCommitter] Elasticsearch RestClient closed.) is the only output I get from the committer, which is configured as follows:

        <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
            <nodes>https://hostname-in-aws</nodes>
            <indexName>mywebsite</indexName>
            <queueSize>1</queueSize>
            <commitBatchSize>1</commitBatchSize>
            <ignoreResponseErrors>false</ignoreResponseErrors>
        </committer>

How can I increase the log level? Or what could be the problem here?

essiembre commented 4 years ago

Look in the collector directory for a file called log4j.properties. You can use it to make the logging more verbose (for example, by setting a logger to DEBUG).
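
As a rough illustration of that kind of change, the fragment below uses the standard log4j 1.x properties syntax (log4j.logger.<logger name>=<LEVEL>). The logger name is an assumption derived from the committer class in the config above; the log4j.properties shipped with your collector version may already contain a similar line, and whether the committer prints anything extra at DEBUG is not confirmed in this thread.

```properties
# Assumption: logger name taken from the package of
# com.norconex.committer.elasticsearch.ElasticsearchCommitter
log4j.logger.com.norconex.committer.elasticsearch=DEBUG
```

Restart the collector after editing the file so the new level is picked up.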

You are showing only the last part of your log, and I am curious to see what comes before it. Only 4 documents were processed, and the logs should tell you whether they were rejected. For documents that make it to Elasticsearch, you should see log entries containing DOCUMENT_COMMITTED_ADD. Do you see any?

If you only see REJECTED_... entries and cannot tell why, you can change the log level for those rejection events to get more detail on why each document was rejected.
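
To check quickly which of the two cases applies, a small script can tally the relevant events in the full log. This is only a sketch: the function name count_events, the sample log lines, and the rejection event name in them are made up for illustration (modelled on the log excerpt above), and the log file path is whatever your setup writes to.

```python
import re
import sys
from collections import Counter

# Matches the commit event named in this thread, or any event
# whose name starts with REJECTED_.
EVENT_PATTERN = re.compile(r"\b(DOCUMENT_COMMITTED_ADD|REJECTED_\w+)")


def count_events(lines):
    """Count commit and rejection events found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = EVENT_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, not taken from a real run.
SAMPLE_LOG = [
    "INFO  [CrawlerEventManager]          REJECTED_FILTER: https://example.com/a",
    "INFO  [CrawlerEventManager]   DOCUMENT_COMMITTED_ADD: https://example.com/b",
    "INFO  [AbstractCrawler] MyWebsite: 4 reference(s) processed.",
]

if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Usage: python count_events.py <path-to-collector-log>
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
            result = count_events(log_file)
    else:
        result = count_events(SAMPLE_LOG)
    for event, count in sorted(result.items()):
        print(f"{event}: {count}")
```

If DOCUMENT_COMMITTED_ADD never appears, the documents did not reach the committer, and the REJECTED_... counts show where to look next.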

If you cannot figure it out, please attach your config.