Norconex / committer-elasticsearch

Implementation of Norconex Committer for Elasticsearch.
https://opensource.norconex.com/committers/elasticsearch/
Apache License 2.0

Not able to commit the processed items to Elastic search using norconex file system collector #35

Open sanjeevarayuduuppara opened 5 years ago

sanjeevarayuduuppara commented 5 years ago

Hi, I am using the Norconex Filesystem Collector to crawl files from a shared path. I am trying to commit the processed items to both Elasticsearch and a file committer. Nothing is committed to Elasticsearch/Solr, but the items are saved to the file system. Please find the config file below, and please help me resolve the issue.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xml>
<!-- 
   Copyright 2010-2017 Norconex Inc.

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->

<fscollector id="Text Files">

## Either uncomment or set the following variables or create yourself a 
## sample-config.variables (or properties) with the same variables set.

#set($path = "valid path")
#set($workdir = "E:\filesystem\norconex-collector-filesystem-2.8.0\norconex-collector-filesystem-2.8.0\examples")

#set($tagger = "com.norconex.importer.handler.tagger.impl")
#set($transformer = "com.norconex.importer.handler.transformer.impl")

  <logsDir>${workdir}/logs</logsDir>
  <progressDir>${workdir}/progress</progressDir>

  <crawlers>
    <crawler id="Sample Crawler">

      <workDir>${workdir}</workDir>

      <startPaths>
        <path>${path}</path>
      </startPaths>

      <numThreads>2</numThreads>

      <keepDownloads>false</keepDownloads>

      <importer>
        <postParseHandlers>
          <tagger class="${tagger}.ReplaceTagger">
            <replace fromField="samplefield" regex="true">
              <fromValue>ping</fromValue><toValue>pong</toValue>
            </replace>
            <replace fromField="Subject" regex="true">
              <fromValue>Sample to crawl</fromValue><toValue>Sample crawled</toValue>
            </replace>
          </tagger>
        </postParseHandlers>
      </importer>
      <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
        <nodes>http://localhost:9200</nodes>
        <indexName>filetest</indexName>
        <typeName>filetest1</typeName>
      </committer>
      <committer class="com.norconex.committer.core.impl.JSONFileCommitter">
        <directory>${workdir}/jsoncrawledFiles</directory>
        <pretty>true</pretty>
        <!-- <docsPerFile>(max number of docs per JSON file)</docsPerFile> -->
        <!-- <compress>[false|true]</compress> -->
        <splitAddDelete>true</splitAddDelete>
        <fileNamePrefix>test</fileNamePrefix>
        <fileNameSuffix>json</fileNameSuffix>
      </committer>
      <committer class="com.norconex.committer.core.impl.FileSystemCommitter">
        <directory>${workdir}/crawledFiles</directory>
      </committer>

    </crawler>
  </crawlers>

</fscollector>

essiembre commented 5 years ago

You cannot have multiple committers defined like you are doing. One is simply ignored. Either use just one, or if you need multiple, you can wrap them both into a MultiCommitter.
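
As a rough sketch of that wrapping, the three committers from the config above could be nested inside a single `<committer>` element. This assumes the `com.norconex.committer.core.impl.MultiCommitter` class shipped with Committer Core 2.x, which takes nested `<committer>` elements. The inner settings are copied from the original config and have not been verified against a running Elasticsearch instance.

```xml
<!-- Replaces the three sibling <committer> elements inside <crawler>. -->
<!-- Assumption: MultiCommitter from Norconex Committer Core 2.x. -->
<committer class="com.norconex.committer.core.impl.MultiCommitter">

  <!-- Sends documents to Elasticsearch. -->
  <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
    <nodes>http://localhost:9200</nodes>
    <indexName>filetest</indexName>
    <typeName>filetest1</typeName>
  </committer>

  <!-- Writes documents as JSON files. -->
  <committer class="com.norconex.committer.core.impl.JSONFileCommitter">
    <directory>${workdir}/jsoncrawledFiles</directory>
    <pretty>true</pretty>
    <splitAddDelete>true</splitAddDelete>
    <fileNamePrefix>test</fileNamePrefix>
    <fileNameSuffix>json</fileNameSuffix>
  </committer>

  <!-- Writes raw committed files to disk. -->
  <committer class="com.norconex.committer.core.impl.FileSystemCommitter">
    <directory>${workdir}/crawledFiles</directory>
  </committer>

</committer>
```

With this layout the crawler sees only one committer, and the wrapper is responsible for passing each add or delete operation on to the nested ones.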