Norconex / importer

Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, whatever its format (HTML, PDF, Word, etc). In addition, it allows you to perform any manipulation on the extracted text before using it in your own service or application.
http://www.norconex.com/collectors/importer/
Apache License 2.0

RenameTagger doesn't seem to be working #70

Closed: jacksonp2008 closed this issue 6 years ago

jacksonp2008 commented 6 years ago

I'm trying to rename a field, but it doesn't seem to work for me. The original "title" field is still in the data.

I tried both <prePars... as well as <postParse... I'm unclear on what the difference would be between pre/post.

    <importer>
      <preParseHandlers>
        <tagger class="com.norconex.importer.handler.tagger.impl.RenameTagger">
          <rename fromField="title" toField="upd_title" overwrite="true" />
        </tagger>
      </preParseHandlers>
    </importer>

You can see "title" still in the data with Kibana.

essiembre commented 6 years ago

It is possible that a title is added again when parsing occurs. To confirm whether the field is really renamed properly, I recommend you place a DebugTagger before and after your RenameTagger and you will see what you have before and after. E.g. this one will log your two fields using the log level INFO:

<tagger class="com.norconex.importer.handler.tagger.impl.DebugTagger"
        logFields="title,upd_title" logLevel="INFO" />

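
The before/after check described above could look roughly like the following. This is a minimal sketch, not a verified configuration: it reuses the DebugTagger attributes from the snippet above and the RenameTagger from the original question, and it assumes the handlers sit under postParseHandlers so that the rename runs after parsing has had a chance to add "title" again.

```xml
<importer>
  <!-- Assumption: handlers are placed post-parse, i.e. after the parser
       may have (re)added a "title" field. -->
  <postParseHandlers>
    <!-- Log both fields BEFORE the rename. -->
    <tagger class="com.norconex.importer.handler.tagger.impl.DebugTagger"
            logFields="title,upd_title" logLevel="INFO" />

    <!-- Rename "title" to "upd_title", as in the original question. -->
    <tagger class="com.norconex.importer.handler.tagger.impl.RenameTagger">
      <rename fromField="title" toField="upd_title" overwrite="true" />
    </tagger>

    <!-- Log both fields AFTER the rename to compare with the first entry. -->
    <tagger class="com.norconex.importer.handler.tagger.impl.DebugTagger"
            logFields="title,upd_title" logLevel="INFO" />
  </postParseHandlers>
</importer>
```

If the first log entry shows a value for "title" and the second shows it only under "upd_title", the rename is working at that stage. If the same three handlers under preParseHandlers also show a successful rename but "title" still ends up in the output, that would be consistent with the field being added again during parsing.
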
jacksonp2008 commented 6 years ago

Debug is the key, working well now.

Thank you!