Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

question on Tagger #583

Closed dtcyad1 closed 4 years ago

dtcyad1 commented 5 years ago

Hi Pascal,

Is it possible to have the CopyTagger applied only if the toField is null, not present, or an empty string? Case in point: I have a title that is null, but collector.referrer-link-text has a value. I want to copy this value into the title only if the title is missing, null, or an empty string.

Can this be done?

Thanks

essiembre commented 5 years ago

It would be nice to be able to copy only if the target is empty. We can make this a feature request if you like.

In the meantime, you can make it work by combining the CopyTagger, with overwrite set to false, and the ForceSingleValueTagger. Something like this (not tested):

  <!-- This will copy the link text as an additional entry if one is already present: -->
  <tagger class="com.norconex.importer.handler.tagger.impl.CopyTagger">
      <copy fromField="collector.referrer-link-text" toField="title" overwrite="false" />
  </tagger>

  <!-- This will keep the first entry only: -->
  <tagger class="com.norconex.importer.handler.tagger.impl.ForceSingleValueTagger">
      <singleValue field="title" action="keepFirst"/>
  </tagger>
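To illustrate why this combination behaves like "copy only if the target is missing", here is a minimal plain-Java sketch of the two steps. It does not use the Norconex API: the class name, both method names, and the map-of-lists metadata model are hypothetical stand-ins chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the workaround; not Norconex code.
public class CopyIfEmptySketch {

    // Mimics a copy with overwrite="false": source values are appended
    // after any values the target field already holds.
    static void copyWithoutOverwrite(
            Map<String, List<String>> meta, String from, String to) {
        List<String> source = meta.get(from);
        if (source == null || source.isEmpty()) {
            return; // nothing to copy
        }
        meta.computeIfAbsent(to, k -> new ArrayList<>()).addAll(source);
    }

    // Mimics action="keepFirst": drops every value except the first one.
    static void keepFirst(Map<String, List<String>> meta, String field) {
        List<String> values = meta.get(field);
        if (values != null && values.size() > 1) {
            meta.put(field, new ArrayList<>(values.subList(0, 1)));
        }
    }

    public static void main(String[] args) {
        // Case 1: no title, so the link text becomes the title.
        Map<String, List<String>> doc1 = new HashMap<>();
        doc1.put("collector.referrer-link-text",
                new ArrayList<>(List.of("Link text")));
        copyWithoutOverwrite(doc1, "collector.referrer-link-text", "title");
        keepFirst(doc1, "title");
        System.out.println(doc1.get("title")); // prints [Link text]

        // Case 2: a title exists, so it stays first and is the one kept.
        Map<String, List<String>> doc2 = new HashMap<>();
        doc2.put("title", new ArrayList<>(List.of("Real title")));
        doc2.put("collector.referrer-link-text",
                new ArrayList<>(List.of("Link text")));
        copyWithoutOverwrite(doc2, "collector.referrer-link-text", "title");
        keepFirst(doc2, "title");
        System.out.println(doc2.get("title")); // prints [Real title]
    }
}
```

In this sketch an empty-string title would still count as the first value and be kept, so a title that is present but blank would likely need to be cleared before the copy step. The order also matters: keeping the last value instead of the first would make the copied link text win over an existing title.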
dtcyad1 commented 5 years ago

Hi Pascal,

It would be great to have this as a feature request. Currently, I have the ForceSingleValueTagger on the title set to keepLast, because I was getting a lot of files and zip files with multiple titles and the last one seemed to be the best, especially in the case of zip files. I am not sure why.

Thanks for all your help!!

-yogesh

essiembre commented 4 years ago

Implemented in the upcoming version 3. A new "onSet" feature provides the following configurable options in most places where metadata fields can be set and a value already exists: