Norconex / crawlers

Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
https://opensource.norconex.com/crawlers
Apache License 2.0

Merging constant field to field with multiple instances #685

Closed svanschalkwyk closed 4 years ago

svanschalkwyk commented 4 years ago

When merging a constant field into a field with multiple values, using the configuration below, the output is:

<meta name="p_similar_products">https://grocery.walmart.com</meta>
<meta name="p_similar_products">/ip/Peeled-Baby-Cut-Carrots-2-lbs/10451316?athcpid=10451316&amp;athpgid=similaritems&amp;athcgid=null&amp;athznid=null&amp;athieid=null&amp;athstid=CS014&amp;athguid=090dc30f-848d-4305-9c08-37176357bce2&amp;athancid=null&amp;athena=true</meta>
<meta name="p_similar_products">/ip/Marketside-Organic-Baby-Carrots-16-oz/51259199?athcpid=51259199&amp;athpgid=similaritems&amp;athcgid=null&amp;athznid=null&amp;athieid=null&amp;athstid=CS014&amp;athguid=090dc30f-848d-4305-9c08-37176357bce2&amp;athancid=null&amp;athena=true</meta>
...

    <tagger class="com.norconex.importer.handler.tagger.impl.ConstantTagger">
        <constant name="source">https://grocery.walmart.com</constant>
    </tagger>
</preParseHandlers>

<postParseHandlers>
    <tagger class="com.norconex.importer.handler.tagger.impl.MergeTagger">
        <merge toField="p_similar_products" deleteFromFields="false"
               singleValue="false" singleValueSeparator="">
            <fromFields>source,p_similar_products_temp</fromFields>
        </merge>
    </tagger>

essiembre commented 4 years ago

What is your question/issue? On the surface, this looks good to me.

svanschalkwyk commented 4 years ago

The constant isn't being added to the second field. I can, however, merge two dynamic fields. I will visit this again later when I need it.
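
For readers landing on this thread: the output posted above is consistent with a multi-value merge, where the constant becomes one more value of the target field instead of being joined to each existing value. The Python sketch below only models that difference; it is not the Norconex implementation. `merge_multi` and `prefix_each` are hypothetical helper names, the paths are shortened stand-ins for the real ones, and reading the intent as "prefix each relative path with the base URL" is an assumption the thread does not confirm.

```python
from urllib.parse import urljoin


def merge_multi(fields, from_fields, to_field):
    """Model of a multi-value merge: every value of every source field
    is copied, in order, into the target field. Nothing is concatenated."""
    merged = []
    for name in from_fields:
        merged.extend(fields.get(name, []))
    result = dict(fields)
    result[to_field] = merged
    return result


def prefix_each(base, paths):
    """Assumed alternative intent: resolve each relative path against the base URL."""
    return [urljoin(base, path) for path in paths]


# Shortened, made-up values standing in for the fields in the issue.
fields = {
    "source": ["https://grocery.walmart.com"],
    "p_similar_products_temp": ["/ip/carrots/1?x=1", "/ip/organic-carrots/2?x=2"],
}

merged = merge_multi(
    fields, ["source", "p_similar_products_temp"], "p_similar_products"
)
prefixed = prefix_each(fields["source"][0], fields["p_similar_products_temp"])

print(merged["p_similar_products"])  # three separate values, constant first
print(prefixed)                      # two absolute URLs
```

If per-value prefixing is what is needed, a merge step alone would not produce it in this model; each value has to be rewritten individually.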