liar666 closed this issue 8 years ago.
You indeed found a bug. Values were not copied over to the "toField" when the replace action left them unchanged. A new Importer snapshot release with the fix has just been made. Copy the content of the lib folder over to your HTTP Collector install. Make sure you do not end up with duplicate JARs (keep the most recent versions).
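To make the bug concrete, here is a minimal Java sketch of the copy semantics. It assumes a rule that reads `fromField`, applies a regex replacement and writes the result to `toField`. The class, the method names and the regex are hypothetical illustrations, not the Importer's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a regex "replace" rule that reads one field and writes
// another. It illustrates the reported bug; it is NOT Norconex's actual code.
public class ReplaceToFieldSketch {

    // Assumed regex: keep everything before an optional trailing " [CC]".
    static final String NAME_REGEX = "^(.*?)\\s*(?:\\[[^\\]]*\\])?$";

    // Buggy behaviour: toField is only set when the replacement changed the value.
    static void replaceBuggy(Map<String, String> fields, String fromField,
                             String toField, String regex, String replacement) {
        String original = fields.get(fromField);
        if (original == null) {
            return;
        }
        String result = original.replaceAll(regex, replacement);
        if (!result.equals(original)) {
            fields.put(toField, result);
        }
    }

    // Fixed behaviour: toField is always set, even if the value is unchanged.
    static void replaceFixed(Map<String, String> fields, String fromField,
                             String toField, String regex, String replacement) {
        String original = fields.get(fromField);
        if (original == null) {
            return;
        }
        fields.put(toField, original.replaceAll(regex, replacement));
    }

    public static void main(String[] args) {
        // No country code: "$1" yields the input unchanged.
        Map<String, String> buggy = new HashMap<>();
        buggy.put("EXP_NAME+COUNTRY", "John Smith");
        replaceBuggy(buggy, "EXP_NAME+COUNTRY", "EXP_NAME", NAME_REGEX, "$1");
        System.out.println("buggy: " + buggy.get("EXP_NAME")); // buggy: null

        Map<String, String> fixed = new HashMap<>();
        fixed.put("EXP_NAME+COUNTRY", "John Smith");
        replaceFixed(fixed, "EXP_NAME+COUNTRY", "EXP_NAME", NAME_REGEX, "$1");
        System.out.println("fixed: " + fixed.get("EXP_NAME")); // fixed: John Smith
    }
}
```

With a value like "Jane Doe [FR]" both versions write "Jane Doe", which is why the bug only showed up when no country code was present.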
Please confirm.
Yes, it works! Thanks again for your very quick response!
FYI, when moving the libs, I discovered that the httpclient, httpcore and joda-time JARs in my old norconex-collector were more recent than the ones in the fresh norconex-importer :)
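Spotting such duplicates by hand is error-prone. Below is a minimal Java sketch that, given the JAR file names in a lib folder, keeps only the highest version of each artifact. It assumes names of the form `artifact-version.jar`; the class, its methods and the sample file names are hypothetical, and versions are compared by their numeric parts only, so qualifiers such as `-SNAPSHOT` are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: keep the highest version of each artifact.
// Assumes file names follow "artifact-version.jar".
public class KeepGreatestJars {

    // Artifact name (lazy), then a version that starts with a digit.
    private static final Pattern JAR_NAME = Pattern.compile("^(.+?)-(\\d.*)\\.jar$");

    // Compares versions by numeric parts only ("4.5.2" > "4.3.6", "2.10" > "2.9.4").
    static int compareVersions(String a, String b) {
        long[] x = numericParts(a);
        long[] y = numericParts(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            long xi = i < x.length ? x[i] : 0;
            long yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Long.compare(xi, yi);
            }
        }
        return 0;
    }

    private static long[] numericParts(String version) {
        return Arrays.stream(version.split("\\D+"))
                .filter(s -> !s.isEmpty())
                .mapToLong(Long::parseLong)
                .toArray();
    }

    // Returns the file names to keep: one per artifact, the highest version.
    static List<String> jarsToKeep(List<String> fileNames) {
        Map<String, String> bestName = new LinkedHashMap<>();
        Map<String, String> bestVersion = new LinkedHashMap<>();
        for (String name : fileNames) {
            Matcher m = JAR_NAME.matcher(name);
            if (!m.matches()) {
                bestName.putIfAbsent(name, name); // unrecognised name: keep as-is
                continue;
            }
            String artifact = m.group(1);
            String version = m.group(2);
            String current = bestVersion.get(artifact);
            if (current == null || compareVersions(version, current) > 0) {
                bestVersion.put(artifact, version);
                bestName.put(artifact, name);
            }
        }
        return new ArrayList<>(bestName.values());
    }

    public static void main(String[] args) {
        // Hypothetical lib folder content.
        List<String> lib = List.of(
                "httpclient-4.3.6.jar", "httpclient-4.5.2.jar",
                "joda-time-2.1.jar", "joda-time-2.9.4.jar",
                "httpcore-4.4.5.jar");
        jarsToKeep(lib).forEach(System.out::println);
    }
}
```

In practice you would feed it the names listed from the HTTP Collector lib folder and delete whatever is not returned.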
Hi,
For a given crawler, I extract/tag a field EXP_NAME+COUNTRY that contains both the name and the country of an author (in the format "firstname other-names lastname [CountryCode]").
Using a ReplaceTagger with a regex, I expected to extract both pieces of information into separate fields: EXP_NAME and EXP_COUNTRY.
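For reference, one way to express this split is a single pattern with two capture groups, the country group being optional. The Java sketch below assumes the "firstname other-names lastname [CountryCode]" format; the pattern, class and method names are hypothetical and are not the regexes from test_norco.txt or Test.txt.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: split "firstname other-names lastname [CountryCode]" into a name and
// an optional country code. The pattern is an assumption for illustration.
public class NameCountrySplit {

    // Group 1: the name (lazy, so the space before "[" goes to \s*).
    // Group 2: the country code inside brackets; the whole bracket part is optional.
    static final Pattern NAME_COUNTRY =
            Pattern.compile("^(.*?)\\s*(?:\\[([^\\]]*)\\])?$");

    // Returns {name, country}; country is "" when there are no brackets.
    static String[] split(String value) {
        Matcher m = NAME_COUNTRY.matcher(value);
        if (!m.matches()) {
            return new String[] {value, ""};
        }
        String country = m.group(2) == null ? "" : m.group(2);
        return new String[] {m.group(1), country};
    }

    public static void main(String[] args) {
        for (String v : new String[] {"John Ronald Reuel Tolkien [GB]", "Jane Doe"}) {
            String[] parts = split(v);
            System.out.println("EXP_NAME=" + parts[0] + " | EXP_COUNTRY=" + parts[1]);
        }

        // Same pattern used as a regex replacement with "$1" and "$2".
        String noCountry = "Jane Doe";
        System.out.println(noCountry.replaceAll(NAME_COUNTRY.pattern(), "$1")); // Jane Doe
        System.out.println("[" + noCountry.replaceAll(NAME_COUNTRY.pattern(), "$2") + "]"); // []
    }
}
```

Note that in the no-country case the "$1" replacement returns the input unchanged, while "$2" (an unmatched group) becomes an empty string in Java.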
I've attached an example crawler config file (XML) to demonstrate: test_norco.txt
Unfortunately, when the country is missing (no "[]"), the crawler generates an EXP_COUNTRY field containing an empty string, but no EXP_NAME field at all!
What seems strange to me is that the simple Java code attached below, which implements the same regexes, works: Test.txt
Am I mistaken somewhere (it's Friday, so I might have overlooked something :) ) or is there a bug in ReplaceTagger?