Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, whatever its format (HTML, PDF, Word, etc.). In addition, it allows you to perform any manipulation on the extracted text before using it in your own service or application.
Hi,
For complicated reasons, I need a MergeTagger that would take the arguments fromField1, fromField2, toField and separator, and merge/concatenate value-by-value the contents of fromField1 + fromField2 into toField, optionally separated by separator.
Example: EXP_FIRST_NAME=Fabien^|~Albert and EXP_LAST_NAME=Coco^|~Rico, merged with separator=" ", would give EXP_NAME=Fabien Coco^|~Albert Rico
I'm currently trying to write my own quick & dirty version, but Norconex Importer would probably benefit from directly including a clean, generic version!
If the number of values in the two fromFields differs, could it raise an exception or accept a default value?
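To make the requested behaviour concrete, here is a minimal sketch of the value-by-value merge in plain Java. It deliberately does not use the Norconex Importer API: the class name `MergeSketch`, the method `merge` and the `defaultValue` parameter are all hypothetical names chosen for illustration. It assumes one possible answer to the question above: throw an exception on a length mismatch unless a default value is supplied.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, independent of the Norconex Importer API. */
final class MergeSketch {

    private MergeSketch() {
    }

    /**
     * Concatenates two value lists pairwise:
     * result[i] = first[i] + separator + second[i].
     *
     * If the lists differ in length, missing values are replaced by
     * defaultValue. When defaultValue is null (an assumption of this
     * sketch), a mismatch raises an IllegalArgumentException instead.
     */
    static List<String> merge(List<String> first, List<String> second,
            String separator, String defaultValue) {
        if (first.size() != second.size() && defaultValue == null) {
            throw new IllegalArgumentException(
                    "Value counts differ: " + first.size()
                    + " vs " + second.size());
        }
        // A null separator means plain concatenation.
        String sep = separator == null ? "" : separator;
        int size = Math.max(first.size(), second.size());
        List<String> merged = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            String a = i < first.size() ? first.get(i) : defaultValue;
            String b = i < second.size() ? second.get(i) : defaultValue;
            merged.add(a + sep + b);
        }
        return merged;
    }
}
```

With the example above, calling `MergeSketch.merge` on the values `["Fabien", "Albert"]` and `["Coco", "Rico"]` with separator `" "` returns `["Fabien Coco", "Albert Rico"]`. A real tagger would still have to read the source fields from the document metadata and write the result to the target field.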
The prototype of the configuration would be: