Norconex Importer is a Java library and command-line application that parses files of any format (HTML, PDF, Word, etc.) and extracts their content as plain text. It also lets you manipulate the extracted text before using it in your own service or application.
The current way of using ScriptTagger is like this:
<tagger class="com.norconex.importer.handler.tagger.impl.ScriptTagger">
  <script><![CDATA[
    ... my script ...
  ]]></script>
</tagger>
Some of my scripts are longer than a few lines, so I thought it would be nice to have them in separate files. I tried adding a #parse("myscript.js") in the CDATA block, but it doesn't seem to work. I get these errors when running HTTP Collector:
ERROR [XMLConfigurationUtil$LogErrorHandler] (XML Validation) ScriptTagger: cvc-minLength-valid: Value '' with length = '0' is not facet-valid with respect to minLength '1' for type '#AnonType_scripttagger'.
ERROR [XMLConfigurationUtil$LogErrorHandler] (XML Validation) ScriptTagger: cvc-type.3.1.3: The value '' of element 'script' is not valid.
It would be nice if ScriptTagger supported a parameter for loading the script from an external file. Something like this maybe:
Alternative solutions also welcome :)
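One stopgap I can think of, which needs no change to ScriptTagger, is to generate the tagger block from the external script file before the collector starts. Below is a minimal sketch of that idea. The class `ScriptTaggerConfigBuilder` and its methods `wrap` and `fromFile` are hypothetical names of my own, not part of Norconex; the only things taken from the configuration above are the `tagger` class name and the `script`/CDATA layout. How the generated fragment gets into the final configuration is left open.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Norconex): builds a ScriptTagger
// XML fragment from a script kept in a separate file.
final class ScriptTaggerConfigBuilder {

    // Wraps the given script text in the <tagger>/<script> CDATA layout.
    static String wrap(String script) {
        // A literal "]]>" inside the script would end the CDATA section
        // early, so split it across two adjacent CDATA sections.
        String safe = script.replace("]]>", "]]]]><![CDATA[>");
        return "<tagger class=\"com.norconex.importer.handler.tagger.impl.ScriptTagger\">\n"
             + "  <script><![CDATA[\n"
             + safe + "\n"
             + "  ]]></script>\n"
             + "</tagger>\n";
    }

    // Reads the script file as UTF-8 and wraps its content.
    static String fromFile(Path scriptFile) throws IOException {
        byte[] bytes = Files.readAllBytes(scriptFile);
        return wrap(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ScriptTaggerConfigBuilder <script-file>");
            return;
        }
        // Print the fragment so it can be redirected into a config file.
        System.out.print(fromFile(Paths.get(args[0])));
    }
}
```

Running it with `myscript.js` as the only argument prints a tagger block whose CDATA section holds the file's content, so the script element is never empty at validation time. A built-in option on ScriptTagger would still be much nicer than this.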