Norconex / collector-core

Collector-related code shared between different collector implementations
http://www.norconex.com/collectors/collector-core/
Apache License 2.0

Config loading: File encoding #18

Closed: mariuspruski closed this issue 6 years ago

mariuspruski commented 6 years ago

My issue: I would like to load the config XML files as UTF-8, but files included via #parse are not read as UTF-8.

I've noticed that Norconex does not set Apache Velocity's input encoding property (input.encoding), so Velocity falls back to its default encoding (ISO-8859-1).

It would be great if this property could be set somehow, or if UTF-8 were used by default. One fix would be, for example, to add

velocityEngine.setProperty(RuntimeConstants.INPUT_ENCODING, "UTF-8");

in ConfigurationLoader.java
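
The symptom described above can be reproduced without Velocity. The following is a minimal sketch using only the JDK; the class name, the helper method, and the sample content are hypothetical and not taken from the Norconex code. It decodes the same UTF-8 bytes once as ISO-8859-1 and once as UTF-8, which is effectively what happens when an included file is read with the wrong input encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Decode raw file bytes with the given charset, as a template reader would.
    static String decode(byte[] fileBytes, Charset charset) {
        return new String(fileBytes, charset);
    }

    public static void main(String[] args) {
        // Hypothetical content of an included config fragment, stored as UTF-8.
        // \u00e9 is "e with acute accent", which takes two bytes in UTF-8.
        byte[] fileBytes = "<title>Caf\u00e9</title>".getBytes(StandardCharsets.UTF_8);

        String asLatin1 = decode(fileBytes, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(fileBytes, StandardCharsets.UTF_8);

        // Read as ISO-8859-1, each byte becomes one character, so the
        // two-byte sequence turns into two wrong characters.
        System.out.println("ISO-8859-1 length: " + asLatin1.length());
        System.out.println("UTF-8 length:      " + asUtf8.length());
        System.out.println("Round trip intact: " + asUtf8.equals("<title>Caf\u00e9</title>"));
    }
}
```

Only the lengths are printed so that the result does not depend on the console's own encoding; the ISO-8859-1 result is one character longer because the accented letter was split in two.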

essiembre commented 6 years ago

You can download a new snapshot release of Norconex Commons Lang with this change. Replace the existing norconex-commons-lang-x.x.x.jar with this version (or run the install script found in the zip), then please confirm whether it resolves the issue.

mariuspruski commented 6 years ago

Hey, thank you for the quick response. The changes in that version fix the file encoding issue. Just one more remark: "ISO-8859-1" is not actually a key in the Velocity properties, so I believe setting it is unnecessary. The keys could also be expressed using the constants available in org.apache.velocity.runtime.RuntimeConstants:

        this.velocityEngine.setProperty("eventhandler.include.class", RelativeIncludeEventHandler.class.getName());
        this.velocityEngine.setProperty("resource.loader", "file");
        this.velocityEngine.setProperty("file.resource.loader.path", "");
        this.velocityEngine.setProperty("input.encoding", "UTF-8");
        this.velocityEngine.setProperty("output.encoding", "UTF-8");
        this.velocityEngine.setProperty("ISO-8859-1", "UTF-8");
        this.velocityEngine.setProperty("runtime.log.logsystem.class", "org.apache.velocity.runtime.log.Log4JLogChute");
        this.velocityEngine.setProperty("runtime.log", "");

Thanks for your work!

essiembre commented 6 years ago

The RuntimeConstants constants are already used in the source. The "ISO-8859-1" entry came from a constant named "DEFAULT_ENCODING", which I took to be the key for setting the default encoding. :-) I have removed it. Thanks for pointing it out.
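
To illustrate why the removed line had no effect, here is a minimal sketch that uses java.util.Properties as a stand-in for the engine's property store. The class and its two constants are hypothetical; their names and values simply mirror what is quoted in this thread, and the sketch does not call Velocity itself. Java compilers inline compile-time string constants, which would likely explain why the listing above shows string literals even though the source uses constants.

```java
import java.util.Properties;

class ConstantAsKeyDemo {

    // Hypothetical stand-ins for the constants discussed in this thread.
    static final String INPUT_ENCODING = "input.encoding";   // a property key
    static final String DEFAULT_ENCODING = "ISO-8859-1";      // a value, not a key

    static Properties configure() {
        Properties props = new Properties();
        // Intended setting: the key names the property to change.
        props.setProperty(INPUT_ENCODING, "UTF-8");
        // Compiles fine, but only creates an unrelated entry whose key is
        // the literal text "ISO-8859-1"; it does not touch "input.encoding".
        props.setProperty(DEFAULT_ENCODING, "UTF-8");
        return props;
    }

    public static void main(String[] args) {
        Properties props = configure();
        System.out.println("input.encoding = " + props.getProperty("input.encoding"));
        System.out.println("ISO-8859-1     = " + props.getProperty("ISO-8859-1"));
        System.out.println("entries        = " + props.size());
    }
}
```

Because both arguments are plain strings, the compiler cannot tell a key constant from a value constant, so a mix-up like this goes unnoticed until someone inspects the resulting properties.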