eclipse-archived / packagedrone

Eclipse Package Drone
http://eclipse.org/package-drone
Eclipse Public License 1.0

Creator writes files in UTF-8 #120

Closed doggy-dev closed 6 years ago

doggy-dev commented 6 years ago

Because ExtractorImpl (line 55) reads the XML file as UTF-8, Creator has to write it in UTF-8. This matters for Windows installations when feature files contain non-ASCII characters in their descriptions.

Signed-off-by: Veselin Markov veselin_m@yahoo.com

ctron commented 6 years ago

Thanks for the PR. First of all, to submit a PR you need to sign the Eclipse Contributor Agreement [1] (ECA). I know, even for such a small change.

What I don't understand is the actual problem. I agree that UTF-8 is the best choice. However, I would expect the createXMLStreamWriter method to create a new XML writer that encodes the XML properly in any character set encoding, escaping characters when necessary. Do you have an example of an error? That would be helpful.

[1] https://www.eclipse.org/legal/ECA.php

doggy-dev commented 6 years ago

I suppose javax.xml.stream.XMLOutputFactory.createXMLStreamWriter(OutputStream) creates a writer with the platform default encoding, which is not UTF-8 when running on Windows. In my case I have a 'ü' in my feature.xml. The byte representing ü in the Windows encoding CP1252 is not a valid first byte in UTF-8.
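
The byte-level claim can be checked in isolation. The following is only a minimal sketch, not code from Package Drone: the class name EncodingMismatchDemo and the helper isValidUtf8 are made up for illustration. It encodes ü with the windows-1252 charset and then runs a strict UTF-8 decoder over the result.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Package Drone.
class EncodingMismatchDemo {

    /** Returns true if the bytes decode as strict UTF-8, false if the decoder rejects them. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String umlaut = "\u00FC"; // ü, written as an escape so the source file encoding does not matter

        byte[] cp1252 = umlaut.getBytes(Charset.forName("windows-1252"));
        byte[] utf8 = umlaut.getBytes(StandardCharsets.UTF_8);

        System.out.printf("CP1252: %d byte(s), first = 0x%02X, valid UTF-8: %b%n",
                cp1252.length, cp1252[0] & 0xFF, isValidUtf8(cp1252));
        System.out.printf("UTF-8:  %d byte(s), first = 0x%02X, valid UTF-8: %b%n",
                utf8.length, utf8[0] & 0xFF, isValidUtf8(utf8));
    }
}
```

The decoder is configured with CodingErrorAction.REPORT because the default behavior of convenience APIs such as new String(bytes, charset) is to silently replace bad input, which would hide the problem.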

doggy-dev commented 6 years ago

I signed the ECA. Can you retrigger the checks?

ctron commented 6 years ago

So even a ü can be encoded in CP1252. So I don't see a problem there:

<?xml version="1.0" encoding="CP1252"?>
<!DOCTYPE xml>
<foobar bar="äöü">
</foobar>

You can copy this into an editor, e.g. Eclipse. Save the file and verify that it is CP1252. The umlauts are encoded properly. So that should not be a problem.

Normally forcing a new push is enough to re-trigger the check. However the e-mail you used (veselin_m@yahoo.com) still cannot be validated using the validation tool. Maybe you used a different address?

doggy-dev commented 6 years ago

Sure, a ü can be encoded in CP1252. I'll try to explain one more time. My feature.xml has a ü in the description. When I upload the jar, PD (specifically Creator) creates a local temp file encoded in CP1252, because PD is running on Windows. The ü is then written into the temp file CP1252-encoded. A bit later PD (specifically ExtractorImpl) reads the temp file, assumes it is UTF-8 encoded, and throws an exception when it reads the byte 252 (0xFC).

I'm attaching the files so you can take a look at them. Sorry, but I can't attach the feature jar; GitHub won't let me. stacktrace.txt upload-715604784386970734.txt
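
The write-then-read mismatch described here can be reproduced without Package Drone. This is a hedged sketch under stated assumptions: the class MismatchedReadDemo, its helpers, and the element names are hypothetical, and a plain JAXP DocumentBuilder stands in for whatever reader ExtractorImpl actually uses. The simulated temp file carries no encoding declaration, so the parser falls back to UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo class; it only imitates the temp file round trip.
class MismatchedReadDemo {

    /** Simulates the temp file: XML without an encoding declaration, encoded in the given charset. */
    static byte[] writeWithoutDeclaration(String description, Charset charset) {
        String xml = "<feature><description>" + description + "</description></feature>";
        return xml.getBytes(charset);
    }

    /** Tries to parse the bytes as XML; returns null on success, otherwise the exception. */
    static Exception tryParse(byte[] xmlBytes) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes)); // no declaration: parser assumes UTF-8
            return null;
        } catch (Exception e) {
            return e;
        }
    }

    public static void main(String[] args) {
        String umlaut = "\u00FC"; // ü

        byte[] asCp1252 = writeWithoutDeclaration(umlaut, Charset.forName("windows-1252"));
        byte[] asUtf8 = writeWithoutDeclaration(umlaut, StandardCharsets.UTF_8);

        Exception error = tryParse(asCp1252);
        System.out.println("CP1252 bytes: " + (error == null ? "parsed" : "failed: " + error));

        error = tryParse(asUtf8);
        System.out.println("UTF-8 bytes:  " + (error == null ? "parsed" : "failed: " + error));
    }
}
```

The same content parses when written as UTF-8 and fails when written as CP1252, which matches the point that the individual encodings are fine and only the disagreement between writer and reader is the problem.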

ctron commented 6 years ago

Ok, I think I can see the problem. The file doesn't contain any character set information. So I guess you are right, the simplest solution would be to align everything with UTF-8.
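
One way to align the writing side with UTF-8 is sketched below. This is an illustration of the idea, not the actual change in this PR: the class Utf8RoundTripDemo and its helpers are hypothetical. It uses the two-argument createXMLStreamWriter(OutputStream, String) overload and also writes an explicit encoding declaration, so the file states its own character set.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Document;

// Hypothetical demo class, not the Creator/ExtractorImpl code.
class Utf8RoundTripDemo {

    /** Writes a one-element document, forcing UTF-8 regardless of the platform default. */
    static byte[] writeUtf8(String description) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            writer.writeStartDocument("UTF-8", "1.0"); // emits the encoding declaration
            writer.writeStartElement("description");
            writer.writeCharacters(description);
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.flush();
            writer.close();
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("writing XML failed", e);
        }
    }

    /** Parses the bytes and returns the text content of the root element. */
    static String readDescription(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("reading XML failed", e);
        }
    }

    public static void main(String[] args) {
        String original = "\u00FC"; // ü
        String roundTripped = readDescription(writeUtf8(original));
        System.out.println("round trip preserved the text: " + original.equals(roundTripped));
    }
}
```

With the encoding fixed on the writer and declared in the document, the result no longer depends on the operating system the server runs on.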