doggy-dev closed 6 years ago
Thanks for the PR. First of all, to contribute a PR you need to sign the Eclipse Contributor Agreement [1] (ECA). I know, even for such a small change.
What I don't understand is the actual problem. I agree that UTF-8 is the best choice. However, I would expect the createXMLStreamWriter method to create a new XML writer that encodes the XML properly in any character set encoding, escaping characters when necessary. Do you have an example of an error? That would be helpful.
I suppose javax.xml.stream.XMLOutputFactory.createXMLStreamWriter(OutputStream) creates a writer with the default encoding, which is not UTF-8 when running on Windows. In my case I have a 'ü' in my feature.xml. The byte representing ü in the Windows encoding CP1252 is not a valid first byte in UTF-8.
I signed the ECA. Can you retrigger the checks?
So even a ü can be encoded in CP1252. So I don't see a problem there:
<?xml version="1.0" encoding="CP1252"?>
<!DOCTYPE xml>
<foobar bar="äöü">
</foobar>
You can copy this into an editor, e.g. Eclipse. Save the file and verify that it is CP1252. The umlauts are encoded properly. So that should not be a problem.
Normally a forced push is enough to re-trigger the check. However, the e-mail you used (veselin_m@yahoo.com) still cannot be validated using the validation tool. Maybe you used a different address?
Sure, a ü can be encoded in CP1252. I'll try to explain one more time. My feature.xml has a ü in the description. When I upload the jar, PD (concretely, Creator) creates a local temp file encoded in CP1252 because PD is running on Windows. The ü is then written into the temp file CP1252-encoded. A bit later, PD (concretely, ExtractorImpl) reads the temp file, assumes it is UTF-8 encoded, and throws an exception when it reads the byte 252 (0xFC).
I'm attaching the files so you can take a look at them. Sorry, but I can't attach the jar of the feature; GitHub won't let me. stacktrace.txt upload-715604784386970734.txt
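The mismatch described in this comment can be reproduced in isolation, without PD. The following is only a minimal sketch: the class `EncodingMismatchDemo` and its helper methods are hypothetical names chosen for illustration and are not part of Creator or ExtractorImpl. It encodes a ü as CP1252 (called `windows-1252` in Java) and then checks whether a strict UTF-8 decoder accepts the resulting bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not taken from the PD code base.
class EncodingMismatchDemo {

    // Encodes text the way a writer using the CP1252 platform default would.
    static byte[] encodeCp1252(String text) {
        return text.getBytes(Charset.forName("windows-1252"));
    }

    // Returns true only if the bytes are well-formed UTF-8 (strict decoding,
    // malformed input is reported instead of being silently replaced).
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "\u00fc"; // the character ü, escaped to avoid source-encoding issues
        byte[] cp1252 = encodeCp1252(text);
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.printf("CP1252 first byte: 0x%02X%n", cp1252[0] & 0xFF);
        System.out.println("CP1252 bytes valid as UTF-8: " + isValidUtf8(cp1252));
        System.out.println("UTF-8 bytes valid as UTF-8:  " + isValidUtf8(utf8));
    }
}
```

The single CP1252 byte 0xFC is rejected by the strict UTF-8 decoder, which matches the failure reported when the temp file is read back.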
Ok, I think I can see the problem. The file doesn't contain any character set information. So I guess you are right, the simplest solution would be to align everything with UTF-8.
Because ExtractorImpl reads XML files in UTF-8 format (line 55), Creator has to write them in UTF-8. This is important for Windows installations when feature files contain non-standard characters in descriptions.
Signed-off-by: Veselin Markov veselin_m@yahoo.com
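As an illustration of the approach the commit message describes, the sketch below writes XML through the two-argument `XMLOutputFactory.createXMLStreamWriter(OutputStream, String)` overload with an explicit `"UTF-8"` encoding and reads it back as UTF-8. It is not the actual PD change: the class `Utf8XmlRoundTrip`, its methods, and the `feature`/`description` names are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class; not taken from the PD code base.
class Utf8XmlRoundTrip {

    // Writes a one-element document, pinning the encoding instead of
    // relying on the platform default.
    static byte[] writeFeature(String description) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeStartElement("feature");
            writer.writeAttribute("description", description);
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toByteArray();
    }

    // Reads the description attribute back, decoding the bytes as UTF-8.
    static String readDescription(byte[] xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(xml), "UTF-8");
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getAttributeValue(null, "description");
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not read XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = writeFeature("\u00fcber"); // "über", escaped in source
        System.out.println(readDescription(xml));
    }
}
```

Because both sides name the encoding explicitly, the round trip should not depend on whether the JVM runs on Windows or on the value of the default charset.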