Closed GoogleCodeExporter closed 9 years ago
I've fixed a few things here:
- project creation will no longer leave the encoding unset (the cause of your
NullPointerExceptions)
- I've lowered the minimum confidence threshold from 50 to 20 for the character
set guesser.
Ultimately I think we should be allowing the user more control over the
character encoding.
Note that the reinterpret() function won't actually do what you desire. You
want to use something along the lines of value.replace(' ',' ') [Where the
first literal contains a NBSP]
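Since a literal NBSP looks identical to an ordinary space on the page, the same substitution is easier to read with an explicit escape. This is a minimal Python sketch of the idea, not GREL; the helper name `replace_nbsp` is made up for illustration.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def replace_nbsp(value: str) -> str:
    """Return value with every non-breaking space replaced by a plain space."""
    return value.replace(NBSP, " ")


if __name__ == "__main__":
    cleaned = replace_nbsp("10\u00a0km")
    print(cleaned == "10 km")  # prints True
```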
For anyone else who's attempting to reproduce this, if your system's default
character encoding is UTF-8, as mine is, you won't even get as far as Thad.
Instead you'll end up with all the non-breaking spaces substituted with the
replacement character (because the ISO Latin-1 NBSP character is invalid
UTF-8). No amount of reencoding will save you at that point.
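The byte-level reason can be shown in a few lines. This is a Python sketch of the general decoding behaviour, not Refine's Java import code, and the sample string is an assumption chosen for illustration.

```python
# In ISO Latin-1 (ISO-8859-1) the NBSP is the single byte 0xA0.
latin1_bytes = "10\u00a0km".encode("latin-1")  # b'10\xa0km'

# Read as UTF-8, a lone 0xA0 is an invalid byte, so a lenient decoder
# substitutes U+FFFD (the replacement character) for it.
misread = latin1_bytes.decode("utf-8", errors="replace")

# Re-encoding the damaged text produces the bytes of U+FFFD, not 0xA0:
# the original byte is gone and cannot be recovered from the string.
roundtrip = misread.encode("utf-8")  # b'10\xef\xbf\xbdkm'

# Decoding with the correct source encoding keeps the NBSP intact.
correct = latin1_bytes.decode("latin-1")

if __name__ == "__main__":
    print("\ufffd" in misread)   # prints True
    print("\u00a0" in correct)   # prints True
```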
Original comment by tfmorris
on 26 Nov 2010 at 10:20
Fixed in rev 1931.
Original comment by tfmorris
on 26 Nov 2010 at 10:25
Issue 164 has been merged into this issue.
Original comment by tfmorris
on 27 Nov 2010 at 12:38
Issue 386 has been merged into this issue.
Original comment by tfmorris
on 25 May 2011 at 5:23
Original comment by tfmorris
on 9 Jun 2011 at 7:58
OK, those supported encodings SHOULD work; however, they do not with the
reinterpret() function. We have that issue logged, but I would really like to
see that fixed. NOW. Before we release 2.5.
A simple test such as:
"Can we fix this?!?".toString().escape("html")
WORKS :)
"Can we fix this?!?".toString().reinterpret("utf-8")
Error: reinterpret: encoding 'utf-8' is not available or recognized.
FAILS :(
reinterpret("Can we fix this?!?","utf-8")
Error: reinterpret: encoding 'utf-8' is not available or recognized.
FAILS :(
$10 Paypal bucks for the person who fixes this first, from me !
Original comment by thadguidry
on 21 Oct 2011 at 10:09
All three examples work without error on my Ubuntu system when testing against
SVN trunk.
What O/S are you using? Does the problem affect only utf-8 or all encodings?
Have you tried any variations such as "UTF-8" or "UTF8" ?
I'll boot Windows to check it there after I've finished up some other stuff.
Original comment by tfmorris
on 21 Oct 2011 at 10:53
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Windows\system32>java -version
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0-b147)
Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
Yes, tried "ascii", "us-ascii", "US-ASCII", "UTF8", "utf8", "UTF-8", "utf-8", "latin-1", "LATIN-1", "BIG5", "big5"
Original comment by thadguidry
on 21 Oct 2011 at 11:11
Windows 7 64 bit
Original comment by thadguidry
on 21 Oct 2011 at 11:14
Both cases labelled as failing also work on Windows XP with a Sun Java 1.6.0
JVM.
There are only a couple of things which come to mind as possibilities:
1. It's a Java 7-specific bug, although that seems like a pretty big thing for
them to have broken and not noticed.
2. The encoding stored in the project is messed up (perhaps an old project from
back when the encoding could be null due to a bug). There are two character
encodings involved in this operation, the source encoding and the destination
encoding, so the "utf-8" argument might not be the one causing the problem.
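To make the second possibility concrete, here is a rough Python sketch of a two-encoding reinterpret step. It is an assumption about the general shape of the operation, not Refine's actual Java implementation; the function signature, parameter names, and error messages are all hypothetical.

```python
from typing import Optional


def reinterpret(value: str, source_encoding: Optional[str], target_encoding: str) -> str:
    """Encode value with source_encoding, then decode the bytes as target_encoding.

    Hypothetical model: source_encoding stands in for the encoding stored in
    the project, target_encoding for the argument the user passes.
    """
    if source_encoding is None:
        # Models a project whose stored encoding was never set.
        raise ValueError("source encoding is not set")
    try:
        raw = value.encode(source_encoding, errors="replace")
    except LookupError as exc:
        raise ValueError(f"source encoding {source_encoding!r} is not available") from exc
    try:
        return raw.decode(target_encoding, errors="replace")
    except LookupError as exc:
        raise ValueError(f"target encoding {target_encoding!r} is not available") from exc


if __name__ == "__main__":
    # UTF-8 bytes that were wrongly read as Latin-1 can be repaired this way.
    print(reinterpret("caf\u00c3\u00a9", "latin-1", "utf-8") == "caf\u00e9")  # prints True
    # A perfectly valid target still fails when the source side is broken.
    try:
        reinterpret("Can we fix this?!?", None, "utf-8")
    except ValueError as err:
        print(err)  # prints: source encoding is not set
```

In this model a valid "utf-8" argument cannot rescue a call whose source encoding is missing, which is why the failure may not depend on the spelling of the target name at all.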
I suggest that we move the discussion someplace other than this bug report (the
dev list?) since I'm pretty convinced it's not a regression of this bug fix.
Original comment by tfmorris
on 26 Oct 2011 at 11:43
Agreed, push this up to the dev list so we can talk and test the crap out of
this. It is really bugging me. I do have my JAVA_HOME path set to 1.6.24
version, btw.
Original comment by thadguidry
on 26 Oct 2011 at 11:48
I'm fairly convinced that the underlying problem in comment 6 is that the
project's encoding isn't set properly. I've created a new issue 486 to track
this.
Original comment by tfmorris
on 18 Nov 2011 at 11:35
Original issue reported on code.google.com by
thadguidry
on 18 Nov 2010 at 4:41