Closed GoogleCodeExporter closed 9 years ago
Please compare
wget -q -O- --header="Accept-Encoding: gzip" http://dailyfratze.de/owr/dailyfratze-base-de.js | gunzip > out.html
and
wget http://dailyfratze.de/owr/dailyfratze-base-de.js
Original comment by m...@planet-punk.de
on 9 Jul 2012 at 7:20
Original comment by alex.obj...@gmail.com
on 9 Jul 2012 at 7:22
{{{
import java.io.*;
import java.nio.charset.Charset;
import org.apache.commons.io.IOUtils;

public class LengthCheck {
  public static void main(String... a) throws IOException {
    final File file = new File("/Users/msimons/tmp/dailyfratze-base-de.js.2");
    final InputStream in = new BufferedInputStream(new FileInputStream(file));
    // Collect the raw bytes first: decoding each 1024-byte chunk separately could split a multi-byte character.
    final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    byte[] b = new byte[1024];
    int len;
    while ((len = in.read(b, 0, b.length)) > 0) {
      bytes.write(b, 0, len);
    }
    in.close();
    final String decoded = new String(bytes.toByteArray(), Charset.forName("UTF-8"));
    System.out.println("File length: " + file.length());
    System.out.println("String length (buffered inputstream) " + decoded.length());
    // Name the charset explicitly instead of relying on the platform default.
    final String s = IOUtils.toString(new FileInputStream(file), "UTF-8");
    System.out.println("String length (ioutils)" + s.length());
  }
}
}}}
Output is:
File length: 491373
String length (buffered inputstream) 491371
String length (ioutils)491371
dafuq?
Original comment by m...@planet-punk.de
on 9 Jul 2012 at 7:48
Attachments:
Could a BOM character be the reason?
Original comment by alex.obj...@gmail.com
on 9 Jul 2012 at 7:50
Maybe I'm searching with the wrong regex, but this file has no BOM at the start…
Maybe one of the scripts it consists of has one?
Original comment by m...@planet-punk.de
on 9 Jul 2012 at 7:59
I don't see the BOM either. I'm not sure what the reason is. I'll do some research
to find the possible cause... If you have any suggestions, let me know.
Does this issue break the page rendering?
Btw, do you have a messenger ID?
Original comment by alex.obj...@gmail.com
on 9 Jul 2012 at 8:01
grep -rl $'\xEF\xBB\xBF' dailyfratze-base-de.js.2 doesn't find any BOM…
Original comment by m...@planet-punk.de
on 9 Jul 2012 at 8:01
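The same check as the grep above can be sketched in Java, in case the shell quoting is in doubt. This is only an illustration: the class name BomScan, the helper indexOfBom and the in-memory sample data are made up, and the sketch looks for the UTF-8 BOM bytes EF BB BF anywhere in a byte array rather than only at the start.

```java
class BomScan {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns the offset of the first UTF-8 BOM in data, or -1 if there is none.
    static int indexOfBom(byte[] data) {
        for (int i = 0; i + UTF8_BOM.length <= data.length; i++) {
            if (data[i] == UTF8_BOM[0]
                    && data[i + 1] == UTF8_BOM[1]
                    && data[i + 2] == UTF8_BOM[2]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the downloaded file.
        byte[] withBom = {'a', (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'b'};
        byte[] withoutBom = {'a', 'b', 'c'};
        System.out.println(indexOfBom(withBom));    // 1
        System.out.println(indexOfBom(withoutBom)); // -1
    }
}
```

To run it against a real file, the sample arrays would be replaced by the file's bytes.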
Yes, it does… At the end, 2 chars are missing… I'm downgrading to 1.4.4 right
now…
Original comment by m...@planet-punk.de
on 9 Jul 2012 at 8:02
You can extend the filter and set the Content-Length header yourself as a
temporary workaround.
Have you noticed whether the problem is consistent?
Original comment by alex.obj...@gmail.com
on 9 Jul 2012 at 8:05
Yes, the problem is consistent, and I think it relates to the simple example
above, as the ResourceBundleProcessor also uses the raw string length in line 109:
{{{
response.setContentLength(cacheValue.getRawContent().length());
IOUtils.write(cacheValue.getRawContent(), os, configuration.getEncoding());
}}}
I don't know the content length in advance… I'd also go with the string
length and this seems to be wrong…
Original comment by m...@planet-punk.de
on 9 Jul 2012 at 8:12
I'm still wondering why File.length() is different from String.length().
Original comment by alex.obj...@gmail.com
on 9 Jul 2012 at 8:26
Eureka! :)
String.length() is not necessarily equal to String.getBytes().length.
Will fix it soon.
Original comment by alex.obj...@gmail.com
on 9 Jul 2012 at 8:30
This line (line 45 in the original file) causes the problem.
Column 49 looks like a blank but isn't…
This must come from jQuery; the original code is this:
if ( rnotwhite.test( "\xA0" ) ) {
    trimLeft = /^[\s\xA0]+/;
    trimRight = /[\s\xA0]+$/;
}
jQuery 1.7.1, line 897
I mentioned the processors used above…
Original comment by m...@planet-punk.de
on 9 Jul 2012 at 8:32
Attachments:
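To illustrate the observation above: a non-breaking space (U+00A0) is one char in a Java String but two bytes (0xC2 0xA0) in UTF-8. The sketch below assumes that the processed output contains the literal character where jQuery's source has the \xA0 escape; the sample string and the names NonBreakingSpaceLength and extraUtf8Bytes are hypothetical. Two such characters give two extra bytes, which would be consistent with the difference between 491373 and 491371 reported earlier, although the thread does not confirm that these are the only non-ASCII characters in the file.

```java
import java.nio.charset.StandardCharsets;

class NonBreakingSpaceLength {
    // Difference between the UTF-8 byte count and the char count of a string.
    static int extraUtf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length - text.length();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the processed jQuery lines, with each \xA0
        // escape replaced by a literal U+00A0 character.
        String processed = "trimLeft=/^[\\s\u00A0]+/;trimRight=/[\\s\u00A0]+$/;";
        System.out.println("chars:       " + processed.length());
        System.out.println("UTF-8 bytes: " + processed.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("extra bytes: " + extraUtf8Bytes(processed)); // 2
    }
}
```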
I thought about something like this… One Unicode character can take more than one
byte… and length() returns the number of chars, doesn't it?
Original comment by m...@planet-punk.de
on 9 Jul 2012 at 8:34
Probably. I fixed the issue in branch 1.4.x. Could you build the wro4j-core and
confirm that the problem is fixed?
Original comment by alex.obj...@gmail.com
on 9 Jul 2012 at 8:37
Original comment by alex.obj...@gmail.com
on 9 Jul 2012 at 8:37
And the explanation is:
In UTF-8, some characters are stored in 2 bytes (example: ä):
"ä".length() == 1
"ä".getBytes("UTF-8").length == 2
Original comment by alex.obj...@gmail.com
on 9 Jul 2012 at 8:51
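The explanation above can be checked directly. The following is a minimal sketch, not code from wro4j; the class name CharsVersusBytes and the helper utf8Length are made up for illustration. It passes the charset explicitly, because getBytes() without an argument uses the platform default encoding.

```java
import java.nio.charset.StandardCharsets;

class CharsVersusBytes {
    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String ascii = "a";
        String umlaut = "\u00E4"; // a with diaeresis
        String euro = "\u20AC";   // euro sign
        System.out.println(ascii.length() + " char, " + utf8Length(ascii) + " byte(s)");   // 1 char, 1 byte(s)
        System.out.println(umlaut.length() + " char, " + utf8Length(umlaut) + " byte(s)"); // 1 char, 2 byte(s)
        System.out.println(euro.length() + " char, " + utf8Length(euro) + " byte(s)");     // 1 char, 3 byte(s)
    }
}
```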
Original comment by alex.obj...@gmail.com
on 9 Jul 2012 at 8:52
Error is fixed. Thanks!
Original comment by m...@planet-punk.de
on 9 Jul 2012 at 9:46
Your fix in commit 471a424a78 didn't fix this issue for me.
I had to change it to:
response.setContentLength(cacheValue.getRawContent().getBytes(configuration.getEncoding()).length);
to get the correct content length.
Original comment by Juh...@gmail.com
on 12 Jul 2012 at 10:33
Yep, that fix is probably better… I'm using UTF-8, and so is the default of my
VM, but this is not always the case.
Original comment by m...@planet-punk.de
on 12 Jul 2012 at 10:44
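The two comments above come down to this: the Content-Length value has to count the bytes produced by the same encoding that is used to write the body. A minimal sketch, independent of wro4j and the servlet API; ContentLengthSketch and encodeBody are hypothetical names, and the encoding name stands in for whatever configuration.getEncoding() returns.

```java
import java.nio.charset.Charset;

class ContentLengthSketch {
    // Encode once, then use the same byte array for the header value and the body.
    static byte[] encodeBody(String content, String encodingName) {
        return content.getBytes(Charset.forName(encodingName));
    }

    public static void main(String[] args) {
        String content = "x=\u00A0;"; // 4 chars, one of them a non-breaking space
        System.out.println("chars:      " + content.length());                         // 4
        System.out.println("UTF-8:      " + encodeBody(content, "UTF-8").length);      // 5
        System.out.println("ISO-8859-1: " + encodeBody(content, "ISO-8859-1").length); // 4
    }
}
```

With a single-byte encoding the char count happens to match the byte count, which is why the problem only shows up for some combinations of content and encoding.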
Thanks for noticing. I'll update it.
Original comment by alex.obj...@gmail.com
on 12 Jul 2012 at 10:57
The fix was updated in 1.4.x.
Original comment by alex.obj...@gmail.com
on 12 Jul 2012 at 11:07
Original issue reported on code.google.com by
m...@planet-punk.de
on 9 Jul 2012 at 7:04