ClosestStorm / v8cgi

Automatically exported from code.google.com/p/v8cgi
BSD 3-Clause "New" or "Revised" License

Char with code 65279 (BOM) in file read #53

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Is it correct for "new File(path).open('r').read()" to return the character with code 65279 (BOM) at the first position?

Original issue reported on code.google.com by mr.ve...@gmail.com on 10 Jan 2010 at 9:59

GoogleCodeExporter commented 9 years ago
Yes, it is. This character is present in the file, so it will be returned. Calling read() will not omit any bytes from the file.

Original comment by ondrej.zara on 11 Jan 2010 at 7:52

GoogleCodeExporter commented 9 years ago
But what if I specify utf8 as the character encoding? The bytes should then be converted to "symbols" (without the BOM), shouldn't they?

Original comment by mr.ve...@gmail.com on 11 Jan 2010 at 8:07

GoogleCodeExporter commented 9 years ago
The BOM is a valid Unicode character (U+FEFF, decimal 65279). It is just used as a signal to a text editor (or any other software) that the file itself is encoded in UTF-8 (one variant of Unicode data serialization).
The absence or presence of a BOM in a file is handled at the application level. IO routines just provide the (unchanged) content of files.
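
Handling the BOM at the application level could look roughly like the sketch below. It assumes the file content is already available as a decoded string; the helper name stripBom is hypothetical and not part of v8cgi, and the raw variable only stands in for the value returned by read().

```javascript
// Hypothetical helper (not a v8cgi API): remove a single leading BOM,
// i.e. the character U+FEFF (decimal 65279), if one is present.
function stripBom(text) {
    // For an empty string charCodeAt(0) is NaN, so the comparison is false
    // and the string is returned unchanged.
    if (text.charCodeAt(0) === 0xFEFF) {
        return text.slice(1);
    }
    return text;
}

// Stand-in for the string read from the file in the question above.
var raw = "\uFEFFhello";
var clean = stripBom(raw);
```

Only the first character is checked, so a U+FEFF that appears later in the text is left untouched.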

Original comment by ondrej.zara on 11 Jan 2010 at 1:03

GoogleCodeExporter commented 9 years ago
ok, I've got it

Original comment by mr.ve...@gmail.com on 11 Jan 2010 at 9:39