Allowed forced reading of files as UTF-8

rossjones commented 10 years ago

I get fed up of having to use codecs.open() and fiddling about between ascii, 8859-1 and utf-8 just to read a text file into a unicode. 99% of what I get is one of ['utf-8', 'latin1', 'ascii']. All I ever want is 'utf-8'.

Would be really nice if FFS handled all of this so I could open a file and know that whatever I read is utf-8/unicode sans faff.

davidmiller commented 10 years ago

Do you have sample files / outputs / pseudocode?

rossjones commented 10 years ago

So the problem is:

If you use open().read() on a file that has an accented character in it (say, a þ), then it comes out as escaped bytes (\xc3\xbe if the file is UTF-8), because read() returns a plain byte str rather than a unicode. I then have to arse about decoding it, or do

    import codecs
    codecs.open(filename, 'r', 'utf-8')

And then .read() returns a unicode.

Also, this email is the test file.

R
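For illustration, the behaviour being asked for could be sketched roughly as below. This is not FFS's API: `read_unicode` and its `encodings` parameter are hypothetical names, and the order of encodings tried is an assumption based on the three mentioned above (ascii is a subset of utf-8, so it needs no separate step).

```python
import io


def read_unicode(path, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: return a file's contents as unicode text.

    Tries each encoding in order and returns the first successful decode.
    """
    with io.open(path, "rb") as handle:
        raw = handle.read()  # raw bytes, nothing decoded yet
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # bytes are not valid in this encoding, try the next
    raise ValueError("could not decode %s using %r" % (path, encodings))
```

Note that latin-1 maps every byte to a character, so it never fails to decode. It works as a catch-all here, but it will also quietly accept files that are really in some other 8-bit encoding, so a caller who cares should pass a stricter list.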

davidmiller / ffs

Allowed forced reading of files as UTF-8 #8