rossjones opened this issue 10 years ago
Do you have sample files / outputs / pseudocode?
On 3 December 2013 11:23, Ross Jones notifications@github.com wrote:
I get fed up of having to use codecs.open() and the fiddling about between ASCII, ISO-8859-1 and UTF-8 just to read a text file into a unicode string. 99% of what I get is one of ['utf-8', 'latin1', 'ascii']. All I ever want is 'utf-8'.
Would be really nice if FFS handled all of this so I could open a file and know that whatever I read is utf-8/unicode sans faff.
View it on GitHub: https://github.com/davidmiller/ffs/issues/8
Love regards etc
David Miller http://www.deadpansincerity.com 07854 880 883
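Since pseudocode was asked for: one way to get the behaviour described in the request is to read the raw bytes once and try a short list of encodings in order, returning the first clean decode. A minimal sketch in Python 3, assuming a hypothetical helper named `read_text` (not an existing FFS function):

```python
# Hypothetical helper sketching the behaviour requested in this issue;
# `read_text` and DEFAULT_ENCODINGS are illustrative names, not FFS APIs.

# ASCII is a subset of UTF-8, so it needs no separate entry.
# latin-1 decodes any byte sequence, so it must come last.
DEFAULT_ENCODINGS = ("utf-8", "latin-1")


def read_text(path, encodings=DEFAULT_ENCODINGS):
    """Return the contents of `path` as str, using the first encoding that decodes cleanly."""
    # Read the raw bytes once, then try each candidate encoding in order.
    with open(path, "rb") as f:
        raw = f.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
    raise ValueError(f"could not decode {path!r} with any of {encodings!r}")
```

For example, `read_text("notes.txt")` (a hypothetical path) returns a `str` whether the file was saved as UTF-8, latin-1 or plain ASCII. The ordering is the main design choice: because latin-1 never fails, it only works as a last resort, and a latin-1 file whose bytes happen to form valid UTF-8 will be read as UTF-8.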
So the problem is:
If you use open().read() on a file that has an accented character in it (say, a þ), it comes back as raw byte escapes such as \xc3\xbe (if the file is UTF-8) rather than as a unicode string, because read() returns undecoded bytes. I then have to arse about decoding it, or do:
    import codecs
    f = codecs.open(filename, 'r', 'utf-8')
Then .read() returns a unicode string.
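In Python 3 the distinction described above is explicit: binary mode yields `bytes`, while text mode decodes as it reads. The sketch below writes a throwaway UTF-8 file (the temp path is illustrative) and compares a raw read, the `codecs.open` workaround, and the built-in `open()` with an `encoding` argument:

```python
import codecs
import os
import tempfile

# Write a small UTF-8 file containing an accented character (þ, U+00FE).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("þ")

# Binary mode returns the raw encoded bytes, not text.
with open(path, "rb") as f:
    raw = f.read()
print(repr(raw))  # b'\xc3\xbe'

# The codecs.open workaround decodes while reading.
with codecs.open(path, "r", "utf-8") as f:
    via_codecs = f.read()

# Python 3's built-in open() accepts the encoding directly.
with open(path, "r", encoding="utf-8") as f:
    via_open = f.read()

print(via_codecs == via_open == "þ")  # True
```

Passing `encoding` explicitly matters: without it, text mode falls back to the platform's locale encoding, which is not guaranteed to be UTF-8.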
Also, this email is the test file.
R
(In reply to David Miller, 5 Dec 2013, 18:24.)