bodograumann / python-iconv

Python 3 wrapper for iconv and usage as codecs
GNU General Public License v3.0

Convert files #1

Closed PanderMusubi closed 3 years ago

PanderMusubi commented 3 years ago

Could you give example code to convert complete files or offer a function for that?

bodograumann commented 3 years ago

Sure. With the implemented codecs it should be as simple as:

import iconvcodec

with open("input.txt", mode="r", encoding="utf-8") as infile, \
     open("output.txt", mode="w", encoding="ASCII/TRANSLIT") as outfile:
    for line in infile:
        print(line, file=outfile, end="")

Some care would have to be taken if the file is not line-based, but given that this is a text file, that should rarely happen.
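For input that is not line-based, a chunked copy avoids relying on line breaks at all. The following is only a minimal sketch: the function name `convert_file` and the chunk size are made up for illustration, and it is written against standard-library codecs. An iconv-provided encoding such as "ASCII/TRANSLIT" should be usable the same way once `iconvcodec` is imported, but that is an assumption, not something tested here.

```python
import shutil


def convert_file(infilename, outfilename, inencoding, outencoding,
                 chunk_size=64 * 1024):
    """Re-encode a text file in fixed-size chunks instead of line by line."""
    # newline="" disables newline translation, so line endings are copied as-is.
    with open(infilename, mode="r", encoding=inencoding, newline="") as infile, \
         open(outfilename, mode="w", encoding=outencoding, newline="") as outfile:
        # copyfileobj reads at most chunk_size characters per iteration.
        shutil.copyfileobj(infile, outfile, chunk_size)
```

Because the files are opened in text mode, the decoder takes care of multi-byte characters that straddle a chunk boundary.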

PanderMusubi commented 3 years ago

Thanks. Related question: do you also happen to know a Python implementation of /usr/bin/file?

bodograumann commented 3 years ago

You are in luck :-D. I have compiled the available options on Stack Overflow. I would suggest using the native integration of libmagic (i.e. the file utility); then you don’t need to install any additional Python packages.

PanderMusubi commented 3 years ago

Thanks, that page is very useful. Last related question, if you don't mind. 😅

I need flip -ub in Python and have

def dos2unix(infilename, outfilename):
    with open(infilename, 'br') as infile, \
         open(outfilename, 'bw') as outfile:
        for line in infile:
            outfile.write(line.replace(b'\r\n', b'\n'))

but it inserts one \n too many for each line. Any pointers there? I see a lot of examples that load the entire file into memory and use readlines, but I would like a line-by-line approach.

bodograumann commented 3 years ago

It seems to work for me 🤔
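A quick way to check is to run the function on a small file with mixed line endings. This is only a sketch: the sample bytes, the file names, and the helper `check_dos2unix` are made up for illustration, and `dos2unix` is reproduced from the comment above so the snippet runs on its own.

```python
import os
import tempfile


def dos2unix(infilename, outfilename):
    # Same logic as in the question: binary mode, so lines are split on b'\n' only.
    with open(infilename, 'rb') as infile, \
         open(outfilename, 'wb') as outfile:
        for line in infile:
            outfile.write(line.replace(b'\r\n', b'\n'))


def check_dos2unix():
    """Convert a sample with mixed CRLF and LF endings and return the result."""
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "mixed.txt")
        dst = os.path.join(tmpdir, "unix.txt")
        with open(src, "wb") as f:
            f.write(b"one\r\ntwo\nthree\r\n")
        dos2unix(src, dst)
        with open(dst, "rb") as f:
            return f.read()


print(check_dos2unix())  # b'one\ntwo\nthree\n'
```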

PanderMusubi commented 3 years ago

The issue occurs with a file that the file utility identifies as "ISO-8859 text, with CRLF, LF line terminators" (e.g. /usr/share/hunspell/ru_RU.dic from hunspell-ru).

PanderMusubi commented 3 years ago

Ah, found it. The code in your first reply should use a write instead of a print:

with open(infilename, mode='r', encoding=inencoding) as infile, \
     open(outfilename, mode='w', encoding=outencoding) as outfile:
    for line in infile:
        outfile.write(line)
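Note that in text mode the `newline` parameter of `open` also affects line endings: with the default `newline=None`, `\r\n` and a lone `\r` are translated to `\n` on reading, and `\n` is written as the platform's line separator. A minimal sketch that makes the output side explicit, assuming the goal is re-encoding plus Unix line endings; the name `convert_to_unix` is hypothetical, and standard-library encodings stand in for iconv ones:

```python
def convert_to_unix(infilename, outfilename, inencoding, outencoding):
    """Re-encode a text file line by line and normalize line endings to LF."""
    # Input uses the default newline=None: "\r\n" and a lone "\r" become "\n".
    # Output uses newline="\n": "\n" is written unchanged on every platform.
    with open(infilename, mode="r", encoding=inencoding) as infile, \
         open(outfilename, mode="w", encoding=outencoding, newline="\n") as outfile:
        for line in infile:
            outfile.write(line)
```

Unlike the binary dos2unix version, this also converts a lone `\r`, so it is not a byte-exact replacement for flip -ub on files that contain bare carriage returns.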