MarkCommercials fails with non-ASCII chars in filename or title

essandess / etv-comskip

Commercial Marking and Skipping for EyeTV and iTunes Exports

GNU General Public License v2.0

MarkCommercials fails with non-ASCII chars in filename or title #8

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago

> What steps will reproduce the problem?
1. Have a recording with non-ASCII chars (such as German umlauts like
ä,ö,ü) in title or file name. E.g. the genre 'comedy' is 'Komödie' in
German -- that's how this can end up in the file name even if it's not in
the title.
2. Try to run MarkCommercials

> What is the expected output? What do you see instead?
Fails to mark commercials, sometimes with a cryptic dialog that does not
give a line number.

> What version of the product are you using? On what operating system?
ETVComskip-1.0.3-10.5, Mac OS 10.5.3, EyeTV 3.0.2

> Please provide any additional information below.
I've never done anything in python, but this hack works for me:
In "def WriteToLog(message):", replace
        log.write('%s - %s' % (time.asctime(), message))
with
        log.write('%s - %s' % (time.asctime(), message.encode("ascii","replace")))
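To make the effect of this hack concrete, here is a small sketch of what `encode("ascii", "replace")` does to a title compared with a strict ASCII encode and with UTF-8. It is written for Python 3 (an assumption; the project code is Python 2, where a `u"..."` string plays the role of `str` here), and `try_strict_ascii` is a hypothetical helper used only for illustration.

```python
# Python 3 sketch; not taken from MarkCommercials.py.
title = "Kom\u00f6die"  # "Komödie", written with an escape for the o-umlaut

# errors="replace" substitutes "?" for each character ASCII cannot represent,
# so the write succeeds but the original character is lost.
lossy = title.encode("ascii", "replace")

# UTF-8 can represent every Unicode character, so the round trip is lossless.
lossless = title.encode("utf-8")


def try_strict_ascii(text):
    """Return the exception raised by a strict ASCII encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


print(lossy)
print(lossless.decode("utf-8") == title)
print(try_strict_ascii(title))
```

The hack avoids the crash, at the cost of replacing non-ASCII characters with `?` in the log.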

Original issue reported on code.google.com by markus.k...@gmail.com on 12 Jun 2008 at 12:23

GoogleCodeExporter commented 8 years ago

I have seen the same problem a few times, but I have not been able to reproduce it when running MarkCommercials from the command line. The file name in my case was "Skins - liekeissä.eyetv"; the 'ä' character is probably the cause, as the error message is:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4'

But as I said, I have not been able to reproduce this by running MarkCommercials again from the command line. It happened when commercials were marked automatically after a recording ended.

I'll try the fix/hack described above by Markus, hopefully it will help.
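One likely explanation for the difference between the two situations (an assumption; nothing in this thread confirms it) is that Python 2 picks up the terminal's encoding when output goes to a terminal, but falls back to ASCII when the script is started without one. The sketch below simulates both cases in Python 3 by writing the same title through text streams with different encodings. `write_title` is a hypothetical helper, not part of etv-comskip.

```python
import io


def write_title(title, encoding):
    """Write title through a text stream using the given encoding.

    Returns the bytes that reached the underlying buffer.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(title)
    stream.flush()
    return raw.getvalue()


title = "Skins - liekeiss\u00e4"

# A stream that behaves like a UTF-8 terminal: the write succeeds.
utf8_bytes = write_title(title, "utf-8")

# A stream that behaves like an ASCII-only pipe: the same write fails.
try:
    write_title(title, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(utf8_bytes)
print(ascii_failed)
```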

Original comment by mnikk...@gmail.com on 8 Jul 2008 at 8:30

GoogleCodeExporter commented 8 years ago

Same issue here: etv-comskip fails 50% of the time for me (lots of umlauts in the recording titles).
I have no knowledge of Python at all, but is it a good solution to pin everything down to ASCII?

My ugly workaround (controversial in a bunch of threads) is to use
a sitecustomize.py script with the content:

import sys
sys.setdefaultencoding('utf_8')

- I'd definitely opt for UTF-8 log messages + files.
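As a rough illustration of what UTF-8 log files could look like without changing the interpreter-wide default, the sketch below opens the log with an explicit encoding. It is a Python 3 sketch under stated assumptions: `write_to_log` and the log file name are hypothetical and only loosely modelled on the `WriteToLog` function mentioned above.

```python
import io
import os
import tempfile
import time


def write_to_log(log_path, message):
    """Append a timestamped message to a UTF-8 encoded log file."""
    # The explicit encoding makes the result independent of the
    # interpreter's default encoding.
    with io.open(log_path, "a", encoding="utf-8") as log:
        log.write("%s - %s\n" % (time.asctime(), message))


# Hypothetical log location inside a fresh temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "etvcomskip-example.log")
write_to_log(log_path, "Marking commercials for Kom\u00f6die")

with io.open(log_path, encoding="utf-8") as log:
    contents = log.read()

print(contents)
```

Setting the encoding where the file is opened keeps the change local to logging, whereas `sys.setdefaultencoding` affects every implicit conversion in the process.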

Original comment by t.engelm...@gmail.com on 28 Jul 2008 at 11:20

GoogleCodeExporter commented 8 years ago

As I said, I don't know much about Python, and t.engelmeier's solution might be better.

Nevertheless, this logging problem makes etv-comskip fairly useless for international users. As a temporary workaround, you can replace /Library/Application Support/ETVComskip/MarkCommercials.app/Contents/Resources/MarkCommercials.py with the attached file. I've been using this workaround for 2 months now, and all my recordings have been processed properly.

Original comment by markus.k...@gmail.com on 4 Aug 2008 at 5:53

Attachments:

MarkCommercials.py

GoogleCodeExporter commented 8 years ago

It's best to keep text in Unicode as much as possible and only convert when logging or displaying on the terminal. On OS X, the normal encoding is UTF-8, so change WriteToLog to:

    if options.log:
        if type(message) == type(u""):
            message = message.encode('utf-8')
        log.write('%s - %s' % (time.asctime(), message))
        log.flush()

Replace all instances of .encode("ascii","replace") with .encode('utf-8').

The code that lists all the recordings can be simplified down to:

    if len(args) == 0:
        for rec in GetRecordings():
            programName = os.path.split(os.path.splitext(os.path.dirname(rec.location.get().path))[0])[1]
            msg = '  %d = [%s], [%s], [%s]' % (rec.unique_ID.get(), programName, rec.channel_number(), rec.station_name())
            WriteToLog('%s\n' % msg)
            print msg.encode('utf-8')
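The "keep text as Unicode, encode only at the boundary" idea can be sketched on its own as follows. This is a Python 3 approximation, not the code that was committed: `to_utf8` and this `write_to_log` are hypothetical names, and the in-memory `io.BytesIO` stands in for the real log file.

```python
import io
import time


def to_utf8(message):
    """Return message as UTF-8 bytes; byte strings pass through unchanged."""
    if isinstance(message, bytes):
        return message
    return message.encode("utf-8")


def write_to_log(log, message):
    """Write a timestamped entry to a binary log stream."""
    prefix = ("%s - " % time.asctime()).encode("utf-8")
    # Encoding happens only here, at the point where text leaves the program.
    log.write(prefix + to_utf8(message))
    log.flush()


log = io.BytesIO()  # stand-in for a log file opened in binary mode
write_to_log(log, "Kom\u00f6die\n")   # text is encoded on the way out
write_to_log(log, b"plain bytes\n")   # already-encoded data is left alone
logged = log.getvalue().decode("utf-8")

print(logged)
```

The `isinstance` check does the same job as the `type(message) == type(u"")` comparison in the snippet above and is the more common idiom.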

Original comment by nyamaton...@gmail.com on 7 Mar 2009 at 12:58

GoogleCodeExporter commented 8 years ago

Issue 29 has been merged into this issue.

Original comment by jon.chri...@gmail.com on 19 Apr 2010 at 8:43

GoogleCodeExporter commented 8 years ago

changes applied. thanks.

Original comment by jon.chri...@gmail.com on 19 Apr 2010 at 8:54

Changed state: Fixed