Closed adrianholovaty closed 7 years ago
Hey, sorry, I didn't see this until now.
The reason you're getting an SRTParseError is because this file contains a UTF-8 BOM, which is unsupported (srt
tools open files with encoding utf-8
, not utf-8-sig
).
I've been considering adding some way for people to flag other encodings with the command line tools, let me look into it.
(Specifically, it is because argparse.FileType()
does not accept an encoding parameter)
I think the most reasonable way to solve this is to set your encoding correctly in your environment, to be honest.
Here is a half-baked attempt at solving the issue with an -e
parameter, but doing this for output is a bit more ugly (requiring some copying of code for now):
commit 32063fa63de57a9a7a452a907161ac3636a03067
Author: Chris Down <chris@chrisdown.name>
Date: Tue Sep 27 00:22:36 2016 +0100
Add -e/--encoding
diff --git srt_tools/srt-fix-subtitle-indexing srt_tools/srt-fix-subtitle-indexing
index 2c44419..a01d6c0 100755
--- srt_tools/srt-fix-subtitle-indexing
+++ srt_tools/srt-fix-subtitle-indexing
@@ -6,8 +6,8 @@ import srt
def main():
args = srt_tools.utils.basic_parser().parse_args()
- subtitles_in = srt.parse(args.input.read())
- output = srt.compose(subtitles_in, strict=args.strict)
+ srt_tools.utils.set_basic_args(args)
+ output = srt.compose(args.input, strict=args.strict)
args.output.write(output)
diff --git srt_tools/utils.py srt_tools/utils.py
index f08a5d6..13602a4 100755
--- srt_tools/utils.py
+++ srt_tools/utils.py
@@ -1,32 +1,39 @@
#!/usr/bin/env python
import argparse
+import srt
import sys
import logging
import itertools
+import collections
+
+
+log = logging.getLogger(__name__)
def basic_parser(multi_input=False):
parser = argparse.ArgumentParser(description=__doc__)
+ # Cannot use argparse.FileType as we need to know the encoding from the
+ # args
+
if multi_input:
parser.add_argument(
'--input', '-i', metavar='FILE',
- action='append', type=argparse.FileType('r'),
+ action='append',
help='the files to process',
required=True,
)
else:
parser.add_argument(
'--input', '-i', metavar='FILE',
- default=sys.stdin, type=argparse.FileType('r'),
+ default=sys.stdin,
help='the file to process (default: stdin)',
)
parser.add_argument(
'--output', '-o', metavar='FILE',
default=sys.stdout,
- type=argparse.FileType('w'),
help='the file to write to (default: stdout)',
)
parser.add_argument(
@@ -40,9 +47,26 @@ def basic_parser(multi_input=False):
const=logging.DEBUG, default=logging.WARNING,
help='enable debug logging',
)
+ parser.add_argument(
+ '--encoding', '-e',
+ help='the encoding to read/write files in (default: '
+ 'from your environment)',
+ )
return parser
+def set_basic_args(args):
+ if isinstance(args.input, collections.MutableSequence):
+ for i, input_filename in enumerate(args.input):
+ with open(input_filename, encoding=args.encoding) as input_f:
+ args.input[i] = srt.parse(input_f.read())
+ elif args.input is sys.stdin:
+ log.warning('-e/--encoding has no effect on %s', stream)
+ else:
+ with open(args.input, encoding=args.encoding) as input_f:
+ args.input = srt.parse(input_f.read())
+
+
def sliding_window(seq, width=2):
seq_iter = iter(seq)
sliced = tuple(itertools.islice(seq_iter, width))
Added in dea16c2. Confirmed it works with your test input:
srt encoding % PYTHONPATH=. srt_tools/srt-fixed-timeshift --seconds 5 -i ~/Downloads/srttest.txt -o /tmp/srttest-out.srt -e utf-8-sig
srt encoding % diff -u ~/Downloads/srttest.txt /tmp/srttest-out.srt | head -20
--- /home/cdown/Downloads/srttest.txt 2016-12-25 09:44:45.677276478 +0000
+++ /tmp/srttest-out.srt 2016-12-25 09:44:55.577107368 +0000
@@ -1,169 +1,169 @@
1
-00:00:04,500 --> 00:00:07,250
+00:00:09,500 --> 00:00:12,250
Hello, my name is Yorgui Loeffler, and I'm in Quebec, dang it!
2
-00:00:07,670 --> 00:00:10,930
+00:00:12,670 --> 00:00:15,930
In this lesson, I will be showing you a few things that are important to me.
3
-00:00:13,720 --> 00:00:15,870
+00:00:18,720 --> 00:00:20,870
Always scratch with a pick!
4
-00:00:45,200 --> 00:00:49,290
Merry Christmas! :-)
Hi there,
Just a heads up — I just tried using this library on a collection of SRT files I have, and I'm getting an SRTParseError exception on each file. I've attached one of the files here:
srttest.txt