byroot / pysrt

Python parser for SubRip (srt) files
GNU General Public License v3.0

Can't parse text with empty line #71

Open ghost opened 7 years ago

ghost commented 7 years ago

Hi, and first of all thanks for this handy library. I don't know if my error is due to a bad SRT file or if it's a bug, but with an SRT file like this:

1
00:22:10,440 --> 00:22:15,195
Je suis coincée au boulot,

j'aurai 10 minutes de retard.

305
00:22:15,960 --> 00:22:19,157
John, je suis dans les embouteillages.
La 5e Avenue est en travaux.

When I run the command srt shift 35s file_with_empty_line.srt, I get the following error:

PySRT-InvalidItem(line 5): 
Traceback (most recent call last):
  File "/home/john/Documents/git/pysrt/pysrt/srtfile.py", line 212, in stream
    yield SubRipItem.from_lines(source)
  File "/home/john/Documents/git/pysrt/pysrt/srtitem.py", line 83, in from_lines
    raise InvalidItem()
pysrt.srtexc.InvalidItem: j'aurai 10 minutes de retard.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/miniconda3/bin/srt", line 9, in <module>
    load_entry_point('pysrt', 'console_scripts', 'srt')()
  File "/home/john/Documents/git/pysrt/pysrt/commands.py", line 222, in main
    SubRipShifter().run(sys.argv[1:])
  File "/home/john/Documents/git/pysrt/pysrt/commands.py", line 140, in run
    self.arguments.action()
  File "/home/john/Documents/git/pysrt/pysrt/commands.py", line 161, in shift
    self.input_file.shift(milliseconds=self.arguments.time_offset)
  File "/home/john/Documents/git/pysrt/pysrt/commands.py", line 205, in input_file
    encoding=encoding, error_handling=SubRipFile.ERROR_LOG)
  File "/home/john/Documents/git/pysrt/pysrt/srtfile.py", line 153, in open
    new_file.read(source_file, error_handling=error_handling)
  File "/home/john/Documents/git/pysrt/pysrt/srtfile.py", line 181, in read
    self.extend(self.stream(source_file, error_handling=error_handling))
  File "/opt/miniconda3/lib/python3.5/collections/__init__.py", line 1091, in extend
    self.data.extend(other)
  File "/home/john/Documents/git/pysrt/pysrt/srtfile.py", line 215, in stream
    cls._handle_error(error, error_handling, index)
  File "/home/john/Documents/git/pysrt/pysrt/srtfile.py", line 311, in _handle_error
    sys.stderr.write(error.args[0].encode('ascii', 'replace'))
TypeError: write() argument must be str, not bytes
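
Note that the traceback contains two separate problems: the InvalidItem raised for the orphaned text line, and a TypeError raised while logging it, because on Python 3 str.encode() returns bytes and sys.stderr.write() only accepts str. A minimal sketch of one way to log such a message safely is shown here. The helper name write_error_message is hypothetical and is not part of pysrt; this illustrates the idea and is not the library's actual fix.

```python
import io
import sys


def write_error_message(message, stream=None):
    """Write an error message to a text stream, replacing non-ASCII characters.

    Hypothetical helper for illustration only; not part of pysrt.
    """
    if stream is None:
        stream = sys.stderr
    # encode() yields bytes; decode back to str so a text stream accepts it.
    safe = message.encode('ascii', 'replace').decode('ascii')
    stream.write(safe + '\n')
    return safe


# Usage: log the offending line into an in-memory text stream.
buffer = io.StringIO()
write_error_message("j'aurai 10 minutes de retard.", buffer)
```

Decoding back to str keeps the original intent (an ASCII-only log line) while staying compatible with Python 3 text streams.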

byroot commented 7 years ago

> I don't know if my error is due to a bad srt file or if it's a bug

Well, the SRT format isn't really well specified. However, I've never seen such a blank line in the wild, and the SRT files listing edge cases that I've found don't contain one either.

So I'd say: try reading it with VLC. If it accepts that blank line, then I'd be OK with trying to improve the parser.

In any case thanks for the report.

ghost commented 7 years ago

Thanks for your answer. I tried it in VLC, and in fact the third line is not shown. With this SRT content:

1
00:0:10,440 --> 00:00:15,195
Je suis coincée au boulot,

j'aurai 10 minutes de retard.

75
00:00:15,960 --> 00:00:19,157
John, je suis dans les embouteillages.
La 5e Avenue est en travaux.
J'ai rajouté une troisième ligne.

It shows:

Je suis coincée au boulot,

"j'aurai 10 minutes de retard." is not shown

and then the 3 lines are shown.

John, je suis dans les embouteillages. La 5e Avenue est en travaux. J'ai rajouté une troisième ligne.

Maybe pysrt could fix this by removing the empty line, or at least have an option to write the file even if there's an error.
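
As a workaround in the meantime, the "remove the empty line" idea can be sketched as a preprocessing step applied before the text reaches the parser. This is only an illustration under stated assumptions: the function name drop_stray_blank_lines and both regular expressions are hypothetical, not part of pysrt, and the heuristic assumes a blank line is a real separator only when it is followed by a numeric index line and then a timestamp line.

```python
import re

# Assumed shapes of a cue header; real-world files may vary.
INDEX_LINE = re.compile(r'^\s*\d+\s*$')
TIMESTAMP_LINE = re.compile(
    r'^\s*\d+:\d+:\d+[,.]\d+\s*-->\s*\d+:\d+:\d+[,.]\d+'
)


def drop_stray_blank_lines(text):
    """Remove blank lines that do not precede a new cue (index + timestamp)."""
    lines = text.splitlines()
    cleaned = []
    for i, line in enumerate(lines):
        if line.strip():
            cleaned.append(line)
            continue
        # Locate the next non-blank line after this blank one.
        j = i + 1
        while j < len(lines) and not lines[j].strip():
            j += 1
        starts_cue = (
            j + 1 < len(lines)
            and INDEX_LINE.match(lines[j])
            and TIMESTAMP_LINE.match(lines[j + 1])
        )
        # Keep a single separator before a real cue; drop every other blank.
        if starts_cue and cleaned and cleaned[-1] != '':
            cleaned.append('')
    return '\n'.join(cleaned) + '\n'


sample = (
    "1\n00:00:01,000 --> 00:00:02,000\nfirst\n\nsecond\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nthird\n"
)
print(drop_stray_blank_lines(sample))
```

The cleaned text could then be passed to the parser as a string. Note that this keeps the orphaned line as part of the first subtitle, which differs from VLC's behaviour of not showing it.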