SubtitleEdit / subtitleedit-cli

Subtitle Edit cli (without System.Drawing)
GNU General Public License v3.0

Failing to Convert SRT to 890 in Arabic/Hebrew #9

Closed by yovelcohen 7 months ago

yovelcohen commented 7 months ago

A simple conversion from SRT to 890 (Cavena890) fails for Arabic and Hebrew subtitles, but works for English.

seconv functions/srts/s02e01/en_glix.srt Cavena890 -> works
seconv functions/srts/s02e01/he_glix.srt Cavena890 -> fails with the error:

1 file(s) converted in 00:00:00.2551974

1: he_glix.srt -> functions/srts/s02e01/he_glix_4.890...
ERROR: Index was outside the bounds of the array.
   at seconv.libse.SubtitleFormats.Cavena890.GetTextAsBytes(String text, Int32 languageId) in /Users/user/PycharmProjects/glixFunctions/pythonProject/subtitleedit-cli/src/se-cli/libse/SubtitleFormats/Cavena890.cs:line 692
   at seconv.libse.SubtitleFormats.Cavena890.WriteText(Stream fs, String text, Boolean isLast, Int32 languageIdLine, Boolean useBox) in /Users/user/PycharmProjects/glixFunctions/pythonProject/subtitleedit-cli/src/se-cli/libse/SubtitleFormats/Cavena890.cs:line 680
   at seconv.libse.SubtitleFormats.Cavena890.Save(String fileName, Stream stream, Subtitle subtitle, Boolean batchMode) in /Users/user/PycharmProjects/glixFunctions/pythonProject/subtitleedit-cli/src/se-cli/libse/SubtitleFormats/Cavena890.cs:line 352
   at seconv.libse.SubtitleFormats.Cavena890.Save(String fileName, Subtitle subtitle, Boolean batchMode) in /Users/user/PycharmProjects/glixFunctions/pythonProject/subtitleedit-cli/src/se-cli/libse/SubtitleFormats/Cavena890.cs:line 346
   at seconv.CommandLineConverter.BatchConvertSave(String targetFormat, TimeSpan offset, String deleteContains, TextEncoding targetEncoding, String outputFolder, String targetFileName, Int32 count, Int32& converted, Int32& errors, List`1 formats, String fileName, Subtitle sub, SubtitleFormat format, Object binaryParagraphs, Boolean overwrite, Int32 pacCodePage, Nullable`1 targetFrameRate, ICollection`1 multipleReplaceImportFiles, List`1 actions, Nullable`1 resolution, Boolean autoDetectLanguage, BatchConvertProgress progressCallback, String ebuHeaderFile, String ocrEngine, String preExt, Nullable`1 renumber, Nullable`1 adjustDurationMs) in /Users/user/PycharmProjects/glixFunctions/pythonProject/subtitleedit-cli/src/se-cli/CommandLineConverter.cs:line 1157
   at seconv.CommandLineConverter.Convert(String[] arguments) in /Users/user/PycharmProjects/glixFunctions/pythonProject/subtitleedit-cli/src/se-cli/CommandLineConverter.cs:line 650

So I narrowed it down by trial and error: I ran the conversion on the SRT file, then removed blocks of rows until the conversion succeeded. The row the conversion fails on is:

747
00:33:16,883 --> 00:33:20,061
אני רוצה שתסמוך עליי... אני צריך שתסמוך עליי, אוקיי?

which doesn't look too unusual.
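The narrowing-down described above can be automated with a binary search over the cues. This is only a minimal sketch: `split_cues`, `find_first_failing_cue` and the `converts_ok` callback are hypothetical names, not part of seconv, and the search assumes the failure is caused by individual cues (a prefix of the file fails exactly when it contains a bad cue).

```python
import re
from typing import Callable, List


def split_cues(srt_text: str) -> List[str]:
    """Split SRT text into cue blocks separated by one or more blank lines."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    return [block for block in re.split(r"\n\s*\n", normalized) if block.strip()]


def find_first_failing_cue(cues: List[str], converts_ok: Callable[[str], bool]) -> int:
    """Return the index of the first cue that breaks conversion, or -1 if none.

    `converts_ok` is a caller-supplied (hypothetical) check that receives SRT
    text and returns True when the conversion succeeds.
    """
    if converts_ok("\n\n".join(cues)):
        return -1
    # Invariant: cues[:lo] converts, cues[:hi] fails.
    lo, hi = 0, len(cues)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if converts_ok("\n\n".join(cues[:mid])):
            lo = mid
        else:
            hi = mid
    return hi - 1
```

In practice `converts_ok` would write the text to a temporary file and run `seconv <file> Cavena890` on it. How to detect a failure (for example, looking for the `ERROR:` line in the output) is left to the caller.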

niksedk commented 7 months ago

Could you attach a whole file? (Github accepts .zip files)

yovelcohen commented 7 months ago

You mean the subtitles? Yeah, sure: hebrew_glix.srt.zip

I tried with multiple files, and also tried patching the Cavena890 converter from subtitle-edit into this one, but that raised various import-related errors, so I reverted.

niksedk commented 7 months ago

Thx for the file :)

In "Cavena 890", single lines should not exceed 50 characters. If they do, they will be truncated.
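To spot such lines before converting, an SRT file can be scanned for text lines over the limit. This is a rough sketch, not part of seconv: `find_long_lines` is a hypothetical helper, and it assumes the limit can be approximated with Python's `len()` (code points, with any formatting tags counted), which may not match how the Cavena 890 writer counts.

```python
import re
from typing import List, Tuple

MAX_LINE_LENGTH = 50  # per-line limit for Cavena 890 as stated in this thread


def find_long_lines(srt_text: str, limit: int = MAX_LINE_LENGTH) -> List[Tuple[str, str, int]]:
    """Return (cue_number, line, length) for every text line longer than `limit`."""
    long_lines = []
    normalized = srt_text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 3:
            continue  # not a full cue (number, timing, text); skip it
        cue_number = lines[0].strip()
        # lines[1] is the timing line; everything after it is subtitle text
        for line in lines[2:]:
            if len(line) > limit:
                long_lines.append((cue_number, line, len(line)))
    return long_lines
```

Any cue reported here would need to be re-broken into shorter lines before converting.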

yovelcohen commented 7 months ago

@niksedk Thanks, I wasn't aware of that limit. And thanks for the quick and helpful support, it's truly amazing 🙏