SkyaTura / gpt-subb

gpt-subb is a command-line tool to translate and convert subtitles using OpenAI's Chat-GPT language models
GNU General Public License v3.0

Only the first line of multiline subtitles is translated. #1

Open xpufx opened 1 year ago

xpufx commented 1 year ago

Input:

12 00:03:06,733 --> 00:03:11,832

13 00:03:11,852 --> 00:03:15,871

Output:

12 00:03:06,733 --> 00:03:11,832

13 00:03:11,852 --> 00:03:15,871

SkyaTura commented 1 year ago

Hello @xpufx

Could you also provide all the parameters you are passing as arguments?

xpufx commented 1 year ago

gpt-subb -k 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 -l tr Il.Giovane.Montalbano.S01E01.La.prima.indagine.di.Montalbano.srt

(I will kill the api key now. no problem)

SkyaTura commented 1 year ago

@xpufx could you please verify whether this also happens with multi-line messages that DON'T have numbers mixed with text?

Also, thanks for your collaboration.

xpufx commented 1 year ago

I tried a little snippet from English to Turkish. Similar situation. I am attaching the files below. (The timestamps are correct for some show, but I changed the text to nonsense just in case. I tried to keep the format the same, since there might be nonprintable characters I am not seeing.) I added a .txt extension so that GitHub allows the uploads.

Source. genericsubtitle.srt.txt

Result. genericsubtitle.tr.srt.txt

ghost commented 1 year ago

@SkyaTura I can confirm that the problem still exists :/

SkyaTura commented 1 year ago

Sorry folks, I haven't had time to check this yet. However, I have become more familiar with the OpenAI API, and I already have an idea of what may be going on. I also understand tokenization better now.

That being said, I'll refactor this project for more consistent results and better cost efficiency as well.
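
For anyone following along, here is a minimal sketch of how multi-line cues can be kept together when reading an .srt file. It is not gpt-subb's actual code, and the thread does not confirm the cause of the bug; it only illustrates splitting on blank lines and treating everything after the timing line as subtitle text, even when a text line starts with a digit. The names `Cue` and `parse_srt` and the sample subtitle text are made up for this example; only the timestamps are taken from the report above.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:03:06,733 --> 00:03:11,832".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


@dataclass
class Cue:  # hypothetical container, not part of gpt-subb
    index: int
    timing: str
    lines: list[str]  # every text line of the cue, in order


def parse_srt(text: str) -> list[Cue]:
    """Split SRT text into cues, keeping all text lines of each cue."""
    normalized = text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        rows = block.split("\n")
        # Expect: index line, timing line, then zero or more text lines.
        if len(rows) < 2 or not rows[0].strip().isdigit():
            continue  # skip malformed blocks
        if not TIMING.match(rows[1]):
            continue
        cues.append(Cue(int(rows[0]), rows[1], rows[2:]))
    return cues


# Invented sample text; the second line of cue 12 starts with a digit on purpose.
sample = """12
00:03:06,733 --> 00:03:11,832
First line of the cue
2 more words on a second line

13
00:03:11,852 --> 00:03:15,871
A single-line cue
"""

cues = parse_srt(sample)
for cue in cues:
    print(cue.index, cue.lines)
```

With this shape, a translation step could join `cue.lines` before sending the text to the model and split the result afterwards, so that no line of a cue is dropped.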