Open xpufx opened 1 year ago
Hello @xpufx
Could you also provide all the parameters you are passing as arguments?
gpt-subb -k <redacted> -l tr Il.Giovane.Montalbano.S01E01.La.prima.indagine.di.Montalbano.srt
(I will kill the API key now. No problem.)
@xpufx could you please verify whether this also happens with multi-line messages that DON'T have numbers mixed with the text?
Also, thanks for your collaboration.
I tried a short snippet, translating from English to Turkish, and got a similar result. I am attaching both files below. (The timestamps are real ones from some show, but I replaced the text with nonsense, just in case. I kept the format unchanged in case there are non-printable characters I am not seeing.) I added a .txt extension so GitHub would allow the uploads.
Source: genericsubtitle.srt.txt
Result: genericsubtitle.tr.srt.txt
@SkyaTura I can confirm that the problem still exists :/
Sorry folks, I have had no time to check this yet. However, I have become more familiar with the OpenAI API, and I think I already know what may be going on. I also understand tokenization better now.
That being said, I'll refactor this project for more consistent results and better cost efficiency as well.
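For anyone following along, here is a minimal sketch of one way such a refactor could keep cue numbers and timestamps out of the text sent to the model, so that numbers inside the subtitle text cannot be confused with cue indices. This is not this project's code: `Cue`, `parse_srt`, `rebuild_srt`, and the `translate` callback are hypothetical names, and the sketch assumes well-formed SRT blocks separated by blank lines.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:03:06,733 --> 00:03:11,832".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


@dataclass
class Cue:
    index: str        # cue number, kept verbatim
    timing: str       # timing line, kept verbatim
    text: list[str]   # one or more lines of subtitle text


def parse_srt(content: str) -> list[Cue]:
    """Split SRT content into cues; only `text` is meant to be translated."""
    content = content.replace("\r\n", "\n").strip()
    cues = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content):
        lines = block.splitlines()
        # A valid block has an index line followed by a timing line.
        if len(lines) >= 2 and TIMING.match(lines[1].strip()):
            cues.append(Cue(lines[0].strip(), lines[1].strip(), lines[2:]))
    return cues


def rebuild_srt(cues: list[Cue], translate) -> str:
    """Reassemble the file, passing only the text lines through `translate`.

    `translate` is a hypothetical callback (str -> str); in practice it
    would wrap whatever API call does the translation.
    """
    blocks = []
    for cue in cues:
        new_text = translate("\n".join(cue.text))
        blocks.append(f"{cue.index}\n{cue.timing}\n{new_text}")
    return "\n\n".join(blocks) + "\n"
```

Because the index and timing lines never pass through `translate`, a text line that consists only of a number stays attached to its own cue. A real implementation would still need to handle malformed files and cases where the model returns a different number of lines.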
Input:
12 00:03:06,733 --> 00:03:11,832
13 00:03:11,852 --> 00:03:15,871
Output:
12 00:03:06,733 --> 00:03:11,832
13 00:03:11,852 --> 00:03:15,871