abuali129 opened 7 years ago
The only difference between the two files you provided is the text content and length (assuming both are UTF-8 encoded).
The "corrupted" one seems to be missing the character ID prefixes ([C=37]) in each line. Perhaps the game can only "skip" to lines with a character ID.
Original:
<Line Text="[C=37]Forgive me, but my schedule has changed.">
<Timing Start="576" End="830" />
</Line>
Modded:
<Line Text=".ばチをす ぬウ タガぐケろぅ オズぬち コォガ ぁタガ つケふぉ ">
<Timing Start="576" End="830" />
</Line>
I already tried keeping the character ID prefixes, but got the same result. I found other entries with the same issue; I'll prepare them shortly.
Entry Id="856784307" (cassette tape: Truth Records, track: Secret Recording with PAZ and ZERO)
Original: https://mega.nz/#!uRVTSDTa!UB7xpGYQY5WcEQrw02G0JF2MqC1-YX9YZPiCWPUkKtg
Modified: https://mega.nz/#!7A1WmChD!YxvS2283CXup87wsXOtNNUTA9Cc-eo5t155vx07DxZg
Corrupted: https://mega.nz/#!Td8RXZwZ!Un70HOvJIJcFP8efntkQ0HVSRXm-8FZnhCroHrzuoiw
Look at lines 138 and 336.
In the Modified version the file works properly, because I didn't touch these two lines. In the Corrupted version I added my text to them and the same issue as before happened.
On the last sample I provided, I did some tests. It appears that the problem occurs when the length of the whole file exceeds 17,572. I hadn't considered length in my project before; I didn't think it would cause a problem. I'll do some tests on the first sample too, to get a clear picture of what is happening.
It could also be related to the characters in a line and the line length.
Your example is a lot larger than the original line.
[C=20]ザはズぐぢケガく ホざアばをガく タア クスゴぉ
Could you try replacing the original line with substrings of your modded line of varying length (1, 2, 3... characters)? Maybe you can find out which length triggers the corruption.
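Stepping through lengths 1, 2, 3... works, but each step is a manual in-game test. If you assume the corruption is monotonic (once a prefix breaks, every longer prefix breaks too), a binary search over prefix length needs only about log2(n) tests. A minimal Python sketch of that swapped-in technique; `first_failing_length` and `fake_oracle` are hypothetical names, and the oracle stands in for "build the .subp, load the game, check the subtitles".

```python
from __future__ import annotations

from typing import Callable


def first_failing_length(text: str, is_corrupted: Callable[[str], bool]) -> int | None:
    """Return the shortest prefix length of `text` that `is_corrupted` rejects.

    Assumptions (not confirmed in this thread): corruption is monotonic in
    length, and the empty prefix passes. Returns None if the full text passes.
    """
    if not is_corrupted(text):
        return None
    passing, failing = 0, len(text)  # text[:passing] is OK, text[:failing] breaks
    while failing - passing > 1:
        mid = (passing + failing) // 2
        if is_corrupted(text[:mid]):
            failing = mid
        else:
            passing = mid
    return failing


# Stand-in oracle for illustration only: pretend anything over 10 UTF-8 bytes breaks.
def fake_oracle(s: str) -> bool:
    return len(s.encode("utf-8")) > 10


print(first_failing_length("ばチをすぬウ", fake_oracle))
```

With the fake oracle, three kana (9 bytes) pass and four (12 bytes) fail, so the search reports 4 after three oracle calls.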
Yes, I already tried that. I even used English words instead (to test what happens when the maximum length is exceeded). I am pretty sure the problem happens because I exceeded the maximum length of the entry. I did some tests on both files and got a clear picture of what happens.
My conclusion is that whenever the length of the whole entry exceeds the specific limit that entry can take, the problem appears, no matter which substring pushes it over. The substring itself is not directly related; only the total length matters.
I hope there's some way to increase the "length limit". By the way, not all characters are equal in length: some single letters add +4 to the length. I think that's related to the Unicode encoding of the characters.
You're right about the different sizes of different UTF-8 code points.
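A quick way to see the different widths is to encode single characters and count the bytes. In UTF-8, ASCII takes 1 byte, U+0080 to U+07FF (which includes the Arabic block) takes 2, the rest of the Basic Multilingual Plane (including the kana used in the modded line) takes 3, and code points above U+FFFF take 4. A minimal Python sketch; the sample characters are arbitrary picks, and whether the "+4" above came from such characters or from how the length was measured is not established here.

```python
def utf8_width(ch: str) -> int:
    """Bytes that a single code point occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


examples = [
    ("A", "Latin letter"),
    ("ب", "Arabic letter beh"),
    ("ば", "Hiragana ba, as used in the modded line"),
    ("😀", "emoji, outside the Basic Multilingual Plane"),
]
for ch, name in examples:
    print(f"U+{ord(ch):04X} {name}: {utf8_width(ch)} byte(s)")
```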
As each entry in a subp file is saved as a single string with $-characters separating the lines, the max character limit per entry should be (assuming the entry has at least one line): 2^16-len(lines)-3
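The formula above can be written as a small check. This is a sketch under an assumption: it treats `2^16 - len(lines) - 3` as a cap on the summed UTF-8 byte length of the line texts, which the comment implies but does not spell out. The helper names are hypothetical and not part of the tool.

```python
from __future__ import annotations


def format_limit(num_lines: int) -> int:
    """Upper bound on an entry's text size, per 2^16 - len(lines) - 3."""
    if num_lines < 1:
        raise ValueError("the formula assumes at least one line")
    return 2**16 - num_lines - 3


def serialized_entry(lines: list[str]) -> bytes:
    """Join line texts with '$' as described above and encode as UTF-8."""
    return "$".join(lines).encode("utf-8")


def fits_format_limit(lines: list[str]) -> bool:
    """Assumption: the limit applies to the summed UTF-8 bytes of the line texts."""
    text_bytes = sum(len(line.encode("utf-8")) for line in lines)
    return text_bytes <= format_limit(len(lines))


print(format_limit(1), fits_format_limit(["x" * 65532]), fits_format_limit(["x" * 65533]))
```

This only models the format's theoretical ceiling; the game evidently rejects entries well below it.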
Since the game can't load these files correctly, there must be some other limits. Could you check which unmodified file has the largest entry, and whether the supported size can be increased by changing the flags?
Update: it seems that reducing the length of the first sample doesn't help either. I managed to shrink the first sample down to 21,987 by combining some line text strings and adjusting their timing values. Maybe the method itself doesn't work? I still don't know. Here is the result: https://mega.nz/#!vYcTHKaC!eHBW258SCIP0UXOWFPJ5xHrxvKfIYbjKHtPMP_v1ISw
As for the flags you mentioned: I reported earlier in another issue that the Flags value relates to the content type: 1024 is for cutscenes, 768 for cassette tapes, and other values have other uses.
Combining 2 lines will just save a single byte. 3472 of the 4525 UTF-8 codepoints used in your latest example are 3 bytes wide (the rest are 1 byte wide). So you won't save much space by combining them.
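Counts like "3472 of the 4525 code points are 3 bytes wide" can be reproduced with a histogram of code points by UTF-8 width, and the saving from merging lines is just the '$' separators that disappear. A minimal Python sketch, assuming the '$'-joined entry layout described earlier; the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def width_histogram(text: str) -> Counter:
    """Count the code points in `text` by their UTF-8 byte width (1 to 4)."""
    return Counter(len(ch.encode("utf-8")) for ch in text)


def separator_bytes(lines: list[str]) -> int:
    """Bytes spent on '$' separators when lines are joined into one entry string."""
    joined = "$".join(lines).encode("utf-8")
    return len(joined) - sum(len(line.encode("utf-8")) for line in lines)


print(width_histogram("ab ばチ"))
print(separator_bytes(["ab", "cd"]))
```

`separator_bytes` is always `len(lines) - 1`, so each merge saves one byte at most; if the merged lines are joined with a space, the '$' is simply replaced and nothing is saved.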
Did you check if the size limit you found is the same for each subp file or if some of them have different limits?
As for the size limit: I'm not talking about the .subp file as a whole, because I have files with more bytes in them that work perfectly.
But the size limit of an entry inside the .subp differs from entry to entry. For the samples I provided, the first sample's length limit is 22,420 and the second sample's is 17,572.
You're mapping arbitrary Japanese UTF-8 codepoints to Arabic letters, right? Could you try using only codepoints that are 1 byte wide instead of the 3-byte ones? That alone could net you 6944 additional codepoints (in your latest example) before the corruption starts again.
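The 6944 figure is the per-character saving multiplied by the count: each code point moved from 3 bytes to 1 byte frees 2 bytes. A minimal Python sketch of that arithmetic, using the 3472 count quoted above; moving to 2-byte code points instead would free half as much.

```python
def bytes_saved(count: int, old_width: int, new_width: int) -> int:
    """Bytes freed by re-encoding `count` code points from old_width to new_width bytes."""
    return count * (old_width - new_width)


three_byte_count = 3472  # from the latest example, as counted above
print(bytes_saved(three_byte_count, 3, 1))  # 3-byte -> 1-byte code points
print(bytes_saved(three_byte_count, 3, 2))  # 3-byte -> 2-byte code points
```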
I also had a third sample entry that was corrupted, but I have now managed to fix it; let me know if you want to look at it. The second sample is also fixed, just by removing some unwanted spaces, but the first one seems impossible to repair. Here is the fixed second sample: https://mega.nz/#!mQ8FVZwI!pC4-oZslXXptnyJWbYjiLqgwvgZQHI9NPDUXlD5a1UU
I tried using 1-byte letters for the first sample as you suggested, but they can't cover all of the Arabic letters, so I ended up mixing in 2-byte letters, and the file is still corrupted. Merging line texts (which was the fix for the third sample) didn't help either. I managed to shrink the length to 21,862 with 2- and 3-byte letters, and to 17,537 by merging line texts.
How many distinct letters are there in the Arabic alphabet (plus numerals and punctuation)? You should make sure the most frequently used letters are encoded as 1- or 2-byte codepoints to save additional space. Either use this as a source or analyze the letter frequencies of your own subtitles.
Letters and punctuation alone come to 140 in total; numbers and symbols are shared with Latin. Also, I cannot replace the 1-byte Latin letters, since I still use almost all of them. Anyway, it looks like I will skip translating Entry Id="2161021477".
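The frequency idea can be sketched as a greedy assignment: count how often each character occurs in the translated text, then hand out candidate code points narrowest first, so the most common letters get the cheapest encodings. A minimal Python sketch; the `pool` of candidate code points is hypothetical, and whether the game's font renders a given code point, or which Latin letters must stay untouched, is not checked here.

```python
from __future__ import annotations

from collections import Counter


def assign_by_frequency(text: str, pool: list[str]) -> dict[str, str]:
    """Map each distinct character in `text` to a code point from `pool`.

    Pass only the characters you intend to remap. The most frequent
    characters receive the narrowest (fewest UTF-8 bytes) code points.
    """
    letters = [ch for ch, _ in Counter(text).most_common()]
    # sorted() is stable, so equal-width code points keep the caller's order.
    ordered_pool = sorted(pool, key=lambda cp: len(cp.encode("utf-8")))
    if len(ordered_pool) < len(letters):
        raise ValueError("not enough code points in pool")
    return dict(zip(letters, ordered_pool))


def encoded_size(text: str, mapping: dict[str, str]) -> int:
    """UTF-8 byte size of `text` after applying `mapping`."""
    return sum(len(mapping[ch].encode("utf-8")) for ch in text)


mapping = assign_by_frequency("aab", ["ば", "x"])
print(mapping, encoded_size("aab", mapping))
```

Here the more frequent "a" gets the 1-byte "x" and "b" gets the 3-byte "ば", giving 5 bytes instead of 7 for the reverse assignment.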
That's unfortunate.
Since I can't change the limits imposed by the engine I'd rather print an error if one of the subtitles doesn't fit in an entry.
I'll have to analyze all the unedited subp files to get some more facts about the limits.
Is there any information I can provide for this? Just ask. And thanks a hundred times for the awesome tool!
Could you perhaps upload a zip archive with all subtitles? I don't have the game installed right now and would have to redownload it first.
Add me on Steam, as sharing all these files publicly here is likely against the GitHub ToS.
All right. Is your Steam ID the same as here?
There are three users by your name, I cannot identify you :)
Invitation sent
The entry with id 2161021477 is indeed the largest one in all subs.
The max sizes (in bytes) in the unmodified files are as follows:
File: 513,497
Entry: 7,517
Line: 308
As long as these aren't exceeded the game should load them fine. Anything above these values needs some more testing.
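Printing an error when a subtitle doesn't fit could start from these observed maxima. A minimal Python sketch, assuming an entry's size is its '$'-joined line texts in UTF-8 (the real on-disk accounting may add bytes); the constants come from the comment above, and the function names are hypothetical rather than part of the tool.

```python
from __future__ import annotations

# Largest sizes (bytes) found in the unmodified files, as reported above.
# Exceeding them is untested territory, not a confirmed failure.
MAX_FILE_BYTES = 513_497
MAX_ENTRY_BYTES = 7_517
MAX_LINE_BYTES = 308


def check_entry(entry_id: str, lines: list[str]) -> list[str]:
    """Return warnings for any line or entry size above the observed maxima."""
    warnings = []
    for index, line in enumerate(lines):
        size = len(line.encode("utf-8"))
        if size > MAX_LINE_BYTES:
            warnings.append(f"entry {entry_id}, line {index}: {size} > {MAX_LINE_BYTES} bytes")
    entry_size = len("$".join(lines).encode("utf-8"))
    if entry_size > MAX_ENTRY_BYTES:
        warnings.append(f"entry {entry_id}: {entry_size} > {MAX_ENTRY_BYTES} bytes")
    return warnings


def check_file(num_bytes: int) -> list[str]:
    """Warn if a whole .subp file is larger than any unmodified one."""
    if num_bytes > MAX_FILE_BYTES:
        return [f"file: {num_bytes} > {MAX_FILE_BYTES} bytes"]
    return []


for warning in check_entry("2161021477", ["x" * 300] * 26):
    print(warning)
```

The file-level check is only a warning: a 573,670-byte modified file is reported further down this thread as working.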
I will look at this in the evening, thanks.
My modified file is 573,670 bytes and works fine, except for entry ID 2161021477. When I put the original entry 2161021477 back alongside the modified entries, the result is 572,211 bytes with no problems. However, the last 19 entries are still in their original state:
Entry Id="3976005522"
Entry Id="3983176914"
Entry Id="3985838335"
Entry Id="4015605908"
Entry Id="4033776865"
Entry Id="4038494047"
Entry Id="4044911970"
Entry Id="4084410907"
Entry Id="4123126871"
Entry Id="4131631805"
Entry Id="4181857144"
Entry Id="4201908311"
Entry Id="4205344688"
Entry Id="4209208445"
Entry Id="4230996696"
Entry Id="4272505980"
Entry Id="4275351727"
Entry Id="4277855698"
Entry Id="4289530536"
I'm continuing with my project, and I have developed a pattern for localizing TPP into Arabic that appears to work in every .subp file and all of their entries. But there's one entry that gets corrupted in the game no matter what I do:
Entry Id="2161021477" in tape.subp (cassette tape: Skull Face's Objective [4], track: Secret Recording of Skull Face and Code Talker [2])
Whenever I modify this entry and put it in the game, the subtitles don't show, and the rewind and forward buttons also break. I've provided two .subp samples containing just this entry, one original and one modified. See also the videos showing the original and the corrupted behavior.
Videos: original https://www.youtube.com/watch?v=VpmW2LG6oBk, corrupted https://www.youtube.com/watch?v=xNhw1nteyLg
Samples: original https://mega.nz/#!zM8khIwb!L-ORh-oHNcA3H1NqgC7YUOijx6oeTvp4BelGiJW6MbU, corrupted https://mega.nz/#!rEkFDSaL!OpJ2ntxyVx3ittkq2r6i72V2U_GoRYIU7LI5YBhx24E