Atvaark / FoxEngine.TranslationTool

Fox Engine text and font converter
MIT License

Editing of one entry cannot be successful whatsoever #16

Open abuali129 opened 7 years ago

abuali129 commented 7 years ago

I'm still continuing with my project, and I developed a pattern for localizing TPP to Arabic, which appears to be successful in every .subp and all entries in them. But there's one entry that gets corrupted inside the game whatever I do:

Entry Id="2161021477" in tape.subp
Cassette tape: Skull Face's Objective [4]
Track: Secret Recording of Skull Face and Code Talker [2]

Whenever I modify this entry and put it in the game, the subtitles won't show, and the rewind and forward buttons also get corrupted. I have provided two .subp samples containing just the entry in question, one original and one modified. Also see the videos to compare the original behavior with the corrupted one.

Videos
Original: https://www.youtube.com/watch?v=VpmW2LG6oBk
Corrupted: https://www.youtube.com/watch?v=xNhw1nteyLg

Samples
Original: https://mega.nz/#!zM8khIwb!L-ORh-oHNcA3H1NqgC7YUOijx6oeTvp4BelGiJW6MbU
Corrupted: https://mega.nz/#!rEkFDSaL!OpJ2ntxyVx3ittkq2r6i72V2U_GoRYIU7LI5YBhx24E

Atvaark commented 7 years ago

The only difference between the two files you provided is the text content and length (if they are both utf-8 encoded).

The "corrupted" one seems to be missing the character ID prefixes ([C=37]) in each line. Perhaps the game can only "skip" to lines with a character ID.

Original:

<Line Text="[C=37]Forgive me, but my schedule has changed.">
    <Timing Start="576" End="830" />
</Line>

Modded:

<Line Text=".ばチをす ぬウ タガぐケろぅ オズぬち コォガ ぁタガ つケふぉ         ">
    <Timing Start="576" End="830" />
</Line>
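
A quick way to check a converted file for lines that lack the prefix is sketched below. It assumes the `<Line Text="...">` layout shown above; the `<Entry>` wrapper, the helper name `lines_missing_prefix` and the second sample line are made up for illustration, and whether the game really needs the prefix is only a guess at this point.

```python
import re
import xml.etree.ElementTree as ET

# Matches a character ID prefix such as "[C=37]" at the start of a line.
PREFIX = re.compile(r"^\[C=\d+\]")

def lines_missing_prefix(xml_text):
    """Return the Text of every <Line> that does not start with [C=n]."""
    root = ET.fromstring(xml_text.strip())
    return [
        line.get("Text", "")
        for line in root.iter("Line")
        if not PREFIX.match(line.get("Text", ""))
    ]

# Hypothetical sample: the <Entry> wrapper is an assumption, not the tool's schema.
sample = """
<Entry Id="0">
  <Line Text="[C=37]Forgive me, but my schedule has changed.">
    <Timing Start="576" End="830" />
  </Line>
  <Line Text="No prefix here">
    <Timing Start="831" End="900" />
  </Line>
</Entry>
"""
print(lines_missing_prefix(sample))  # ['No prefix here']
```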

abuali129 commented 7 years ago

I already tried keeping the character ID prefixes, but got the same result. I found other entries that have the same issue. I'll have a sample ready in a few moments.

abuali129 commented 7 years ago

Entry Id="856784307"
Cassette tape: Truth Records
Track: Secret Recording with PAZ and ZERO

Original: https://mega.nz/#!uRVTSDTa!UB7xpGYQY5WcEQrw02G0JF2MqC1-YX9YZPiCWPUkKtg
Modified: https://mega.nz/#!7A1WmChD!YxvS2283CXup87wsXOtNNUTA9Cc-eo5t155vx07DxZg
Corrupted: https://mega.nz/#!Td8RXZwZ!Un70HOvJIJcFP8efntkQ0HVSRXm-8FZnhCroHrzuoiw

Look at lines 138 and 336.

In the Modified version the file works properly, because I didn't touch these two lines. In the Corrupted version I added my text and the same issue as before happened.

abuali129 commented 7 years ago

I did some tests on the last sample that I provided. It appears that the problem occurs if the length of the whole file exceeds 17,572. I didn't take the length into account in my project before; I thought it would not cause a problem. I'll do some tests on the first sample to get a good picture of what is happening.

Atvaark commented 7 years ago

It could also be related to the characters in a line and the line length.

Your example is a lot larger than the original line. [C=20]ザはズぐぢケガく ホざアばをガく タア クスゴぉ

Could you try replacing the original line with substrings of varying length of your modded line? (1, 2, 3... characters) Maybe you can find out which length triggers the corruption.
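
One way to prepare such test strings is sketched below. `prefixes` is a hypothetical helper, not part of the tool; it lists each prefix together with its UTF-8 byte length, in case the size in bytes rather than the character count turns out to be what matters.

```python
def prefixes(text, step=1):
    """Yield (char_count, byte_count, prefix) for growing prefixes of text."""
    for n in range(step, len(text) + 1, step):
        prefix = text[:n]
        yield n, len(prefix.encode("utf-8")), prefix

# Short illustrative input; a real test would use the full modded line.
for chars, size, prefix in prefixes("[C=20]ザは", step=2):
    print(chars, size, prefix)
```

Each printed prefix can then be pasted into the original line, one at a time, until the corruption appears.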

abuali129 commented 7 years ago

Yes, I tried that already. I even put in English words instead (still crossing the maximum length). I am pretty sure the problem happens because I exceeded the maximum length of the entry. I did some tests on both files and I got a clear picture of what happened.

abuali129 commented 7 years ago

The conclusion is that the problem occurs whenever the length of the whole entry exceeds a specific length that each entry can take. It does not matter which substring pushes it over; the substring itself is not directly related, only the length of the whole thing.

I hope there's some means to increase the "length limit". By the way, not all characters are equal in length; some single letters add +4 to the length. I think that's related to the Unicode encoding of the characters.

Atvaark commented 7 years ago

You're right with the different sizes for different UTF-8 codepoints.

As each entry in a subp file is saved as a single string with $-characters separating the lines, the max character limit per entry should be (assuming the entry has at least one line): 2^16-len(lines)-1

  1. 2^16-1 is max 16bit
  2. len(lines)-1 is the amount of $-characters required to separate the lines
  3. -1 for the NULL-terminator of the entry
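
The three terms listed above can be combined in a few lines. This is only a sketch of that arithmetic; `max_entry_text_bytes` is a hypothetical helper, and the result is a theoretical upper bound, not a size the game is known to accept.

```python
def max_entry_text_bytes(line_count):
    """Theoretical upper bound on the combined text size of one entry."""
    if line_count < 1:
        raise ValueError("an entry is assumed to have at least one line")
    max_u16 = 2**16 - 1           # largest value a 16-bit length can hold
    separators = line_count - 1   # one '$' between each pair of lines
    terminator = 1                # NULL terminator of the entry
    return max_u16 - separators - terminator

print(max_entry_text_bytes(1))    # 65534
print(max_entry_text_bytes(100))  # 65435
```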

As the game can't load these files correctly there have to be some other limits. Could you perhaps check which unmodified file has the largest entry and check if the supported size can be increased by changing the flags?

abuali129 commented 7 years ago

Update: it seems that reducing the length in the first sample doesn't help either. I managed to shrink the length of the first sample down to 21987 by combining some line text strings and changing their timing values. Maybe the method itself is not working? I still don't know. Here is the result: https://mega.nz/#!vYcTHKaC!eHBW258SCIP0UXOWFPJ5xHrxvKfIYbjKHtPMP_v1ISw

As for the flags you mentioned, I reported earlier in another issue that the Flags value is related to the content: 1024 is for cutscenes, 768 for cassette tapes, and other values have other uses.

Atvaark commented 7 years ago

Combining 2 lines will just save a single byte. 3472 of the 4525 UTF-8 codepoints used in your latest example are 3 bytes wide (the rest are 1 byte wide). So you won't save much space by combining them.
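
A breakdown like the one above can be reproduced with a short script. This is a minimal sketch; `utf8_width_histogram` is a hypothetical helper and the sample string is only illustrative.

```python
from collections import Counter

def utf8_width_histogram(text):
    """Count how many codepoints in text take 1, 2, 3 or 4 bytes in UTF-8."""
    return Counter(len(ch.encode("utf-8")) for ch in text)

sample = "[C=20]ザは abc"
hist = utf8_width_histogram(sample)
total_bytes = sum(width * count for width, count in hist.items())
print(dict(hist), total_bytes)
```

The total computed from the histogram equals `len(sample.encode("utf-8"))`, which is the size that counts against any byte limit.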

Did you check if the size limit you found is the same for each subp file or if some of them have different limits?

abuali129 commented 7 years ago

By size limit I'm not talking about the whole .subp file, because I have files with more bytes in them that work perfectly.

But the size limit of an entry inside the .subp differs from one entry to another. For the provided samples, the first sample's length limit is 22,420 and the second sample's length limit is 17,572.

Atvaark commented 7 years ago

You're mapping arbitrary Japanese UTF-8 codepoints to Arabic letters, right? Could you try using only codepoints that are 1 byte wide instead of using the 3 byte ones? That alone could net you 6944 additional codepoints (to your latest example) before the corruption will start again.
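
The 6944 figure follows from the counts given earlier in the thread: every 3-byte codepoint swapped for a 1-byte one frees 2 bytes. A small sketch of that arithmetic, with `bytes_saved` as a hypothetical helper:

```python
def bytes_saved(count, old_width, new_width):
    """Bytes freed by re-encoding `count` codepoints at a narrower width."""
    return count * (old_width - new_width)

# Figures from the latest example: 3472 codepoints that are 3 bytes wide.
saved = bytes_saved(3472, old_width=3, new_width=1)
print(saved)  # 6944
```

Each freed byte has room for one more 1-byte codepoint, hence 6944 additional codepoints.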

abuali129 commented 7 years ago

I also had a third sample entry which was corrupted, but now I have managed to fix it. If you want to look at it, just let me know. The second sample is also fixed, just by removing some unwanted spaces, but the first one cannot be repaired. Here is the second sample, fixed: https://mega.nz/#!mQ8FVZwI!pC4-oZslXXptnyJWbYjiLqgwvgZQHI9NPDUXlD5a1UU

I tried using 1-byte letters as you suggested for the first sample, but they can't cover all of the Arabic letters, so I ended up using 2-byte letters along with them. The file is still corrupted. Even merging line texts, which was the solution for the third sample, gave no benefit. I managed to shrink the length to 21862 with 2- and 3-byte letters, and to 17,537 with merging line texts.

Atvaark commented 7 years ago

How many distinct letters are there in the Arabic alphabet (plus numerals and punctuation)? You should make sure that the most frequently used letters are encoded as 1- or 2-byte codepoints to save additional space. Either use this as source or analyze the frequency of your own subtitles.
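
Analyzing your own subtitles could look roughly like the sketch below: count how often each letter occurs, then hand the narrowest stand-in codepoints to the most frequent letters. `assign_standins` and the pool of stand-in characters are hypothetical; the real pool depends on which glyphs the font can be remapped to.

```python
from collections import Counter

def assign_standins(text, standins):
    """Map the most frequent characters of text to the narrowest stand-ins."""
    # most_common() sorts by count, highest first; whitespace is left unmapped.
    letters = [ch for ch, _ in Counter(text).most_common() if not ch.isspace()]
    if len(standins) < len(letters):
        raise ValueError("not enough stand-in characters")
    # Narrowest UTF-8 encodings first; sorted() is stable, so ties keep pool order.
    pool = sorted(standins, key=lambda ch: len(ch.encode("utf-8")))
    return dict(zip(letters, pool))

# Illustrative text: alef x3, beh x2, teh x1 (written as escapes for clarity).
text = "\u0627\u0627\u0627\u0628\u0628 \u062a"
mapping = assign_standins(text, standins="ザé~")
for letter, standin in mapping.items():
    print(f"U+{ord(letter):04X} -> {standin!r}")
```

Here the most frequent letter gets the 1-byte `~`, the next one the 2-byte `é`, and the rarest the 3-byte `ザ`.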

abuali129 commented 7 years ago

Letters and punctuation alone are 140 in total; numbers and symbols are shared with Latin. Also, I cannot replace the one-byte Latin letters, as I use almost all of them. Anyway, it looks like I will skip translating Entry Id="2161021477".

Atvaark commented 7 years ago

That's unfortunate.

Since I can't change the limits imposed by the engine I'd rather print an error if one of the subtitles doesn't fit in an entry.

I'll have to analyze all the unedited subp files to get some more facts about the limits.

abuali129 commented 7 years ago

Is there any information I can provide for this? You just have to ask. And thanks a hundred times for the awesome tool.

Atvaark commented 7 years ago

Could you perhaps upload a zip archive with all subtitles? I don't have the game installed right now and would have to redownload it first.

Add me on Steam, as sharing all these files publicly here is likely against the GitHub ToS.

abuali129 commented 7 years ago

All right, is your Steam ID the same as here?

abuali129 commented 7 years ago

There are three users by your name, I cannot identify you :)

Atvaark commented 7 years ago

Link

abuali129 commented 7 years ago

Invitation sent

Atvaark commented 7 years ago

The entry with id 2161021477 is indeed the largest one in all subs.

The max sizes (in bytes) in the unmodified files are as follows:
File: 513497
Entry: 7517
Line: 308

As long as these aren't exceeded the game should load them fine. Anything above these values needs some more testing.
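
A check against the entry and line maxima could be sketched as follows. `entry_warnings` is a hypothetical helper, not something the tool provides. It assumes an entry's size is its lines joined with `$` plus a NULL terminator; how the 7517 figure was actually measured is not stated here, so treat the comparison as approximate.

```python
# Largest sizes (in bytes) observed in the unmodified files, from this thread.
MAX_ENTRY_BYTES = 7517
MAX_LINE_BYTES = 308

def entry_warnings(lines):
    """Return warnings for an entry given as a list of line texts."""
    warnings = []
    for i, text in enumerate(lines, start=1):
        size = len(text.encode("utf-8"))
        if size > MAX_LINE_BYTES:
            warnings.append(f"line {i}: {size} bytes > {MAX_LINE_BYTES}")
    # Assumption: lines are joined with '$' and followed by a NULL terminator.
    entry_size = len("$".join(lines).encode("utf-8")) + 1
    if entry_size > MAX_ENTRY_BYTES:
        warnings.append(f"entry: {entry_size} bytes > {MAX_ENTRY_BYTES}")
    return warnings

print(entry_warnings(["ザ" * 103]))  # ['line 1: 309 bytes > 308']
```

Exceeding these values does not prove the game will fail to load the file; it only marks sizes larger than any found in the unmodified files.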

abuali129 commented 7 years ago

I will look at this in the evening, thanks.

abuali129 commented 7 years ago

I got 573,670 bytes for the modified file, which works fine except for entry id 2161021477. However, when I put back the original entry id 2161021477 alongside the modified entries, the result is 572,211 bytes without any problems. The last 19 entries are still in their original state:

Entry Id="3976005522"
Entry Id="3983176914"
Entry Id="3985838335"
Entry Id="4015605908"
Entry Id="4033776865"
Entry Id="4038494047"
Entry Id="4044911970"
Entry Id="4084410907"
Entry Id="4123126871"
Entry Id="4131631805"
Entry Id="4181857144"
Entry Id="4201908311"
Entry Id="4205344688"
Entry Id="4209208445"
Entry Id="4230996696"
Entry Id="4272505980"
Entry Id="4275351727"
Entry Id="4277855698"
Entry Id="4289530536"