diasurgical / psx-tools

Tools for the Playstation port of Diablo.

How can I read Japanese text? #4

Open uwodb opened 4 years ago

uwodb commented 4 years ago

text.zip

Do you know what encoding is used? I don't know. Can anyone help me with this? Thanks in advance :)

galaxyhaxz commented 3 years ago

The entire Japanese character set is quite large, so they came up with a custom encoding to optimize it.

Check out the decompiled code here to get a general idea of how it works: https://github.com/diabpsx/skeleton/blob/master/JAP_1998_05_29/DIABPSX/PSXSRC/KANJI.CPP

The .OUT file is the font data which contains only the characters used for the text. The .JAP language file is an optimized SHIFT-JIS that points to indices in that file.

The .LGH file is the file you want. It's the regular SHIFT-JIS encoded Japanese text before it was optimized. My guess is they left these on the disc by accident since they aren't used which is funny since it defeats the purpose of reducing file sizes.
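
A minimal sketch of reading such a file in Python, assuming the .LGH file is plain Shift-JIS with no header or padding (the thread does not describe its layout). The function names and the `TEXT.LGH` path in the usage note are hypothetical.

```python
from pathlib import Path


def decode_sjis(data: bytes) -> str:
    """Decode Shift-JIS bytes, replacing undecodable bytes with U+FFFD
    instead of raising, so a partly non-text file can still be inspected."""
    return data.decode("shift_jis", errors="replace")


def read_lgh(path: str) -> str:
    """Read a file and decode it as Shift-JIS.

    Assumption: the whole file is Shift-JIS text with no header.
    """
    return decode_sjis(Path(path).read_bytes())
```

Usage would be something like `print(read_lgh("TEXT.LGH"))`. If the output is full of replacement characters, the plain-text assumption does not hold for that file.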

All work related to the PSX version is being continued at the aforementioned link. I'm pretty sure this repo has been dead since I left.

uwodb commented 3 years ago

WOW! This is what I'm looking for :) Thank you for your help.

galaxyhaxz commented 3 years ago

There was a set of about 60 beta Japanese builds for the PSX sold at auction a few years ago. Don't forget to tell your friends that help is needed and ask if they know anything about the matter. Those discs could contain the source code or original assets, so I don't have to spend countless hours reversing this crap. We could have Kanji support sooner ;)

uwodb commented 3 years ago

My goal is Korean and Japanese language support in DevilutionX, and I wanted to read the PSX version's Japanese text. Thank you :)

https://github.com/diasurgical/devilution/issues/1762

It seemed to me that they want to keep the original, so I gave up on using TTF and am using my own customized bitmap font.

AJenbo commented 3 years ago

Please see https://github.com/diasurgical/devilutionX/issues/66 for the latest status regarding translation and font handling in DevilutionX.

galaxyhaxz commented 3 years ago

Thankfully we have fonts for both Korean and Japanese! See here: https://d2mods.info/forum/viewtopic.php?t=55894

Diablo 2 has pixel mapped fonts ready for color transforming, in the same sizes as d1! So 42, 30, 24, and 16 pixels! Diablo 2 uses utf8 iirc rather than shift jis.

uwodb commented 3 years ago

Really? I didn't know that Diablo 2 used a bitmap font. :(

Warcraft (1994): bitmap, ascii
Warcraft II (1995): bitmap, ascii
Diablo (1996): bitmap, ascii
Starcraft (1998): bitmap, ascii
Diablo II (2000): bitmap?, unicode?
Warcraft III (2002): ttf, unicode

I see...

AJenbo commented 3 years ago

D1 uses TTF for some UI elements.

galaxyhaxz commented 3 years ago

Diablo 2 has the standard ascii bitmap font for most languages. It has a unicode font for japanese, korean, and chinese. There is also an ascii Russian and Polish font. On top of that it has some extra fonts like a small font and formal font for typing+UI.

Diablo 1 had a combination of pixel fonts and TTF. The developers outsourced the UI to Blizzard South who in turn got lazy and decided to use proprietary TTF for formal fonts instead of coming up with a proper font like they did in Diablo 2.

See below an example of things you can do with the 6pt small font from D2.

[Image: DIABLO_20201210_025658]

galaxyhaxz commented 3 years ago

Added font dumping tool.

So just as I feared, it looks like there isn't a way to directly translate the Japanese text files back. Since the special characters (0x8000+) just point to pixel data, one would need either a Japanese keyboard or some sort of OCR that can remap them to standard SJIS/Unicode.

Example: If the file has the character 0x8265, it references an address in MAINTXT.OUT, which holds a bit-based pixel font. Here is the output from the tool, with the pound symbol representing the pixels:

----- Id 34 (0x8265) -----
    #       
    #       
 ########   
   #        
   #        
  # #####   
  ##     #  
  #      #  
         #  
        #   
   #####    
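
The dump above can be reproduced with a sketch like the following. It rests on assumptions the thread does not confirm: glyphs are taken to be 12x12 at 1 bit per pixel (18 bytes, packed row by row with the most significant bit first), and codes are taken to start at 0x8001. Those values were picked only because they make 0x8265 come out as Id 34 (34 x 18 = 612 = 0x264) and match the 12-column rows printed above. `glyph_id` and `render_glyph` are hypothetical names, not functions from the actual tool.

```python
GLYPH_W = 12                            # assumed glyph width in pixels
GLYPH_H = 12                            # assumed glyph height in pixels
GLYPH_BYTES = GLYPH_W * GLYPH_H // 8    # 18 bytes under these assumptions
CODE_BASE = 0x8001                      # assumed so that 0x8265 -> Id 34


def glyph_id(code: int) -> int:
    """Map a character code to a glyph index (assumed layout)."""
    offset = code - CODE_BASE
    if offset < 0 or offset % GLYPH_BYTES:
        raise ValueError(f"0x{code:04X} is not on an assumed glyph boundary")
    return offset // GLYPH_BYTES


def render_glyph(data: bytes) -> str:
    """Render one 1-bit-per-pixel glyph as rows of '#' and spaces."""
    if len(data) != GLYPH_BYTES:
        raise ValueError(f"expected {GLYPH_BYTES} bytes, got {len(data)}")
    bits = int.from_bytes(data, "big")
    total = GLYPH_W * GLYPH_H
    rows = []
    for y in range(GLYPH_H):
        row = ""
        for x in range(GLYPH_W):
            # Pixel (x, y) is bit number y * GLYPH_W + x, counted from the MSB.
            shift = total - 1 - (y * GLYPH_W + x)
            row += "#" if (bits >> shift) & 1 else " "
        rows.append(row)
    return "\n".join(rows)
```

With the real font data one would slice `GLYPH_BYTES` bytes at the glyph's offset and pass them to `render_glyph`. If the picture comes out sheared or mirrored, the packing or bit-order assumption is the first thing to revisit.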

uwodb commented 3 years ago

Great. Translation of the PSX version is possible as well as translation of DevilutionX.

galaxyhaxz commented 3 years ago

If anyone is still reading this, I'm in need of a Japanese speaker for assistance. I have a file with all the unmapped Japanese glyphs printed out like above, but I have no idea what they mean and need them typed out so I can map the Shift-JIS code to the binary.

Edit: there are about 100 glyphs left in total, so it shouldn't take too long.

uwodb commented 3 years ago

I will see what I can do :) Show me your file and I'll take a look

galaxyhaxz commented 3 years ago

That's great @uwodb ! Below is a file with all the missing characters, printed out to look like pixels. I've already mapped the rest out with automation. Once we have these last things mapped we can convert the lore text back into text format.

Download: missing.txt

uwodb commented 3 years ago

These results may not always be accurate because the process was not automated :(

9333=競 9369=協 938D=境 93D5=況 93F9=狭 940B=胸 942F=響 9453=凝 94BF=僅 94D1=緊
953D=屈 9585=係 95BB=形 95CD=掲 9603=継 9627=警 97C5=固 97D7=弧 97FB=互 9831=誤
9843=交 98C1=坑 98D3=拘 98E5=控 9951=郊 9A29=困 9AA7=鎖 9ACB=挫 9B13=砕 9B25=際
9B6D=策 9B7F=索 9B91=錯 9BA3=擦 9BEB=惨 9C45=刺 9CD5=市 9CF9=志 9D65=至 9E07=七
9E2B=嫉 9E3D=室 9ECD=釈 9F03=惹 9FC9=宗 A023=讐 A047=醜 A07D=従? A0E9=瞬 A0FB=殉
A11F=巡 A179=諸 A1AF=序 A1C1=徐 A1F7=召 A251=尚 A275=晶 A2BD=証 A329=状 A35F=伸
A383=侵 A3B9=浸 A3DD=申 A46D=陣 A4B5=遂 A4D9=枢 A69B=節 A6E3=宣 A7BB=疎 A7DF=訴
A803=創 A815=双 A86F=巣 A8A5=窓 A8B7=総 A8C9=荘 A8FF=送 A96B=側 A97D=則 AA0D=孫
AA1F=尊 AA31=村 AA67=唾 AAD3=態 ABBD=担 AC05=弾 AC83=致 ACA7=秩 ACB9=着 AD7F=頂
ADA3=沈 AE33=廷 AE7B=徹 AE9F=展 AEB1=転 AEF9=堵 AF0B=塗 AF1D=妬 AFD1=統 B073=洞
B0A9=徳 B0DF=独 B235=波 B26B=廃 B2C5=薄 B2D7=迫 B30D=肌 B331=罰 B38B=繁 B3AF=卑
B3E5=比 B3F7=疲 B4AB=貧 B4E1=布 B53B=赴 B595=風 B5A7=副 B5DD=福 B649=奮 B67F=併
B757=奉 B7D5=飽 B7F9=妨 B853=謀 B865=貌 B877=貿 B8BF=摩 B9BB=務 B9F1=冥 BA39=盟
BA5D=鳴 BAA5=模 BAC9=猛 BB11=悶 BB7D=躍 BBD7=有 BBFB=裕 BC31=余 BC55=余 BCD3=踊
BCE5=遥 BD09=浴 BDAB=律 BE17=侶 BE3B=虜 BE71=糧 BEA7=臨 BEDD=令 BEEF=冷 BF7F=路
BFA3=弄 BFD9=論 C021=枠 C033=墟 C045=愕 C057=枷 C069=沐 C07B=狡 C08D=禍 C09F=瞞
C0B1=膠 C0C3=貪 C0D5=踪
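
A list in this `CODE=character` form can be turned into a lookup table mechanically. The sketch below is one possible way to do it, not the tool's actual code. `parse_mapping` and `to_sjis` are hypothetical names, and a trailing `?` is assumed to mean "uncertain reading".

```python
import re

# Four hex digits, '=', one character, and an optional '?' marking doubt.
PAIR = re.compile(r"\b([0-9A-Fa-f]{4})=(\S)(\?)?")


def parse_mapping(text: str):
    """Return ({code: char}, {codes flagged with '?'}) from 'CODE=char' pairs."""
    mapping = {}
    uncertain = set()
    for code_hex, char, mark in PAIR.findall(text):
        code = int(code_hex, 16)
        mapping[code] = char
        if mark:
            uncertain.add(code)
    return mapping, uncertain


def to_sjis(mapping: dict) -> dict:
    """Map each custom code to the Shift-JIS bytes of its character.

    Raises UnicodeEncodeError if a character is not in Python's
    shift_jis codec.
    """
    return {code: ch.encode("shift_jis") for code, ch in mapping.items()}
```

Keeping the flagged codes in a separate set makes it easy to list the entries that still need a second look.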

galaxyhaxz commented 3 years ago

Wow, incredible work @uwodb! Very thankful you typed these out, as it would have taken me forever fiddling with OCR software and the like. As a result, all but two characters are mapped and everything seems to be translating back correctly!

When you get the chance, could you have a second look at BC55 and C08D? BC55 appears to be a duplicate of BC31 but it looks a bit different.
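
A check of this kind is easy to automate. A minimal sketch, assuming the mapping is held as a dict from code to character (`find_duplicates` is a hypothetical helper, not part of the tool):

```python
from collections import defaultdict


def find_duplicates(mapping: dict) -> dict:
    """Return {char: [codes]} for characters assigned to more than one code."""
    by_char = defaultdict(list)
    for code, ch in sorted(mapping.items()):
        by_char[ch].append(code)
    return {ch: codes for ch, codes in by_char.items() if len(codes) > 1}
```

Two distinct glyphs mapped to the same character are likely a typing slip, so every entry this returns is worth comparing against the pixel dump.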

Please find below all of the game's text restored back into Shift-JIS!!! Note that the lore section is missing those two characters and may have some slight errors; the other two sections should be perfect.

jap.zip.txt (rename to .zip)

uwodb commented 3 years ago

Wow! Thanks for noticing.

BC31=余
BC55=幼
C08D=猾

What's the next plan?

galaxyhaxz commented 3 years ago

I guess the text can be used once translation support is complete. I'm working on my own game engine, but progress is a bit slow. I expect DevilutionX will have translation support soon, though fonts are missing for Asian languages. For now you can use the Diablo 2 versions.