Closed Lattyware closed 5 years ago
You can extract these files with the following process:
1. Take the _string.pyd/_string.so module for your platform from the lib sub-directory for your platform (it's in different places depending on the platform).
2. Take loader.pyo from renpy.
3. Use uncompyle6 to get loader.py from loader.pyo.
4. Find verificationcode in loader.py (e.g: verificationcode = _string.sha1('1a7ee58d44d3c6dc943600c2b0c7f13670a6f5a0')).
5. Read the first line of the archive (e.g: ZiX-12B 000000000a849dd5 6e641880).
6. In a Python 2.x environment, make available the _string module obtained earlier. Run import _string and then run the verificationcode line - the result of this is your key.
7. Run _string.offset() with the argument as the last section of the first line of the archive (e.g: _string.offset("6e641880")). The result of this is your offset.
Now modify unrpa to replace the offset and key under "elif self.version == 3:" with the values you just obtained. E.g:
elif self.version == 3:
    line = f.readline()
    parts = line.split()
    offset = 141453332
    key = 572015977
If you force the version to 3, this should now successfully decode the archive.
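For the header step, a minimal sketch of pulling out the field that gets passed to _string.offset(). The helper name split_header is hypothetical, and the assumption that the header always has at least three whitespace-separated fields comes only from the single example line above.

```python
def split_header(first_line):
    """Return (magic, last_field) from the first line of an archive."""
    parts = first_line.split()
    if len(parts) < 3:
        # Assumption: headers look like the example, with three fields.
        raise ValueError("expected at least three fields, got %d" % len(parts))
    # Per the steps above, the last field is the argument to _string.offset().
    return parts[0], parts[-1]


magic, offset_field = split_header("ZiX-12B 000000000a849dd5 6e641880")
print(magic, offset_field)
```

With a real archive, the first line could be read with open(path, "rb").readline() and decoded before being passed in.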
Clearly, this is a massive pain. It could be made a little nicer by having command line arguments to override key/offset, and further by reverse-engineering the two methods from the _string module, which is presumably a Cython module.
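A rough sketch of what such override arguments might look like, using argparse from the standard library. The flag names --key and --offset are hypothetical illustrations and are not taken from unrpa's actual command line.

```python
import argparse


def parse_int(text):
    """Accept decimal (141453332) or prefixed hex (0x86e6814) values."""
    return int(text, 0)


def build_parser():
    # Hypothetical flags; unrpa's real interface may differ.
    parser = argparse.ArgumentParser(description="Override key/offset sketch")
    parser.add_argument("--key", type=parse_int, default=None,
                        help="obfuscation key to use instead of the archive's")
    parser.add_argument("--offset", type=parse_int, default=None,
                        help="index offset to use instead of the archive's")
    parser.add_argument("archive", help="path to the archive")
    return parser


args = build_parser().parse_args(
    ["--key", "572015977", "--offset", "141453332", "game.rpa"])
print(args.key, args.offset, args.archive)
```

Leaving both defaults as None would let the extractor fall back to the values in the header when no override is given.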
There is a final step - the directory structure and files are correct, but the extracted images are still scrambled. In the same 2.x environment as above, each file needs to be fixed with this process:
>>> import _string
>>> verificationcode = _string.sha1(...)
>>> rv = open("extracted.png", "rb")
>>> out = open("extracted_fixed.png", "wb")
>>> out.write(_string.run(rv.read(64), verificationcode) + rv.read())
>>> out.close()
>>> rv.close()
Obviously this is a pain to do by hand. Automating this would be nice, but as we are relying on _string, which targets Python 2.x, it would have to be a separate script. The ideal solution is reverse-engineering _string, as mentioned previously.
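A minimal sketch of how that separate script might look, written so that it runs on both Python 2 and 3. The function names fix_file and fix_tree are hypothetical; the decoding routine is passed in as run so that _string.run can be supplied where the module is available. It assumes the destination directory is not inside the source directory.

```python
import os


def fix_file(src_path, dst_path, run, key, head_size=64):
    """Decode the first head_size bytes with run() and copy the rest as-is."""
    with open(src_path, "rb") as src:
        head = src.read(head_size)
        tail = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(run(head, key) + tail)


def fix_tree(src_root, dst_root, run, key):
    """Apply fix_file to every file under src_root, mirroring the layout."""
    for folder, _dirs, files in os.walk(src_root):
        rel = os.path.relpath(folder, src_root)
        out_dir = os.path.normpath(os.path.join(dst_root, rel))
        if not os.path.isdir(out_dir):
            os.makedirs(out_dir)
        for name in files:
            fix_file(os.path.join(folder, name),
                     os.path.join(out_dir, name), run, key)
```

In the 2.x environment described above, this would be called as fix_tree("extracted", "fixed", _string.run, verificationcode) after importing _string.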
The latest version has some extra handling to point the user here if they try to extract an archive of this type.
I have also made a secondary script that automates the above process - it will be somewhat fragile and still relies on the original _string module, meaning it isn't ideal. Reverse engineering that module will still be needed for proper support, but this should make it easier until then.
Hello, I am a rookie and have just read your post. You wrote: "Take the _string.pyd/_string.so module for your platform from the lib sub-directory for your platform". Could you please tell me where to find the file named _string.pyd/_string.so: in the Python environment or the Ren'Py environment? I have looked in both but could not find it.
@yetk The file will be inside the lib folder in the renpy folder of the game you are trying to extract from. The exact path will depend on your platform (Windows, Linux, Mac).
I managed to track down a copy of _string.pyd (MD5 BCD019154309731EB1780546E2E82155) and reverse it. I now know more about Cython internals than I ever wanted to. I made a Python version of the module that should be easy enough to integrate. I've tested it on random input but not a complete archive. Looking at other games by the same company, they have different loader versions you could also support.
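import struct

def sha1(code):
    # Despite the name, this is not SHA-1: it derives an integer key
    # from the decimal digits found in the code string.
    a = int(''.join(filter(str.isdigit, code))) + 102464652121606009
    b = round(a ** (1. / 3)) / 23 * 109
    return int(b)

def offset(offset):
    # Reorder the hex digits of the header field, then parse them as base 16.
    a = offset[7:5:-1]
    b = offset[:3]
    c = offset[5:2:-1]
    return int(a + b + c, 16)

def run(s, key):
    # XOR each 8-byte little-endian block with the key and a rotating constant.
    keys = (3621826839565189698, 8167163782024462963, 5643161164948769306, 4940859562182903807, 2672489546482320731, 8917212212349173728, 7093854916990953299)
    out = b''
    for i in range(0, len(s), 8):
        enc = struct.unpack("<Q", s[i:i + 8])[0]
        dec = keys[i % 7] ^ key ^ enc
        out = out + struct.pack("<Q", dec)
    return out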
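As a sanity check, the functions can be restated for Python 3 and compared against the example key and offset quoted earlier in the thread. This is a sketch of a port, not the code that ended up in unrpa; the name KEYS and the round-trip check are additions for illustration. The round trip relies on XOR being its own inverse.

```python
import struct

KEYS = (3621826839565189698, 8167163782024462963, 5643161164948769306,
        4940859562182903807, 2672489546482320731, 8917212212349173728,
        7093854916990953299)


def sha1(code):
    """Derive the key from the decimal digits in the verification string."""
    digits = "".join(ch for ch in code if ch.isdigit())
    a = int(digits) + 102464652121606009
    return int(round(a ** (1.0 / 3)) / 23 * 109)


def offset(field):
    """Reorder the hex digits of the header field and parse as base 16."""
    return int(field[7:5:-1] + field[:3] + field[5:2:-1], 16)


def run(data, key):
    """XOR each 8-byte little-endian block with key and a rotating constant."""
    out = b""
    for i in range(0, len(data), 8):
        (enc,) = struct.unpack("<Q", data[i:i + 8])
        out += struct.pack("<Q", KEYS[i % 7] ^ key ^ enc)
    return out


key = sha1("1a7ee58d44d3c6dc943600c2b0c7f13670a6f5a0")
print(key)                 # 572015977, the key quoted in the steps
print(offset("6e641880"))  # 141453332, the offset quoted in the steps

blob = bytes(range(64))
assert run(run(blob, key), key) == blob  # applying run twice restores the input
```

Like the original, run expects input whose length is a multiple of 8 bytes, which holds for the 64-byte prefix used above.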
Nice work! I took a look at trying to reverse engineer it myself and it looked like a massive pain in the ass, so congrats on getting through that. As soon as I have the chance I'll take a look at integrating this into unrpa properly, which should be trivial enough given the pure-Python implementation you have provided.
I am naturally open to adding any other formats found in the wild, feel free to throw me information about any other ones if you want support added.
Resolved as of 2.0.0 (f54191b7746d24a79d6264accdba5ce641364b15).
Minor Note: obfuscated_amount is also loader dependent.
In general, we need to take care with post-processing, as it only applies to some RPA archives.
The default postprocessing() implementation is just a pass-through that does nothing - only the ZiX-12B implementation does anything there.
I see how obfuscated_amount could be varied. I'll write up a fix that is dynamic over that. If you have any other examples of the format, I'd love to have some more test cases.
The problem with the ZiX-12B implementation is that it applies postprocessing to all ZiX archives. However, the loader only applies it to specific ones. The list of said archives is also loader specific. See other VNs by the same company for example.
Oh, I see. That's something I didn't even think to look for. I'll fix that along with the other change and push a new version when I get a chance. Let me know if there is anything else you notice, and thanks for all the help getting this one implemented.
It's actually nicer than I thought - I was assuming it was based on archive name, but it's not - the ones without post-processing have a ZiX-12A header instead, so they are just a separate format to be handled without the post-processing.
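A minimal sketch of that split, assuming a pass-through default and a ZiX-12B override with a configurable obfuscated_amount. The class names and constructor signature are hypothetical and do not reflect unrpa's actual classes; run stands in for the decoding routine discussed above.

```python
class Version(object):
    """Base behaviour: extracted data is returned untouched."""
    header = None

    def postprocess(self, data):
        return data


class ZiX12A(Version):
    """Same family of archives, but with no post-processing step."""
    header = b"ZiX-12A"


class ZiX12B(Version):
    """Decodes the first obfuscated_amount bytes of each extracted file."""
    header = b"ZiX-12B"

    def __init__(self, run, key, obfuscated_amount=64):
        self.run = run
        self.key = key
        # Loader dependent, so it is a parameter rather than a constant.
        self.obfuscated_amount = obfuscated_amount

    def postprocess(self, data):
        n = self.obfuscated_amount
        return self.run(data[:n], self.key) + data[n:]
```

Selecting the class from the header keeps the decision out of the extraction loop, so no list of archive names is needed.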
Current concerns should be fixed as of 27ca4a65756be018c84bea22da4cf5c1f18da5ef. Let me know if anything else comes up.
It appears some Ren'Py games are starting to ship with custom loader scripts for a non-standard variant of RPA archives. unrpa currently can't deal with these archives.
The archives that do this seen in the wild seem to be identifiable as they begin with a ZiX-12B header, not the expected RPA-3.0/RPA-2.0. This appears to be an in-house obfuscation technique.
The route to decoding these files is to use uncompyle6 to turn 'loader.pyo' from the game the archive comes from into readable code. This should allow you to modify unrpa to load the archive. It appears to use a compiled Cython module called _string to perform parts of the process.
It appears the system is to use a hard-coded key in the loader. Ideally, we could identify this type of archive by the header and offer additional tooling to extract that key, alongside an option to manually set the key as an argument.
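A minimal sketch of identifying the archive type from its header, assuming only the three header strings named above. The function name classify and the labels it returns are hypothetical and not part of unrpa.

```python
# Header strings mentioned in this issue; other variants may exist.
KNOWN_HEADERS = {
    b"RPA-2.0": "standard",
    b"RPA-3.0": "standard",
    b"ZiX-12B": "custom-loader",
}


def classify(first_line):
    """Return a label for the archive based on the start of its first line."""
    fields = first_line.split()
    magic = fields[0] if fields else b""
    return KNOWN_HEADERS.get(magic, "unknown")


print(classify(b"ZiX-12B 000000000a849dd5 6e641880\n"))
print(classify(b"RPA-3.0 0000000000000000 00000000\n"))
```

A "custom-loader" result is where a tool could ask for a manually supplied key or point the user at the extraction steps.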
(This is the root cause of #13).
Edit: There is a script to make extracting these possible, but proper support isn't here yet. See below for details on how to extract an archive of this type now.
Edit: For transparency, I will note I worked out who the developer was who created this technique, and had their name listed here previously. At their request, I have removed a direct reference to them from this post, as it's not really relevant. The partial support for the format and the documentation of the effort here will remain up, however. I am still happy to accept pull requests to solve this issue properly and add full support to unrpa.