djkaty / Il2CppInspector

Powerful automated tool for reverse engineering Unity IL2CPP binaries
http://www.djkaty.com
GNU Affero General Public License v3.0

Call the methods il2cpp exports. #39

Closed djkaty closed 4 years ago

djkaty commented 4 years ago

[Originally from @DannyParker0001]

This is only somewhat related, but something I remembered that would also be cool is to be able to call the methods il2cpp exports. All methods in il2cpp-api.cpp are exported (I think there were some quirks with it in very old versions of il2cpp). The methods let you get runtime type information, and they include some very cool methods such as being able to get the value of a static member at runtime (the game I'm interested in has a god object with a static instance, so being able to use this would be pretty huge!). These methods require the metadata headers (Il2CppObject, Il2CppClass, etc.), which is why I think they should be auto-generated and put into a header.

On x86/x64 you can get the virtual addresses of these and then call them via DLL injection; you could also use GetProcAddress on Windows to get their addresses if wanted. I have written code in the past to list all the exported methods in an x86/x64 binary, and I could quite easily implement that if you wanted.

Since code is clearer than words: Exports of a dll (dumpbin /EXPORTS /path/to/your/dll, found in Start Menu -> Visual Studio (XXXX) -> xXX native tools...)

    ordinal hint RVA      name

          1    0 001CDDF0 CloseZStream
          2    1 001CDEA0 CreateZStream
          3    2 001CDFC0 Flush
                    ...
          9    8 001CE0C0 WriteZStream
         10    9 00159180 il2cpp_add_internal_call
         11    A 00159190 il2cpp_alloc
         12    B 001591A0 il2cpp_array_class_get
         13    C 001591B0 il2cpp_array_element_size
                    ...

In the il2cpp-api.cpp file there'd be something like:

void* il2cpp_alloc(size_t size)
{
    return IL2CPP_MALLOC(size);
}

So in the header something like

// ImageBase = 0x0B00000000 
void*(*il2cpp_alloc)(size_t size) = (void*(*)(size_t))0x0B00159190;

Alternatively you could use __declspec(dllimport) (Compiler specific!) for the same effect, may be better, not sure...
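One caveat with a hard-coded absolute address such as 0x0B00159190: it only holds if the DLL really loads at its preferred ImageBase. A minimal sketch of the more forgiving form, adding the RVA to whatever base the module actually has at runtime, is below. The names resolve_export, demo_target and call_demo_through_rva are made up for illustration, and the "module" is faked with a local function so the snippet can run without a game binary.

```cpp
#include <cstdint>

// Turn "module base + RVA" into a callable function pointer.
// Fn is the function-pointer type taken from the IL2CPP headers.
template <typename Fn>
Fn resolve_export(std::uintptr_t module_base, std::uint32_t rva) {
    return reinterpret_cast<Fn>(module_base + rva);
}

// Hypothetical stand-in for an exported function.
inline int demo_target(int x) { return x * 2; }

// Exercise resolve_export without a real DLL: pretend the module was loaded
// so that demo_target sits exactly at base + rva.
inline int call_demo_through_rva() {
    using DemoFn = int (*)(int);
    const std::uint32_t rva = 0x1000;  // made-up RVA
    const std::uintptr_t base =
        reinterpret_cast<std::uintptr_t>(&demo_target) - rva;
    DemoFn fn = resolve_export<DemoFn>(base, rva);
    return fn(21);
}
```

In an injected DLL the base would come from the loader (for example the module handle) instead of being derived from a local function.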

I'm not sure about other architectures. I could probably manage a PR of most of this for x86/x64 once you've mostly implemented the header generation system, though I'm not sure how I'd get the symbol purely from the name. I'm wondering if you'd be happy for me to do the PR (in future) even if it's just for the one arch and possibly incomplete? Or would there be a better way for me to contribute?

Originally posted by @DannyParker0001 in https://github.com/djkaty/Il2CppInspector/issues/37#issuecomment-649350228

djkaty commented 4 years ago

First, there are so many ways people want to manipulate IL2CPP applications it's always great to get a PR because I cannot possibly cover every use case on my own - especially the really niche ones - and some of them are also just straight up outside my knowledge of coding. So feel free to go ahead and PR, even if it doesn't get merged, as you said, code is better than words and it can help down the road 🙂

To this specific issue, I just want to confirm that I understand what you'd like:

  1. Scan IL2CPP binary and find all function exports (with a view to calling il2cpp's APIs)
  2. Get virtual addresses of these functions
  3. Output a .h with two components: A. The declarations for all of the internal objects IL2CPP uses like Il2CppClass etc., and B. Function pointer declarations for all of the exported functions

Is that correct? 🙂

I wasn't sure quite what you meant by "get the symbol purely from the name" so perhaps you can explain that a bit further.

As for actually submitting code, it's always welcome, and if you're not sure where to place specific pieces or how to accomplish something with Il2CppInspector's APIs, don't struggle, just ask! 🙂

DannyParker0001 commented 4 years ago

Yes, that's correct. By "get the symbol purely from the name" I mean the exported functions don't have a type, just a name and an address. il2cpp-api-functions.h/il2cpp-api.cpp contain all the type information, but I'm not really sure of a clean way to link them together (since this API has changed a fair bit over time). So e.g. 11 A 00159190 il2cpp_alloc is quite literally all the information available: there's a function called il2cpp_alloc at RVA 00159190. Somehow I need to turn that into the C++ type void*(*il2cpp_alloc)(size_t size), and I'm not sure how to do it for every version of il2cpp.

djkaty commented 4 years ago

If we have the function names, is it not possible to just parse the .cpp files to get the function signatures?

DannyParker0001 commented 4 years ago

Yes, I just don't think parsing .cpp/.h files is trivial, and the location of the definitions has changed a bunch between early versions and now.

djkaty commented 4 years ago

I'm actually writing code to parse the .h files right now :) I want to be able to access the offset into a given type's fields programmatically (to automate the analysis of a disassembly), for that I have to parse every type to be able to correctly calculate each field's size.

It's quite quick and dirty and it's built on top of #38 so it can't go in yet, so whether it will be useful to you or not I don't know, but it can surely be adapted to spew out a method signature from a method name.

DannyParker0001 commented 4 years ago

Yeah, that would probably be useful. The exports used to look like this: (5.6.0f3)

    IL2CPP_EXPORT void il2cpp_init (const char* domain_name);
    IL2CPP_EXPORT void il2cpp_shutdown ();

but they were changed (not sure when) and look like this now: (2019.2.5f1)

#define DO_API_NO_RETURN(r, n, p) DO_API(r,n,p)

DO_API(int, il2cpp_init, (const char* domain_name));
DO_API(int, il2cpp_init_utf16, (const Il2CppChar * domain_name));

So it's now using a C macro that's not even defined. It could be done, I'm just not sure how much this format has changed and how many separate parsers would need to be written...

djkaty commented 4 years ago

The actual text parsing is not difficult. If you have an export called 'il2cpp_init' we can just grep each line of the file for it and copy the function pointer declaration to the output. My main question is, are they in the same files/filenames for each version of il2cpp or have they moved it around numerous times? In the worst case we can just grep through the entire il2cpp folder for every version to find them but that will add an awful lot of deadweight to the repo and I'm not sure what the legality of that is.
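To give an idea of how small the text-parsing side could be, here is a rough sketch that pulls the return type, name and parameter list out of one DO_API(...) line and prints it back as a function pointer declaration. It assumes each declaration sits on a single line in the DO_API form quoted earlier; ApiDecl, parse_do_api and to_function_pointer are hypothetical names, not part of Il2CppInspector.

```cpp
#include <regex>
#include <string>

// One parsed DO_API(...) entry.
struct ApiDecl {
    std::string return_type;
    std::string name;
    std::string params;  // text between the parameter parentheses, may be empty
};

// Parse a single-line DO_API / DO_API_NO_RETURN entry.
// Returns false if the line is not in that form.
inline bool parse_do_api(const std::string& line, ApiDecl& out) {
    static const std::regex pattern(
        R"(^\s*DO_API(?:_NO_RETURN)?\s*\(\s*(.+?)\s*,\s*(\w+)\s*,\s*\((.*)\)\s*\)\s*;?\s*$)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) return false;
    out.return_type = m[1].str();
    out.name = m[2].str();
    out.params = m[3].str();
    return true;
}

// Render the entry as a C function pointer declaration.
inline std::string to_function_pointer(const ApiDecl& d) {
    return d.return_type + " (*" + d.name + ")(" + d.params + ");";
}
```

Feeding it `DO_API(int, il2cpp_init, (const char* domain_name));` gives `int (*il2cpp_init)(const char* domain_name);`. Declarations split over several lines would need joining first.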

nneonneo commented 4 years ago

As far as I can tell, the exports have always been stored in il2cpp-api-functions.h, and the format is very consistent between versions. 5.3.0f4:

DO_API( void, il2cpp_init, (const char* domain_name) );
DO_API( void, il2cpp_shutdown, () );
DO_API( void, il2cpp_set_config_dir, (const char *config_path) );
DO_API( void, il2cpp_set_data_dir, (const char *data_path) );
DO_API( void, il2cpp_set_commandline_arguments, (int argc, const char* argv[], const char* basedir) );
DO_API( void, il2cpp_set_memory_callbacks, (Il2CppMemoryCallbacks* callbacks) );

2019.3.8:

DO_API(int, il2cpp_init, (const char* domain_name));
DO_API(int, il2cpp_init_utf16, (const Il2CppChar * domain_name));
DO_API(void, il2cpp_shutdown, ());
DO_API(void, il2cpp_set_config_dir, (const char *config_path));
DO_API(void, il2cpp_set_data_dir, (const char *data_path));
DO_API(void, il2cpp_set_temp_dir, (const char *temp_path));
DO_API(void, il2cpp_set_commandline_arguments, (int argc, const char* const argv[], const char* basedir));
DO_API(void, il2cpp_set_commandline_arguments_utf16, (int argc, const Il2CppChar * const argv[], const char* basedir));
DO_API(void, il2cpp_set_config_utf16, (const Il2CppChar * executablePath));
DO_API(void, il2cpp_set_config, (const char* executablePath));

DO_API(void, il2cpp_set_memory_callbacks, (Il2CppMemoryCallbacks * callbacks));

DO_API is a macro defined by the including file, il2cpp-api.h:

#define DO_API(r, n, p)             IL2CPP_EXPORT r n p;
#define DO_API_NO_RETURN(r, n, p)   IL2CPP_EXPORT NORETURN r n p;
#include "il2cpp-api-functions.h"
#undef DO_API
#undef DO_API_NORETURN

The really nice thing about DO_API is that we can define our own version which automatically generates exactly the header we want. For example:

#define DO_API(r, n, p) r (*n) p = (r (*) p)(n ## __addr)

This would generate e.g. TypeInfo* (*il2cpp_class_get_interfaces) (TypeInfo *klass, void* *iter) = (TypeInfo* (*) (TypeInfo *klass, void* *iter))(il2cpp_class_get_interfaces__addr);

where il2cpp_class_get_interfaces__addr could be defined as a #define in a header somewhere. Or, to support relocatable base addresses, we could do something like this:

/* in an automatically generated header file */
#define il2cpp_add_internal_call__addr 0x00159180
...

#define DO_API(r, n, p) extern r (*n) p
#include "il2cpp-api-functions.h"
#undef DO_API

void init_il2cpp_api(uintptr_t base_address);

/* in the corresponding C file */
#define DO_API(r, n, p) r (*n) p
#include "il2cpp-api-functions.h"
#undef DO_API

void init_il2cpp_api(uintptr_t base_address) {
#define DO_API(r, n, p) n = (r (*) p)(base_address + n ## __addr)
#include "il2cpp-api-functions.h"
#undef DO_API
}

This would generate a function that would automatically initialize all the function pointers, given the base address to the DLL. No parsing needed!
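For anyone who wants to try the two-pass DO_API trick without a Unity install, here is a self-contained sketch of the same idea. It swaps two things for the sake of being runnable anywhere: the real il2cpp-api-functions.h is replaced by a made-up two-entry list macro (DEMO_API_FUNCTIONS), and instead of base + offset it resolves by name through a GetProcAddress-like callback. All demo_* and fake_* names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for il2cpp-api-functions.h: two made-up entries in DO_API form.
#define DEMO_API_FUNCTIONS \
    DO_API(int, demo_add, (int a, int b)) \
    DO_API(int, demo_negate, (int value))

// Pass 1: declare one (initially null) function pointer per entry.
#define DO_API(r, n, p) r (*n) p = nullptr;
DEMO_API_FUNCTIONS
#undef DO_API

// A resolver maps an export name to its address.
using Resolver = void* (*)(const char* name);

// Pass 2: fill every pointer by name and count how many were found.
inline std::size_t init_demo_api(Resolver resolve) {
    std::size_t found = 0;
#define DO_API(r, n, p) \
    n = reinterpret_cast<r (*) p>(resolve(#n)); \
    if (n != nullptr) ++found;
    DEMO_API_FUNCTIONS
#undef DO_API
    return found;
}

// Fake "module" used only to exercise the sketch.
inline int fake_add(int a, int b) { return a + b; }
inline int fake_negate(int value) { return -value; }

inline void* fake_resolver(const char* name) {
    if (std::strcmp(name, "demo_add") == 0)
        return reinterpret_cast<void*>(&fake_add);
    if (std::strcmp(name, "demo_negate") == 0)
        return reinterpret_cast<void*>(&fake_negate);
    return nullptr;
}
```

After init_demo_api(fake_resolver), demo_add(2, 3) calls straight through the pointer. Resolving by name trades the generated offset table for a lookup at startup, which may or may not be preferable.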

DannyParker0001 commented 4 years ago

In version 5.6.0f3 it appears to be different. il2cpp-api-functions.h doesn't exist at all... Instead, in a file il2cpp-api.h there are lines that look like

    IL2CPP_EXPORT void il2cpp_init (const char* domain_name);
    IL2CPP_EXPORT void il2cpp_shutdown ();
    IL2CPP_EXPORT void il2cpp_set_config_dir (const char *config_path);
    IL2CPP_EXPORT void il2cpp_set_data_dir(const char *data_path);
    IL2CPP_EXPORT void il2cpp_set_commandline_arguments (int argc, const char* argv[], const char* basedir);
    IL2CPP_EXPORT void il2cpp_set_memory_callbacks (Il2CppMemoryCallbacks* callbacks);
    IL2CPP_EXPORT const Il2CppImage* il2cpp_get_corlib ();
    IL2CPP_EXPORT void il2cpp_add_internal_call(const char* name, methodPointerType method);
    IL2CPP_EXPORT methodPointerType il2cpp_resolve_icall(const char* name);

nneonneo commented 4 years ago

@DannyParker0001 Curious, I have the headers for 5.6.0f3 (I have all the headers :P) and I have il2cpp-api-functions.h. I'm looking in the libil2cpp directory from the Unity editor directly.

nneonneo commented 4 years ago

I checked and the code you posted seems to be from 5.2.0f3, which isn't supported by Il2CppInspector anyway.

DannyParker0001 commented 4 years ago

I've just now found the PeNet library (available on NuGet); it should be able to find all the exports of a DLL (better than anything I could make). Using it is as trivial as:

            var peHeader = new PeNet.PeFile("path/to/gameassembly.dll");
            var exports = peHeader.ExportedFunctions;
            foreach(var export in exports)
            {
                var rva = export.Address;
                var name = export.Name;
                // ...
            }
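
The loop above yields a name and an RVA per export. As a rough sketch (in C++ rather than C#, to match the generated headers), those pairs could be turned into the name__addr defines suggested earlier in the thread. Export and make_addr_header are hypothetical names, and filtering on the il2cpp_ prefix is an assumption about which exports are wanted.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// One export as read from the binary: just a name and an RVA.
struct Export {
    std::string name;
    std::uint32_t rva;
};

// Emit "#define <name>__addr 0x<RVA>" for every il2cpp_* export.
inline std::string make_addr_header(const std::vector<Export>& exports) {
    const std::string prefix = "il2cpp_";
    std::string out;
    for (const Export& e : exports) {
        // Skip unrelated exports such as CloseZStream.
        if (e.name.compare(0, prefix.size(), prefix) != 0) continue;
        char rva_text[16];
        std::snprintf(rva_text, sizeof rva_text, "0x%08X",
                      static_cast<unsigned>(e.rva));
        out += "#define " + e.name + "__addr " + rva_text + "\n";
    }
    return out;
}
```

With the dumpbin listing from the first post as input, this would produce lines like `#define il2cpp_alloc__addr 0x00159190`.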

djkaty commented 4 years ago

There is already a PE reader in Il2CppInspector so it should only be a few lines of code to get the exports; besides, I have to consider whether I want to do this for other executable formats although I'm not sure if there is a point to it for anything other than Windows PE files.

vfsfitvnm commented 4 years ago

This is not completely related to this proposal, but I just want to add my two cents. P(re)S: I'm a newbie.

Exports are fundamental if you want to dump il2cpp.so without having global-metadata.dat (sometimes this file is missing), so you are obliged to use the API at runtime to retrieve the information (I've used Frida to do that).

APIs are meant to be universal, so you could completely ignore the Unity version at some point. Unfortunately, the exports can't always provide sufficient information about a specific struct (for example, I can't know if a class or method is protected) and do not provide a clean way to do something like Il2CppAssembly -> Il2CppImage -> Il2CppClass, so you kind of have to do it on your own, which means you have to look at the structs / definitions again.

At the end of the story, if global-metadata.dat is missing (maybe SymbolMap-${arch} is the key?), everything is a little more complex, because you must use both exports and definitions.

djkaty commented 4 years ago

global-metadata.dat can never be missing as every IL2CPP application absolutely requires the information therein to be able to execute. That's not to say it isn't hidden, obfuscated or embedded somewhere else of course, but that data absolutely must be present somewhere in every IL2CPP app, without question.

I am currently working on an ApplicationModel class/API which will allow you to programmatically retrieve the correlations between .NET and C++ types, together with their definition addresses in the binary and all of the exports as well.

The existing Il2CppModel API does let you find the C++ offset of a field (type.GetField("foo").Offset) within a type, and you can leverage the Reflection-style functions to determine if a field or type is protected etc. already (model.GetType("Foo.Type").IsNestedFamily). You can also obtain all of the type definition addresses etc. from the IDAPython output.

What it doesn't currently do is map the .NET types to all their corresponding C++ types, definitions, memory locations etc. in the binary, except for methods, in a programmatically queryable way. The forthcoming ApplicationModel API will solve this together with some other shortcomings, including handling exports.

The Unity version can only be ignored if the APIs defined in il2cpp-api-functions.h do absolutely everything you need. Otherwise it must be taken into account (as must the compiler used to build the binary) because the layout of some structs etc. differs between certain versions.

Hope that makes sense!

djkaty commented 4 years ago

Quick follow-up: it seems like it will be really easy to fetch the exports from an ELF file using the code I've already written, so I'll definitely add that.

vfsfitvnm commented 4 years ago

> global-metadata.dat can never be missing as every IL2CPP application absolutely requires the information therein to be able to execute. That's not to say it isn't hidden, obfuscated or embedded somewhere else of course, but that data absolutely must be present somewhere in every IL2CPP app, without question.

I've studied Mario Kart Tour (2017.4.3f1). It's not really a problem (I've already achieved what I wanted to), but the only file that could be similar to global-metadata.dat is SymbolMap-${arch}. This is SymbolMap-ARM64, for example.

At this point, a runtime dump is absolutely necessary if we don't want to discover how the metadata has been encrypted, obfuscated and so on. This could be a valid alternative.

> The existing Il2CppModel API does let you find the C++ offset of a field (type.GetField("foo").Offset) within a type, and you can leverage the Reflection-style functions to determine if a field or type is protected etc. already (model.GetType("Foo.Type").IsNestedFamily).

This is also (kind of) possible with a second dump, if there is no API for that. With the first dump you get the offsets of certain methods (System.Reflection.MethodBase get_IsVirtual(), for example), and then you see what's going on there with a disassembler. For instance, you discover that method->flags >> 6 & 1 tells you whether a method is virtual or not. This is not ideal, I know, but better than nothing.
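The bit found there lines up with the ECMA-335 MethodAttributes table, where Virtual is 0x0040 (bit 6). A small sketch of decoding the rest of those flags, assuming the runtime stores the standard values unchanged (which has not been verified for every il2cpp version); the helper names are made up:

```cpp
#include <cstdint>
#include <string>

// ECMA-335 MethodAttributes values.
constexpr std::uint16_t kMemberAccessMask = 0x0007;
constexpr std::uint16_t kStatic = 0x0010;
constexpr std::uint16_t kVirtual = 0x0040;
constexpr std::uint16_t kAbstract = 0x0400;

inline bool is_static(std::uint16_t flags) { return (flags & kStatic) != 0; }
inline bool is_virtual(std::uint16_t flags) { return (flags & kVirtual) != 0; }
inline bool is_abstract(std::uint16_t flags) { return (flags & kAbstract) != 0; }

// Map the low three bits to the closest C# accessibility keyword.
inline std::string method_access(std::uint16_t flags) {
    switch (flags & kMemberAccessMask) {
        case 0x1: return "private";
        case 0x2: return "private protected";   // FamANDAssem
        case 0x3: return "internal";            // Assembly
        case 0x4: return "protected";           // Family
        case 0x5: return "protected internal";  // FamORAssem
        case 0x6: return "public";
        default:  return "compiler-controlled"; // 0x0
    }
}
```

So a flags value of 0x00C6 would read as a public virtual method, and (flags >> 6) & 1 is the same test as is_virtual.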

> You can also obtain all of the type definition addresses etc. from the IDAPython output. What it doesn't currently do is map the .NET types to all their corresponding C++ types, definitions, memory locations etc. in the binary, except for methods, in a programmatically queryable way. The forthcoming ApplicationModel API will solve this together with some other shortcomings, including handling exports.

This is also very cool! I wish I could do something like this, too.

> The Unity version can only be ignored if the APIs defined in il2cpp-api-functions.h do absolutely everything you need. Otherwise it must be taken into account (as must the compiler used to build the binary) because the layout of some structs etc. differs between certain versions.

Absolutely true, in fact building a universal il2cpp dumper that relies on apis is quite trivial. I wanted to do something as complete as possible (like your project), with type definitions, c headers for ghidra, compiling c# code and so on, but I think I'll give up for now :)

Thanks for the interest!

djkaty commented 4 years ago

Thank you for posting! I'm always interested to hear how people are playing with il2cpp and for possible improvements to the tool. It's not possible to do everything that everybody wants - at least in a timely fashion (time constraints, also I just don't own all of the hardware devices people are hacking games on), but I always bear these conversations in mind. Ghidra support has been asked for and I will look at it when I actually get around to installing Ghidra, but there is an open issue for that ( #42 ).

One of my primary motivations for creating a new IL2CPP tool was that I do a lot of static analysis and wanted to be able to model all of the data in a way that could be queried, rather than just dumping output. I do believe the majority of users simply run the CLI or GUI and produce the output files they want, and well, there is not really any documentation for this project, but the provided APIs are actually a quite powerful tool for people wanting to integrate Il2CppInspector as a library in their own static analysis applications. Having a queryable model also makes developing new output modules much easier. For the documentation problem, I have started a blog series ( https://katyscode.wordpress.com/2020/06/24/il2cpp-part-1/ ) and I'm going to use that to show how to use both the existing Il2CppModel APIs and the forthcoming ApplicationModel APIs to solve practical problems with real-world examples. That should help make the tooling more accessible to developers going forward. Obviously, writing a comprehensive series takes a lot of time.

The SymbolMap is (afaik) not used by the application, and I'm not actually sure it is included with all games, but I've never looked at it so I could be totally wrong about that.

I'm not sure how trivial it is to build an il2cpp-api only dynamic dumper. Assuming the APIs provide you with all the names and symbols that are usually in global-metadata.dat, you can dump everything, but there are many many complications, which you can witness by seeing how complex the type model (Reflection folder) of this project is. Dealing with all the nuances of generic types and methods, overloading, overriding, nested types and so on gets really complicated, so I guess it depends on how accurate you need the dump to be.

Thanks for checking out the project!

vfsfitvnm commented 4 years ago

> Ghidra support has been asked for and I will look at it when I actually get around to installing Ghidra, but there is an open issue for that ( #42 ).

The hardest part (at least, for me) is translating C# classes (because it's all I have) into C structs (Ghidra does not support C++ headers at the moment). I know at which offset fields are stored/pointed and what the instance size is, but they don't always match each other. I have to investigate a little more, but I don't feel I have the competence to do so. Also, namespaces are not a thing in C, so it gets even more complicated.

> One of my primary motivations for creating a new IL2CPP tool was that I do a lot of static analysis and wanted to be able to model all of the data in a way that could be queried, rather than just dumping output. I do believe the majority of users simply run the CLI or GUI and produce the output files they want, and well, there is not really any documentation for this project, but the provided APIs are actually a quite powerful tool for people wanting to integrate Il2CppInspector as a library in their own static analysis applications. Having a queryable model also makes developing new output modules much easier.

Yeah, that's what I wanted to do, too. For the queryable data, I imagined a data repository (which could be something as simple as a JSON file), structured like an n-ary tree where each node is a namespace or a class (with their fields / methods...):

{
    System: {
        Reflection: {
            ...
        },
        Collections: {
            ...
        }
    },
    Unity: {
        ...
    }
}

This would fit perfectly with a dynamic instrumentation like frida, because I could do something like: callFunction("Game.Application.User.get_NickName")(...) so I wouldn't use offsets anymore. After the game gets updated, I just need to update the json, no need to manually update the offsets.
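A minimal sketch of that lookup structure, written in C++ only for consistency with the rest of the thread: each dotted segment becomes a tree node, and the leaf carries an address. Node, insert_path and find_path are hypothetical names, and splitting on '.' is a simplification (names such as .ctor or nested generic types would need extra handling).

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// A namespace, class or member in the tree.
struct Node {
    std::uintptr_t address = 0;  // 0 for pure namespace/class nodes
    std::map<std::string, std::unique_ptr<Node>> children;
};

// Insert a dotted path, creating intermediate nodes as needed.
inline Node& insert_path(Node& root, const std::string& dotted,
                         std::uintptr_t address) {
    Node* cur = &root;
    std::istringstream parts(dotted);
    std::string part;
    while (std::getline(parts, part, '.')) {
        std::unique_ptr<Node>& child = cur->children[part];
        if (!child) child.reset(new Node());
        cur = child.get();
    }
    cur->address = address;
    return *cur;
}

// Walk a dotted path; returns nullptr if any segment is missing.
inline const Node* find_path(const Node& root, const std::string& dotted) {
    const Node* cur = &root;
    std::istringstream parts(dotted);
    std::string part;
    while (std::getline(parts, part, '.')) {
        auto it = cur->children.find(part);
        if (it == cur->children.end()) return nullptr;
        cur = it->second.get();
    }
    return cur;
}
```

After a game update only the stored addresses change; a lookup like find_path(root, "Game.Application.User.get_NickName") stays the same.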

By the way, if we are talking about static analysis, I can't really imagine a real use case for having queryable data. The best I can imagine is having correct C# source with working jump-to-definition things and so on.

> I'm not sure how trivial it is to build an il2cpp-api only dynamic dumper. Assuming the APIs provide you with all the names and symbols that are usually in global-metadata.dat, you can dump everything, but there are many many complications, which you can witness by seeing how complex the type model (Reflection folder) of this project is. Dealing with all the nuances of generic types and methods, overloading, overriding, nested types and so on gets really complicated, so I guess it depends on how accurate you need the dump to be.

A barely acceptable dump can be written with 100 lines of JavaScript code (no OOP). I'm currently rewriting everything in C for performance purposes (there are 15'000+ classes to analyze), and yeah it will require way more lines of code. I don't know how the metadata parsing works, but it's not really difficult to perform a dynamic dump. Once you get the pointer to an Il2CppClass struct, you can get everything the struct (or the APIs) can offer: nested types, parent, interfaces... I couldn't figure out how to retrieve a few details like protected, internal, override, private for classes (maybe I should do a bit more "trial and error" work with the flags).

One thing that I have to mention is that obtaining ALL the inflated classes is trivial. C# generates a brand new class for every class that implements a generic type (other than List<T>, we also have List<int>, List<string>, List<MyClass>... which are all different classes, with different method offsets, but with the same interface, of course). In order to get all of these, I have to use il2cpp_class_from_type and check if I already dumped that class, for every type, which is expensive. Does this problem occur with static dumping too?
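On the "check if I already dumped that class" step: if that check is a linear scan, a hash set keyed on the class pointer keeps it at roughly constant time per lookup. A tiny sketch, assuming the runtime hands back the same pointer for the same class (SeenClasses is a made-up helper):

```cpp
#include <cstddef>
#include <unordered_set>

// Remembers which class pointers have already been dumped.
class SeenClasses {
public:
    // Returns true the first time a pointer is offered, false afterwards.
    bool mark(const void* klass) { return seen_.insert(klass).second; }

    std::size_t count() const { return seen_.size(); }

private:
    std::unordered_set<const void*> seen_;
};
```

The dumper would call mark() on each class it gets back and only descend into it when the result is true.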

djkaty commented 4 years ago

First note I'm not at all familiar with Frida (I googled it quickly) so I can't make any specific comments about that, but I'll take it as "dynamic analysis" 🙂

Outputting all the data you described as JSON is basically the perfect use-case for having queryable data in the C# tool, because instead of having to re-write the code to trawl through all the low-level data structures, you can write an output module (Outputs directory) very easily to produce JSON in the format of your choice by just querying the type model:

var model = new Il2CppModel(il2cppInspector); // note: going to be renamed to TypeModel in the next couple of days
foreach (var asm in model.Assemblies) {
  foreach (var type in asm.DefinedTypes) {
    // write what you want
  }
}

or for your example which is grouped by namespace:

foreach (var typesInNs in model.Assemblies.SelectMany(x => x.DefinedTypes).GroupBy(t => t.Namespace)) {
  // typesInNs.Key = the namespace
  // typesInNs itself enumerates all the types defined in the namespace (an IGrouping has no Value property)
}

Essentially the point of queryable data is to allow you to shape it into the format of your choosing. For a more esoteric example, I wrote a static analysis tool using Capstone to disassemble ARM code for a particular IL2CPP application and produce the complete set of Google protobuf files for its network protocol. Being able to look up the values loaded via the instruction operands and say, what type is this pointing to? What field or vtable entry is this pointing to? - is extremely useful to re-create the .proto files, and there is no benefit to performing this with dynamic analysis.

For a treatise on why dealing with generics is complicated, check out https://github.com/djkaty/Il2CppInspector/pull/24 where @nneonneo explains it better than I can. While the examples you listed are trivial, when you are dealing with derivation, nesting and multiple generic type parameters together with generic methods which are a whole other kettle of fish, it gets tricky. In addition, IL2CPP implements generic sharing which you can read about here: https://blogs.unity3d.com/2015/06/16/il2cpp-internals-generic-sharing-implementation/

Regarding the flags, a lot of the data structures closely (but not entirely) mirror those used by IL itself - the metadata is best described by this book: https://www.amazon.com/Expert-NET-Assembler-Serge-Lidin/dp/1590596463/ref=sr_1_1?dchild=1&keywords=.net+il+2.0&qid=1594245295&sr=8-1
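For the class-level question (protected, internal, private), the IL side of this is the ECMA-335 TypeAttributes table, where the low three bits hold the visibility. A short sketch of decoding them, on the assumption that the class flags carry the standard values unchanged; the function names are hypothetical:

```cpp
#include <cstdint>
#include <string>

// ECMA-335 TypeAttributes values.
constexpr std::uint32_t kVisibilityMask = 0x00000007;
constexpr std::uint32_t kInterface = 0x00000020;
constexpr std::uint32_t kAbstractType = 0x00000080;
constexpr std::uint32_t kSealed = 0x00000100;

// Map the visibility bits to the closest C# accessibility keyword.
inline std::string type_visibility(std::uint32_t flags) {
    switch (flags & kVisibilityMask) {
        case 0x0: return "internal";            // NotPublic (top-level)
        case 0x1: return "public";              // Public (top-level)
        case 0x2: return "public";              // NestedPublic
        case 0x3: return "private";             // NestedPrivate
        case 0x4: return "protected";           // NestedFamily
        case 0x5: return "internal";            // NestedAssembly
        case 0x6: return "private protected";   // NestedFamANDAssem
        default:  return "protected internal";  // NestedFamORAssem (0x7)
    }
}

inline bool is_interface(std::uint32_t flags) { return (flags & kInterface) != 0; }
inline bool is_abstract_type(std::uint32_t flags) { return (flags & kAbstractType) != 0; }
inline bool is_sealed(std::uint32_t flags) { return (flags & kSealed) != 0; }
```

NestedFamily (0x4) is the case the IsNestedFamily property mentioned earlier corresponds to. A static class shows up as both abstract and sealed.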

vfsfitvnm commented 4 years ago

> is extremely useful to re-create the .proto files, and there is no benefit to performing this with dynamic analysis.

This seems like a lot of heavy work to be performed in a static analysis imho, especially the disassembly, as it could take hours to complete. With (early) dynamic instrumentation, you could just hook any of the protobuf-net.dll (or any other serializing / deserializing library) functions, read the System.Byte[] / System.IO.Stream and the work is already done!

Regarding the generics (maybe I'm too superficial), I guess any class / field / method which contains a generic type, does not show up in the .so at all. I don't really agree with the sentence "the generic class List will work for any T", because in my opinion the generic class List<T> is not a structure at all. For instance:

class List<T>
{
    void Add(T item); // NULL Il2CppMethodPointer -> no offset -> it does not exist in the `.so`.
}

class List<string>
{
    void Add(string item); // 0x084e530 valid Il2CppMethodPointer!
}

class List<MyType>
{
    void Add(MyType item); // 0x01fa690 valid Il2CppMethodPointer!
}
...

In fact, when I inspect these functions (0x084e530 and 0x01fa690) in ghidra, I can clearly see these two functions are almost identical. So I guess generic types could be completely ignored when defining / creating structs for ida/ghidra, because every inflated type has its own method implementation. Also, every field which is/contains a generic type can't be retrieved in any way because there is no offset for it. This is what I noticed, but I may be completely wrong.

Thanks for the book suggestion!

Lastly, could you share, if possible, the il2cpp-api-functions.h like you did with UnityHeaders? Thanks!

djkaty commented 4 years ago

Briefly, as I don't have much time today: the static analysis only takes a few seconds to complete. Hooking it dynamically only gives you the output; it doesn't tell you how the output is formed or what the property names are. You need to examine the structure of the code for that. Dynamic analysis also prevents it from being part of an automated toolchain (or at least makes it somewhat more difficult).

Types and methods with generic type parameters obviously only have concrete implementations in the binary file, the generic definitions are stored in the metadata file. You are correct in that you don't need to generate generic structs for IDA/Ghidra - only concrete implementations - however every possible use of the generics with different type arguments has to be determined and cross-referenced in order to find the function pointers and generate structs for them in the first place. Each has its own TypeInfo/MethodInfo struct and these can only be found by iterating all the type references AFAIK. Each concrete type does not necessarily have its own method implementation - see the afore-mentioned generic sharing article.
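One way to see generic sharing in a dump is to group the concrete methods by their implementation address and look for addresses that more than one method points to. A rough sketch with made-up names and addresses (MethodEntry and find_shared_implementations are hypothetical; null pointers, such as those of open generic definitions, are skipped):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (method name, implementation address) as collected by a dumper.
using MethodEntry = std::pair<std::string, std::uintptr_t>;

// Group method names by address and keep only addresses used more than once.
inline std::map<std::uintptr_t, std::vector<std::string>>
find_shared_implementations(const std::vector<MethodEntry>& methods) {
    std::map<std::uintptr_t, std::vector<std::string>> by_address;
    for (const MethodEntry& m : methods) {
        if (m.second != 0) by_address[m.second].push_back(m.first);
    }
    for (auto it = by_address.begin(); it != by_address.end();) {
        if (it->second.size() < 2) {
            it = by_address.erase(it);  // unique implementation, not shared
        } else {
            ++it;
        }
    }
    return by_address;
}
```

Any address left in the result is backed by a single function body that serves several inflated methods.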

il2cpp-api-functions.h is available in any download of Unity; since I intend to automate the creation of DLL injection projects, I am considering including these in future builds of Il2CppInspector.

vfsfitvnm commented 4 years ago

Thanks! Yep, I'm way too inexperienced!

I asked for the il2cpp-api-functions.h because my download speed is very low, and every Unity download is > 5 GB, so it would take ages. At this point, I'll just download random games from the Play Store and look up the exports of their libil2cpp.so.

Thanks for sharing your knowledge :)

djkaty commented 4 years ago

@stefanoPDM which version would you like? I can fix it for you :)

vfsfitvnm commented 4 years ago

@djkaty no problem, I just downloaded a bunch of random games from the play store, it worked out smoothly

djkaty commented 4 years ago

The latest commit of Il2CppInspector will output all of the type definitions in il2cpp-types.h and all of the internal IL2CPP API function export pointers in il2cpp-function-ptr.h.

This should work for PE, ELF and Mach-O files.

For the time being, you will need to combine this with il2cpp-api-functions.h from a Unity install, and make a header with code like this to set it all up:

#ifndef _IL2CPP_TYPES_DEFINED
#define _IL2CPP_TYPES_DEFINED
#include "il2cpp-types.h"
#endif

#include "il2cpp-function-ptr.h"

// IL2CPP APIs
#define DO_API(r, n, p) r (*n) p
#include "il2cpp-api-functions.h"
#undef DO_API

void init_il2cpp() {
    // Get base address of IL2CPP module
    uintptr_t baseAddress = (uintptr_t) GetModuleHandleW(L"GameAssembly.dll");

    // Define IL2CPP API function addresses
    #define DO_API(r, n, p) n = (r (*) p)(baseAddress + n ## _ptr)
    #include "il2cpp-api-functions.h"
    #undef DO_API
}

This is a snippet of a larger project template I'll be integrating into Il2CppInspector over time. Call init_il2cpp() in your DLL injection stub.

Let me know if this works for you guys and if there is anything else needed before closing the issue! :)

Edit: example usage (the TypeInfo items aren't available yet, this will be coming in the solution to #37 ):

    Vector3__Boxed* myVector3 = (Vector3__Boxed*) il2cpp_object_new(Vector3__TypeInfo);

    // (Call a constructor)
    Vector3__ctor(myVector3, 1.0f, 2.0f, 3.0f, NULL);

    // Get y co-ordinate
    float y = myVector3->fields.y;

DannyParker0001 commented 4 years ago

Tested it and it works great! In case you're curious, here's what I did: first I changed baseAddress in init_il2cpp to be static, so I only have to initialize it once. The tl;dr explanation of the code is that there's a singleton very high up in the class hierarchy, so I access the static instance and work my way down until I find something interesting yet simple to read (in this case, a TowerModel[], where each object has a string display field).

// Called from DllMain() when DLL_PROCESS_ATTACH, only if not already called.
void ReadModels() {
    AllocConsole();
    FILE* console = nullptr;  // freopen_s writes the reopened stream here
    freopen_s(&console, "CONOUT$", "w", stdout);

    std::cout << "Initializing il2cpp\n";
    init_il2cpp();

    std::cout << "Initializing il2cpp_domain\n";

    Il2CppDomain* domain = il2cpp_domain_get();
    size_t size = 0;
    const Il2CppAssembly** assemblies = il2cpp_domain_get_assemblies(domain, &size);

    const Il2CppAssembly* CSharpAssembly = nullptr;
    for (size_t i = 0; i < size; ++i) {
        std::cout << assemblies[i]->aname.name << '\n';
        if (std::string(assemblies[i]->aname.name) == "Assembly-CSharp") {
            std::cout << "Found assembly!\n"; 
            CSharpAssembly = assemblies[i];
        }
    }

    if (CSharpAssembly == nullptr) {
        std::cout << "Error! CSharpAssembly not found!\n";
        return;
    }

    Il2CppClass* InGameKlass = il2cpp_class_from_name(CSharpAssembly->image, "Assets.Scripts.Unity.UI_New.InGame", "InGame");

    FieldInfo* instance = il2cpp_class_get_field_from_name(InGameKlass, "instance");

    app::InGame* InGameInstAddr = 0;
    il2cpp_field_static_get_value(instance, &InGameInstAddr);

    if (InGameInstAddr == NULL) {
        std::cout << "Error! You are not In Game!";
        return;
    }

    app::InGame* InGameInst = (app::InGame*)(InGameInstAddr);

    auto arry = InGameInst->fields.bridge->fields.simulation->fields.model->fields.towers;

    app::TowerModel** tmdls = arry->vector;
    uintptr_t max = arry->max_length;

    for (uintptr_t i = 0; i < max; ++i) {
        auto str = tmdls[i]->fields.display;

        if (str == NULL) {
            continue;
        }

        std::wstring name((wchar_t*)(&str->fields.m_firstChar));
        std::wcout << L"Tower Name:" << name << '\n';
    }
}
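
A note on the string read near the end: casting &m_firstChar to wchar_t* relies on wchar_t being 16 bits wide (true on Windows, not on most Unix-like targets) and on a terminator following the characters. A more portable sketch copies an explicit number of UTF-16 code units instead. It assumes the managed string object also exposes its length (the exact field name depends on the generated header); read_utf16 is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Copy `length` UTF-16 code units starting at `first` into a std::u16string.
// Returns an empty string for a null pointer or a non-positive length.
inline std::u16string read_utf16(const char16_t* first, std::int32_t length) {
    if (first == nullptr || length <= 0) return std::u16string();
    return std::u16string(first, static_cast<std::size_t>(length));
}
```

The result can then be converted to whatever encoding the console expects.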

Something I noticed whilst writing this was that Visual Studio is not very good at peeking in the ~750K line header: it would consistently crash if I peeked into a second level, and occasionally crash when I closed/opened the peek window. Not in the scope of this issue though.

djkaty commented 4 years ago

Amazing, that's fantastic!

I have a wall of commits I haven't pushed yet for #37 which should make it easier, but I'm struggling with one last problem, so I'll post on that issue when it's done.

Couple of questions:

  1. This was for Bloons TD 6 on Steam, is that correct?

  2. Isn't baseAddress only initialized once anyway if you only call init_il2cpp() once?

  3. I'm planning to write some tutorials for my IL2CPP series on how to perform various tasks with Il2CppInspector. Would you be willing to allow me to use the example you posted above and dissect it in a tutorial? (I'll credit you of course)

I've made a note about splitting up il2cpp-types.h into smaller files (low priority item).

DannyParker0001 commented 4 years ago

  1. Yes
  2. What I meant to say was that I thought it'd be uninitialized once out of scope (and I thought that'd create problems). I've had a closer look at how the pre-processor DO_API stuff works now and yes, you're right, it doesn't need to be static.
  3. Sure. I've edited the example to add a null check. I was going to try to figure out what initializes the 'singleton' (maybe not a singleton, since it gets uninitialized? It still has this private static instance though) to see if it has an isInGame bool or something, but I don't have time for that anymore.

djkaty commented 4 years ago

Alright, thanks on all points :) I guess I better go buy the game then 😂

vfsfitvnm commented 4 years ago

@DannyParker0001 I haven't tested the following couple of things on all Unity versions, but:

djkaty commented 4 years ago

const Il2CppAssembly** il2cpp_domain_get_assemblies(const Il2CppDomain* domain, size_t* size)
{
    il2cpp::vm::AssemblyVector* assemblies = Assembly::GetAllAssemblies();
    *size = assemblies->size();
    return &(*assemblies)[0];
}

At least for a good while, the code has looked like this and doesn't seem to use AppDomains at all, as @vfsfitvnm wrote 🙂

I'll close this issue since it seems to be resolved and continue the discussion in #37