Read String (linux) - Githubissues

l3lackShark commented 4 years ago

As I saw that you are planning on resuming the work on this project, I would like to see a ReadString() function. Currently I have to read bytes but that's not really an option. Is it possible to implement? Or if not, what is the issue?

Andoryuuta commented 4 years ago

Strings are very ambiguous things, unfortunately: different encodings, null-terminated or not, encoding endianness, etc. Because of that, I originally decided to leave it up to the user of the library via ReadBytes.

However, looking at it now, it would definitely make sense to add something for the most common case of null-terminated UTF-8 strings (which would cover reading a typical C char*-style string).

Would a ReadNullTerminatedUTF8String() work in your use case?

l3lackShark commented 4 years ago

Even though removing nulls is a step in the right direction, it probably would not satisfy my needs. The problem is that the string size is unknown. Currently I have a very silly function that is not really error-proof, and I'm not even sure how I could improve it at this point haha. I'm still kind of new to reading bytes and stuff, and to Go in general. Basically I'm trying to strip out, or stop at, any unexpected/non-UTF/special characters. https://pastebin.com/wpyeG1Kx

Andoryuuta commented 4 years ago

Null-terminated doesn't mean to remove the nulls, it means to read bytes one at a time until you hit a null character that signals the end of the string.
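A minimal sketch of that idea for the UTF-8 / C `char*` case. It assumes the bytes have already been copied out of the target process into a slice (for example with `ReadBytes`), and `cString` is a hypothetical helper, not a kiwi function. It scans for the first NUL byte and returns everything before it, rejecting buffers with no terminator or invalid UTF-8.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"unicode/utf8"
)

// cString returns the bytes of buf up to (not including) the first NUL byte,
// as a Go string. It fails if no terminator is present or the bytes are not
// valid UTF-8.
func cString(buf []byte) (string, error) {
	end := bytes.IndexByte(buf, 0)
	if end < 0 {
		return "", errors.New("no null terminator in buffer")
	}
	if !utf8.Valid(buf[:end]) {
		return "", errors.New("string is not valid UTF-8")
	}
	return string(buf[:end]), nil
}

func main() {
	mem := []byte("hello\x00garbage after the terminator")
	s, err := cString(mem)
	fmt.Println(s, err) // hello <nil>
}
```

Everything after the terminator is ignored, so there is no need to "remove" nulls or guess the length up front.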

In your case, however, I believe you are trying to read a UTF-16-encoded string with the length stored separately, because of how strings are implemented in .NET Core.

I looked up other Osu! projects (as it seems like that is what you are trying to read from), and found some string reading code here. After looking up what Osu! Lazer was made in (C# / .NET Core), I was able to pull up the .NET Core source code and look at how strings are laid out in memory (assuming 32-bit):

    0x0: void*   virtual_function_base;
    0x4: DWORD   m_StringLength;
    0x8: WCHAR   m_Characters[0];

So to read that, you'll probably have to do something like this:

package main

import (
    "encoding/binary"
    "errors"
    "fmt"
    "unicode/utf16"

    "github.com/Andoryuuta/kiwi"
)

// utf16toString decodes UTF-16 bytes into a Go string.
// Adapted from https://stackoverflow.com/questions/15783830/how-to-read-utf16-text-file-to-string-in-golang
// .NET keeps string characters in memory as little-endian UTF-16 without a BOM,
// so the bytes are always decoded as little-endian here.
func utf16toString(b []uint8) (string, error) {
    if len(b)%2 != 0 {
        return "", errors.New("len(b) must be even")
    }

    w := make([]uint16, len(b)/2)
    for i := range w {
        w[i] = binary.LittleEndian.Uint16(b[2*i:])
    }
    return string(utf16.Decode(w)), nil
}

// ReadNetCoreString reads a .NET Core System.String object at addr,
// using the 32-bit layout above.
func ReadNetCoreString(proc kiwi.Process, addr uintptr) (string, error) {
    strLen, err := proc.ReadUint32(addr + 0x4) // m_StringLength
    if err != nil {
        return "", err
    }

    // Multiplied by 2 because each UTF-16 code unit is 2 bytes.
    rawData, err := proc.ReadBytes(addr+0x8, int(strLen)*2) // m_Characters
    if err != nil {
        return "", err
    }

    return utf16toString(rawData)
}

func main() {
    proc, err := kiwi.GetProcessByFileName("...")
    if err != nil {
        panic(err)
    }

    stringAddr := uintptr(0xFFFFFFFF) // Need to get the address of the net StringObject.
    str, err := ReadNetCoreString(proc, stringAddr)
    if err != nil {
        panic(err)
    }

    fmt.Println(str)
}
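Because the layout above is just a pair of offsets, the decoding step can be tried without a live process. This sketch assumes the 32-bit layout quoted above (length at 0x4, characters at 0x8), little-endian byte order as on x86/x64, and a byte slice `obj` holding a copy of the object's memory; `parseNetString` is a hypothetical helper, not part of kiwi.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// Offsets from the 32-bit layout quoted in this thread (assumption).
const (
	lengthOffset = 0x4 // DWORD m_StringLength
	charsOffset  = 0x8 // WCHAR m_Characters[0]
)

// parseNetString decodes a .NET string object from a snapshot of its memory.
func parseNetString(obj []byte) (string, error) {
	if len(obj) < charsOffset {
		return "", errors.New("buffer too short for header")
	}
	n := int(binary.LittleEndian.Uint32(obj[lengthOffset:]))
	// Each UTF-16 code unit is 2 bytes; make sure they all fit in the buffer.
	if n < 0 || n > (len(obj)-charsOffset)/2 {
		return "", errors.New("length prefix exceeds buffer")
	}
	units := make([]uint16, n)
	for i := range units {
		units[i] = binary.LittleEndian.Uint16(obj[charsOffset+2*i:])
	}
	return string(utf16.Decode(units)), nil
}

func main() {
	obj := []byte{
		0, 0, 0, 0, // vtable pointer (ignored here)
		2, 0, 0, 0, // m_StringLength = 2
		'H', 0, 'i', 0, // m_Characters, UTF-16LE
		0, 0, // null terminator
	}
	s, err := parseNetString(obj)
	fmt.Println(s, err) // Hi <nil>
}
```

Checking the length prefix against the buffer size is a cheap guard against reading from a wrong address, where the "length" would just be whatever bytes happen to sit at offset 0x4.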

l3lackShark commented 4 years ago

Wow, thanks for taking the time to look into my case. I tried to apply these functions, but by default I get a bunch of Chinese characters back. Looking at the structure of the strings, I couldn't pick out any value that would represent strLength. However, Cheat Engine is able to figure it out by itself (100 is a value I entered manually; the actual string is shorter than that).

Here is what it looks like in the memory viewer:

l3lackShark commented 4 years ago

But you are right, the text is indeed UTF-16. After changing the display type to UTF-16 in CE, the strings look normal. And here is addr+0x4:

l3lackShark commented 4 years ago

Wait, I was wrong. Working on it...

gitzec commented 4 years ago

This string is terminated by zeros... use a pointer to the first byte and read until you hit the zeros. Happy game hacking ;-)

l3lackShark commented 4 years ago

0x0: void*   virtual_function_base;
0x4: DWORD   m_StringLength;
0x8: WCHAR   m_Characters[0];

This was so valuable to me, thank you very much.

https://www.youtube.com/watch?v=JmLH1r0KMms

Andoryuuta commented 4 years ago

@l3lackShark No worries, glad it works!

In response to the original issue post, though, I'll probably add ReadNullTerminatedUTF8String() and ReadNullTerminatedUTF16String() functions to kiwi for the most common use cases; the latter would have resolved this issue from the beginning. I'll open a separate issue for this.

To add some more information about your specific case, though: @zecman is right about it being null-terminated as well. If you look at the managed-code portion of the .NET Core string implementation, it says:

For empty strings, _firstChar will be '\0', since strings are both null-terminated and length-prefixed.

Which is why Cheat Engine is able to find the end of the string. A different way to read the string, using the null terminator, would be something like:

package main

import (
    "fmt"
    "unicode/utf16"

    "github.com/Andoryuuta/kiwi"
)

func ReadNullTerminatedUTF16String(proc kiwi.Process, addr uintptr) (string, error) {
    var rawData []uint16
    for {
        // Read a single uint16. Each UTF-16 code unit is 2 bytes wide,
        // so the offset advances by 2 bytes per character already read.
        c, err := proc.ReadUint16(addr + uintptr(len(rawData)*2))
        if err != nil {
            return "", err
        }

        // Check if the uint16 is 0 (null terminator).
        if c == 0 {
            // Got null terminator, exit loop.
            break
        }

        // Not zero, append it to our slice.
        rawData = append(rawData, c)
    }

    // Decode the UTF16 into a Go string type.
    return string(utf16.Decode(rawData)), nil
}

func main() {
    proc, err := kiwi.GetProcessByFileName("...")
    if err != nil {
        panic(err)
    }

    strDataAddr := uintptr(0x......) // Address of the `m_Characters` field.
    str, err := ReadNullTerminatedUTF16String(proc, strDataAddr)
    if err != nil {
        panic(err)
    }
    fmt.Println(str)
}

Do note: this way of reading it will be significantly slower than the other method because it's reading 1 uint16 per call, instead of the full string data in 1 call.

gitzec commented 4 years ago

Maybe you could read the mem in blocks and iterate through these to save syscalls? Thanks for sharing this base.

Andoryuuta commented 4 years ago

Yes, reading in 2048-byte blocks is how I implemented it on the V0.2 branch (decreasing the block size if there is an error, so that it still works if the string sits at the end of an allocated memory region). I haven't merged it into master yet because I haven't had the time to set up an Ubuntu VM to test the changes on.
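A rough sketch of the block-reading idea described above. This is not the V0.2 branch code; it assumes a hypothetical `memReader` interface with a `ReadBytes(addr uintptr, size int) ([]byte, error)` method (a kiwi process would need a small adapter), starts with 2048-byte blocks, halves the block size when a read fails, and scans each block for a 0x0000 code unit at an even offset. `sliceMem` is a fake region used only to demonstrate it.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// memReader is a hypothetical abstraction over process memory reads.
type memReader interface {
	ReadBytes(addr uintptr, size int) ([]byte, error)
}

// readNullTerminatedUTF16Blocks reads little-endian UTF-16 code units starting
// at addr, block by block, until it finds a 0x0000 code unit.
func readNullTerminatedUTF16Blocks(r memReader, addr uintptr) (string, error) {
	block := 2048 // initial block size in bytes (kept even)
	var units []uint16
	for {
		buf, err := r.ReadBytes(addr, block)
		if err != nil {
			// The block may extend past the end of the readable region:
			// halve it and retry, giving up once a 2-byte read fails.
			if block <= 2 {
				return "", err
			}
			block /= 2
			continue
		}
		if len(buf) < 2 {
			return "", errors.New("short read")
		}
		for i := 0; i+1 < len(buf); i += 2 {
			c := binary.LittleEndian.Uint16(buf[i:])
			if c == 0 {
				return string(utf16.Decode(units)), nil
			}
			units = append(units, c)
		}
		addr += uintptr(len(buf) &^ 1)
	}
}

// sliceMem fakes one readable memory region starting at base.
type sliceMem struct {
	base uintptr
	data []byte
}

func (m sliceMem) ReadBytes(addr uintptr, size int) ([]byte, error) {
	if addr < m.base {
		return nil, errors.New("read before start of region")
	}
	off := int(addr - m.base)
	if size < 0 || off+size > len(m.data) {
		return nil, errors.New("read past end of region")
	}
	return m.data[off : off+size], nil
}

func main() {
	// "Hi" in UTF-16LE plus a null terminator, right at the end of the region.
	mem := sliceMem{base: 0x1000, data: []byte{'H', 0, 'i', 0, 0, 0}}
	s, err := readNullTerminatedUTF16Blocks(mem, 0x1000)
	fmt.Println(s, err) // Hi <nil>
}
```

In this sketch the block size stays small after a failed read rather than growing back to 2048; that keeps the logic simple at the cost of a few extra calls for strings that straddle the end of a region.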

l3lackShark commented 4 years ago

@Andoryuuta I can confirm that ReadNullTerminatedUTF16 works like a charm; I switched to it from my own solution. Let me know if you want me to test something else.

Andoryuuta commented 4 years ago

Thanks for testing it! I'll merge it to master now.

Andoryuuta / kiwi

Read String (linux) #6