l3lackShark closed this issue 4 years ago
Strings are unfortunately very ambiguous things: different encodings, null-terminated or not, encoding endianness, etc. Because of that, I originally decided to leave it up to the user of the library via ReadBytes.
However, looking at it now, it would definitely make sense to add something for the most common case of null-terminated UTF-8 strings (it would be able to read a typical C char*-style string).
Would a ReadNullTerminatedUTF8String() work for your use case?
Even though removing nulls is a step in the right direction, it would probably not satisfy my needs. This becomes a problem when the string size is unknown. Currently I have a very silly function that is not really error-proof. I'm not even sure how I can improve it at this point haha. I'm still kind of new to reading bytes and stuff, let alone to Go in general. Basically I'm trying to strip, or stop at, any unexpected/non-UTF/special characters. https://pastebin.com/wpyeG1Kx
Null-terminated doesn't mean to remove the nulls, it means to read bytes one at a time until you hit a null character that signals the end of the string.
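To make the idea concrete, here is a minimal sketch that applies it to a buffer that has already been read (for example via ReadBytes). The helper name cStringFromBuffer is hypothetical and not part of kiwi; it only illustrates "stop at the first null" as opposed to "strip all nulls".

```go
package main

import (
	"bytes"
	"fmt"
)

// cStringFromBuffer returns the bytes of buf up to (not including) the first
// 0x00 byte as a Go string. If no terminator is present it reports ok=false,
// so the caller knows it has to fetch more memory.
// Hypothetical helper for illustration only; it is not part of kiwi.
func cStringFromBuffer(buf []byte) (s string, ok bool) {
	end := bytes.IndexByte(buf, 0)
	if end < 0 {
		return "", false
	}
	return string(buf[:end]), true
}

func main() {
	// UTF-8 text, then a terminator, then unrelated trailing bytes.
	buf := []byte("h\xc3\xa9llo\x00garbage")
	s, ok := cStringFromBuffer(buf)
	fmt.Println(s, ok)
}
```

Everything after the terminator is ignored rather than filtered, which is why the string length does not need to be known up front.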
In your case however, I believe you are trying to read a UTF16-encoded string with the length stored separately, because of how strings are implemented in .NET core.
I looked up other Osu! projects (as it seems like that is what you are trying to read from), and found some string reading code here. After looking up what Osu! Lazer was made in (C# / .NET Core), I was able to pull open the .NET Core source code and look at how strings are laid out in memory (assuming 32-bit):
0x0: void* virtual_function_base;
0x4: DWORD m_StringLength;
0x8: WCHAR m_Characters[0];
So to read that, you'll probably have to do something like this:
package main

import (
	"errors"
	"fmt"
	"unicode/utf16"

	"github.com/Andoryuuta/kiwi"
)

// From https://stackoverflow.com/questions/15783830/how-to-read-utf16-text-file-to-string-in-golang
func utf16toString(b []uint8) (string, error) {
	if len(b)&1 != 0 {
		return "", errors.New("len(b) must be even")
	}

	// Check BOM. With no BOM, bom stays 0 and the bytes are read as big-endian.
	var bom int
	if len(b) >= 2 {
		switch n := int(b[0])<<8 | int(b[1]); n {
		case 0xfffe:
			bom = 1
			fallthrough
		case 0xfeff:
			b = b[2:]
		}
	}

	w := make([]uint16, len(b)/2)
	for i := range w {
		w[i] = uint16(b[2*i+bom&1])<<8 | uint16(b[2*i+(bom+1)&1])
	}
	return string(utf16.Decode(w)), nil
}

func ReadNetCoreString(proc kiwi.Process, addr uintptr) (string, error) {
	// m_StringLength: number of UTF-16 code units.
	strLen, err := proc.ReadUint32(addr + 0x4)
	if err != nil {
		return "", err
	}

	// Multiplied by 2 because it's a UTF16 string (2 bytes per code unit).
	rawData, err := proc.ReadBytes(addr+0x8, int(strLen)*2)
	if err != nil {
		return "", err
	}
	return utf16toString(rawData)
}

func main() {
	proc, err := kiwi.GetProcessByFileName("...")
	if err != nil {
		panic(err)
	}

	stringAddr := uintptr(0xFFFFFFFF) // Need to get the address of the net StringObject.
	str, err := ReadNetCoreString(proc, stringAddr)
	if err != nil {
		panic(err)
	}
	fmt.Println(str)
}
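One caveat with the helper above: when the data has no BOM it is decoded as big-endian. Strings held in the memory of a little-endian process (x86/x64) are normally little-endian with no BOM, so plain Latin text can come out as CJK characters (the bytes 41 00 for "A" become U+4100). A minimal sketch of a little-endian decoder, assuming a little-endian target; decodeUTF16LE is a hypothetical name, not a kiwi function:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE decodes raw little-endian UTF-16 bytes that carry no BOM.
// Hypothetical helper; it assumes the target process is little-endian.
func decodeUTF16LE(b []byte) (string, error) {
	if len(b)%2 != 0 {
		return "", errors.New("len(b) must be even")
	}
	units := make([]uint16, len(b)/2)
	for i := range units {
		// Low byte first, high byte second.
		units[i] = binary.LittleEndian.Uint16(b[2*i:])
	}
	return string(utf16.Decode(units)), nil
}

func main() {
	// "osu!" as it would sit in memory on a little-endian machine.
	raw := []byte{'o', 0, 's', 0, 'u', 0, '!', 0}
	s, err := decodeUTF16LE(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Swapping this in for utf16toString inside ReadNetCoreString would be one way to try it against the layout described above.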
Wow, thanks for taking the time on my case. I tried to apply these functions, but I get a bunch of Chinese characters in return by default. Looking at the structure of the strings, I couldn't pick out any value that would represent strLength. However, Cheat Engine is able to figure it out by itself (100 is a manual value; the actual string is shorter than that).
Here is how it looks in the memory viewer: [screenshot]
But you are right, the text is indeed in UTF-16. By changing the display type to UTF-16 in CE, the strings look normal: [screenshot] And here is addr+0x4: [screenshot]
Wait, I was wrong, working on it..
this string is terminated by zeros... use a pointer to get byte 1 and read until you get the zeros. happy game hacking ;-)
0x0: void* virtual_function_base;
0x4: DWORD m_StringLength;
0x8: WCHAR m_Characters[0];
This was so valuable to me, thank you very much.
@l3lackShark No worries, glad it works!
In response to the original issue post though, I'll probably add ReadNullTerminatedUTF8String() and ReadNullTerminatedUTF16String() functions to kiwi for the most common use cases; the latter would have resolved this issue from the beginning. I'll open a separate issue for this.
To add some more information about your specific case though, @zecman is right about it being null-terminated as well. If you look at the managed-code portion of the net core string implementation, it says:
For empty strings, _firstChar will be '\0', since strings are both null-terminated and length-prefixed.
Which is why Cheat Engine is able to find the end of the string. A different way to read the string, using the null terminator, would be something like:
func ReadNullTerminatedUTF16String(proc kiwi.Process, addr uintptr) (string, error) {
	var rawData []uint16
	for {
		// Read a single uint16. Each one is 2 bytes wide, so the byte
		// offset is twice the number of code units read so far.
		c, err := proc.ReadUint16(addr + uintptr(len(rawData)*2))
		if err != nil {
			return "", err
		}

		// Got the null terminator, exit the loop.
		if c == 0 {
			break
		}

		// Not zero, append it to our slice.
		rawData = append(rawData, c)
	}

	// Decode the UTF16 into a Go string type.
	return string(utf16.Decode(rawData)), nil
}

func main() {
	proc, err := kiwi.GetProcessByFileName("...")
	if err != nil {
		panic(err)
	}

	strDataAddr := uintptr(0x......) // Address of the `m_Characters` field.
	str, err := ReadNullTerminatedUTF16String(proc, strDataAddr)
	if err != nil {
		panic(err)
	}
	fmt.Println(str)
}
Do note: this way of reading it will be significantly slower than the other method because it's reading 1 uint16 per call, instead of the full string data in 1 call.
Maybe you could read the mem in blocks and iterate through these to save syscalls? Thanks for sharing this base.
Yes, reading in 2048-byte blocks is how I implemented it on the V0.2 branch (decreasing the block size if there is an error, so that it will still work if the string is at the end of an allocated memory region). I haven't merged it into master yet because I haven't had the time to set up an Ubuntu VM to test the changes on.
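For readers who want to see the shape of that approach, here is a rough sketch. It is not the code from the V0.2 branch. The memReader interface and fakeMem type are hypothetical stand-ins so the example runs without a real process, and halving the block size on a failed read is just one possible way to "decrease" it. It also assumes little-endian UTF-16.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf16"
)

// memReader is a hypothetical stand-in for whatever reads process memory.
// It is an assumption for this sketch, not kiwi's real interface.
type memReader interface {
	ReadBytes(addr uintptr, size int) ([]byte, error)
}

// readUTF16Blocks reads a null-terminated little-endian UTF-16 string in
// blocks, halving the block size whenever a read fails.
func readUTF16Blocks(r memReader, addr uintptr, blockSize int) (string, error) {
	blockSize &^= 1 // Keep the block size even (whole code units).
	var units []uint16
	for blockSize >= 2 {
		buf, err := r.ReadBytes(addr, blockSize)
		if err != nil || len(buf) < blockSize {
			// The block may run past the end of the mapped region:
			// retry from the same address with a smaller block.
			blockSize = (blockSize / 2) &^ 1
			continue
		}
		for i := 0; i+1 < len(buf); i += 2 {
			u := binary.LittleEndian.Uint16(buf[i:])
			if u == 0 {
				return string(utf16.Decode(units)), nil
			}
			units = append(units, u)
		}
		addr += uintptr(blockSize)
	}
	return "", errors.New("no null terminator found in readable memory")
}

// fakeMem simulates one mapped region starting at base. Reads that run past
// its end fail, like a read crossing into unmapped memory.
type fakeMem struct {
	base uintptr
	data []byte
}

func (m fakeMem) ReadBytes(addr uintptr, size int) ([]byte, error) {
	if addr < m.base {
		return nil, errors.New("address below mapped region")
	}
	off := int(addr - m.base)
	if off+size > len(m.data) {
		return nil, errors.New("read crosses end of region")
	}
	return m.data[off : off+size], nil
}

func main() {
	// "kiwi" plus terminator, sitting at the very end of the region.
	mem := fakeMem{base: 0x1000, data: []byte{'k', 0, 'i', 0, 'w', 0, 'i', 0, 0, 0}}
	s, err := readUTF16Blocks(mem, 0x1000, 2048)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Most strings are found with a single large read, and only strings near a region boundary pay for the extra, smaller reads.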
@Andoryuuta I can confirm that ReadNullTerminatedUTF16 works like a charm; I switched to it from my own solution. Let me know if you want me to test something else.
Thanks for testing it! I'll merge it to master now.
As I saw that you are planning on resuming the work on this project, I would like to see a ReadString() function. Currently I have to read bytes but that's not really an option. Is it possible to implement? Or if not, what is the issue?