bottlenoselabs / c2cs

Generate C# bindings from a C header.
MIT License
245 stars 18 forks source link

Bug if native strings are moved in memory #110

Closed Louis-Simon22 closed 1 year ago

Louis-Simon22 commented 1 year ago

Hello,

Using the following code in the Runtime.CString, if the native strings are moved in memory, but their values are not changed, then it either complains about duplicate values in PointersToStrings or it points to garbage data.

   public static string String(CString value)
    {
        if (value.IsNull)
        {
            return string.Empty;
        }

        if (PointersToStrings.TryGetValue(value._pointer, out var result))
        {
            return result;
        }

        var hash = Djb2((byte*)value._pointer);
        if (StringHashesToPointers.TryGetValue(hash, out var pointer2))
        {
            result = PointersToStrings[pointer2._pointer];
            return result;
        }

        // calls ASM/C/C++ functions to calculate length and then "FastAllocate" the string with the GC
        // https://mattwarren.org/2016/05/31/Strings-and-the-CLR-a-Special-Relationship/
        result = Marshal.PtrToStringAnsi(value._pointer);

        if (string.IsNullOrEmpty(result))
        {
            return string.Empty;
        }

        StringHashesToPointers.Add(hash, value);
        PointersToStrings.Add(value._pointer, result);

        return result;
    }

If I manually disable the cache, the problems go away. I think there should be a way to disable this caching and a warning about its requirements in the documentation.

lithiumtoast commented 1 year ago

Thanks for the bug report! I encountered this myself before when working with libclang. If you wish to have caching still the workaround is to call FreeString or one the similar functions.

In the next version of C2CS I will add an option to disable string caching.

Would Marshal.PtrToStringAnsi be called every time then?

Louis-Simon22 commented 1 year ago

I think calling every time is kind of unavoidable if the native strings can be moved

lithiumtoast commented 1 year ago

Okay. I'm concerned about the implicit/explicit operator causes allocation of memory; perhaps I'll remove them if caching can be disabled. What do you think?

Louis-Simon22 commented 1 year ago

Explicit operators doing allocation does not bother me, but implicit can be more tricky because people might not be aware or know it happens. I didn't even think that there was a cache until I encountered this bug.

We might be able to keep the cache if the key is the pointer and the hash, but it might not be worth it performance-wise.

lithiumtoast commented 1 year ago

Discussion on flecs discord, in the next version of C2CS:

Louis-Simon22 commented 1 year ago

Sounds good, thank you.

If anyone encounters this in the current version, you can manually edit the Runtime.CStrings.String function for something like this:

public static string String(CString value)
{
    if (value.IsNull)
    {
        return string.Empty;
    }

    // calls ASM/C/C++ functions to calculate length and then "FastAllocate" the string with the GC
    // https://mattwarren.org/2016/05/31/Strings-and-the-CLR-a-Special-Relationship/
    var result = Marshal.PtrToStringAnsi(value._pointer);

    if (string.IsNullOrEmpty(result))
    {
        return string.Empty;
    }

    return result;
}

Remember to do the same for the StringWide if needed

lithiumtoast commented 1 year ago

fixed and released under v4.0.2