does it work with wchar_t?

roozbehid-ic commented 3 years ago

I am getting an error with wchar_t. I also included #include to bypass clang error but I do get following

PANIC: [Panic] String cannot be of zero length. (Parameter 'oldValue')
   at System.String.Replace(String oldValue, String newValue)
   at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.Location(CXCursor cursor, Nullable`1 type)
   at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.VisitTranslationUnit(CXTranslationUnit translationUnit)
   at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.AbstractSyntaxTree(CXTranslationUnit translationUnit, Int32 bitness)
   at C2CS.UseCases.AbstractSyntaxTreeC.UseCase.Explore(CXTranslationUnit translationUnit, ImmutableArray`1 includeDirectories, ImmutableArray`1 ignoredFiles, ImmutableArray`1 opaqueTypes, ImmutableArray`1 whitelistFunctionNames, Int32 bitness)
   at C2CS.UseCases.AbstractSyntaxTreeC.UseCase.Execute(Request request, Response response)
   at C2CS.UseCase`2.Execute(TRequest request)

lithiumtoast commented 3 years ago

Do you have a header / repo I could use to reproduce?

lithiumtoast commented 3 years ago

@roozbehid-ic But yes, there is support for wchar_t. It's likely you have just hit a corner case that wasn't accounted for.

roozbehid-ic commented 3 years ago

This is the header

#include <stddef.h>
#define WINAPI
#define NSCUBE_EXPORTS
#define NSCUBE_API __declspec(dllexport)

typedef unsigned long   nscCHANID;
typedef void *          nscHANDLE;
typedef void *          nscHSRV;
typedef int nscRESULT;

NSCUBE_API nscRESULT WINAPI nscAddTextW
(
    nscHANDLE hTTS,
    const wchar_t *pwszInputTxt,
    void *pUserText
);

and how I run it and what I get

C2CS.exe ast -i nscube.h -o nscube.json
AbstractSyntaxTreeC: Started.
        Started step (1/3) 'Parse C code from disk'
libclang: Parsing 'nscube.h' with the following arguments...
        --language=c --std=c11 -Wno-pragma-once-outside-header -fno-blocks -m64 --include-directory= -isystemC:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\ucrt -isystemC:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include
        Finished step (1/3) 'Parse C code from disk' in 278.1678 ms
        Started step (2/3) 'Extract C abstract syntax tree'
AbstractSyntaxTreeC: Finished unsuccessfully. Review the following diagnostics for reason(s) why:
PANIC: [Panic] String cannot be of zero length. (Parameter 'oldValue')
   at System.String.Replace(String oldValue, String newValue)
   at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.Location(CXCursor cursor, Nullable`1 type)
   at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.VisitTranslationUnit(CXTranslationUnit translationUnit)
   at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.AbstractSyntaxTree(CXTranslationUnit translationUnit, Int32 bitness)
   at C2CS.UseCases.AbstractSyntaxTreeC.UseCase.Explore(CXTranslationUnit translationUnit, ImmutableArray`1 includeDirectories, ImmutableArray`1 ignoredFiles, ImmutableArray`1 opaqueTypes, ImmutableArray`1 whitelistFunctionNames, Int32 bitness)
   at C2CS.UseCases.AbstractSyntaxTreeC.UseCase.Execute(Request request, Response response)
   at C2CS.UseCase`2.Execute(TRequest request)

lithiumtoast commented 3 years ago

Issue was the directory was not specified for input and output. I always used it like this before:

C2CS.exe ast -i .\nscube.h -o .\nscube.json

I fixed it where if the directory is not specified then the current directory would be used.

Let me know if you have any other issues or things are still persisting.

roozbehid-ic commented 3 years ago

Thanks I got past that one but getting a new error.

c:\Utils\C2CS.exe  ast -i .\nscube.h -o .\nscube.json
AbstractSyntaxTreeC: Started.
        Started step (1/3) 'Parse C code from disk'
libclang: Parsing '.\nscube.h' with the following arguments...
        --language=c --std=c11 -Wno-pragma-once-outside-header -fno-blocks -m64 --include-directory=. -isystemC:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\ucrt -isystemC:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include
        Finished step (1/3) 'Parse C code from disk' in 236.3814 ms
        Started step (2/3) 'Extract C abstract syntax tree'
AbstractSyntaxTreeC: Finished unsuccessfully. Review the following diagnostics for reason(s) why:
PANIC: [Panic] Type name was not found for 'FunctionPointer'.
   at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.TypeName(String parentTypeName, CKind kind, CXType type, CXCursor cursor)
   at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.CreateFunctionPointer(String typeName, CXCursor cursor, Node parentNode, CXType originalType, CXType type, ClangLocation location)
   at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.ExploreFunctionPointer(String typeName, CXCursor cursor, CXType type, CXType originalType, ClangLocation location, Node parentNode)
   at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.ExploreTypedef(Node node, Node parentNode)
   at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.ExploreNode(Node node)
   at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.Explore()
   at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.AbstractSyntaxTree(CXTranslationUnit translationUnit, Int32 bitness)
   at C2CS.UseCases.AbstractSyntaxTreeC.UseCase.Explore(CXTranslationUnit translationUnit, ImmutableArray`1 includeDirectories, ImmutableArray`1 ignoredFiles, ImmutableArray`1 opaqueTypes, ImmutableArray`1 whitelistFunctionNames, Int32 bitness)
   at C2CS.UseCases.AbstractSyntaxTreeC.UseCase.Execute(Request request, Response response)
   at C2CS.UseCase`2.Execute(TRequest request)

You can see the header file here. https://gist.github.com/roozbehid-ic/03197d9db0735454ef893132c5b30dd6

lithiumtoast commented 3 years ago

Problem was:

typedef int PNSC_FNSPEECH_DATA (const unsigned char *pData, unsigned int cbDataSize, PNSC_SOUND_DATA pSoundData, void *pAppInstanceData);

This is the first time I have encountered a function proto inlined as a typedef in this fashion. Usually I encounter function protos in this fashion:

typedef SDL_HitTestResult (SDLCALL *SDL_HitTest)(SDL_Window *win, const SDL_Point *area, void *data);

Anyways just a small couple line fix in logic: d1c74f023165b0cd8f9eff7a64f8f48cde399f39

Note that the API in question looks to be quite intensive on strings. Please advise that casting CString8U to string calls Runtime.CString, to which that function hashes the string if the pointer representing the string (often the case of pointer to first element) is not in the cache. While djb2 is extremely fast, this is technically a O(n) where n is the length of the string; I have not found a more elegant way to deal with strings. Ideas and feedback is much appreciated.

roozbehid-ic commented 3 years ago

Thank you so much for all the help. Two things:

Would you mind telling how you discovered what went wrong. At least it helps future people to know how to diagnose the problems. Also it seems you are using special classes for unicode, etc. Why not use marshaling that comes with C#? It seems you are targeting .net 5 then it should also have utf8, utf16 and all those marshaling and looks more native to c#.

lithiumtoast commented 3 years ago

Why not use marshaling that comes with C#?

Last time I looked into it it it was implementation details: each C string (char*) when converted to a C# string (string) would be a new allocation on the GC heap which is not ideal for my use cases. I do it manually so I can have the opportunity to cache the strings using a hashing algorithm or do something else entirely in the future. It also ensures that the bindings are exactly 1-1 in terms of bit layout.

There is a whole GitHub issue on .NET about this problem: https://github.com/dotnet/corefxlab/issues/2350

lithiumtoast commented 3 years ago

Would you mind telling how you discovered what went wrong.

I just run the program with the debugger enabled and observe the stack trace. I then unravel the stack and observe the state of variables and other memory which cause the program to enter the particular state. The only magical pieces are libclang and Rosyln; the rest is 100% just ordinary C# code. However libclang can be a little hard to debug sometimes and I find myself entering throw away code to get the name of a type / cursor in some cases to help with debugging.

The tests are mostly done via smoke tests which I use the bindings generated by C2CS in my own libraries. I know immediately if something doesn't work right because I use the bindings all the time.

roozbehid-ic commented 3 years ago

Makes sense. I use following for char* which is potentially utf8 to string (although C# comes with ascii char* to string marshaller)

    public class UTF8Marshaler : ICustomMarshaler
    {
        static UTF8Marshaler static_instance;

        public IntPtr MarshalManagedToNative(object managedObj)
        {
            if (managedObj == null)
                return IntPtr.Zero;
            if (!(managedObj is string))
                throw new MarshalDirectiveException(
                       "UTF8Marshaler must be used on a string.");

            // not null terminated
            byte[] strbuf = Encoding.UTF8.GetBytes((string)managedObj);
            IntPtr buffer = Marshal.AllocHGlobal(strbuf.Length + 1);
            Marshal.Copy(strbuf, 0, buffer, strbuf.Length);

            // write the terminating null
            Marshal.WriteByte(buffer + strbuf.Length, 0);
            return buffer;
        }

        public unsafe object MarshalNativeToManaged(IntPtr pNativeData)
        {
            byte* walk = (byte*)pNativeData;

            // find the end of the string
            while (*walk != 0)
            {
                walk++;
            }
            int length = (int)(walk - (byte*)pNativeData);

            // should not be null terminated
            byte[] strbuf = new byte[length];
            // skip the trailing null
            Marshal.Copy((IntPtr)pNativeData, strbuf, 0, length);
            string data = Encoding.UTF8.GetString(strbuf);
            return data;
        }

        public void CleanUpNativeData(IntPtr pNativeData)
        {
            Marshal.FreeHGlobal(pNativeData);
        }

        public void CleanUpManagedData(object managedObj)
        {
        }

        public int GetNativeDataSize()
        {
            return -1;
        }

        public static ICustomMarshaler GetInstance(string cookie)
        {
            if (static_instance == null)
            {
                return static_instance = new UTF8Marshaler();
            }
            return static_instance;
        }
    }

Which probably ends up being similar to what you do. But on delegates and dllimport it looks more natural and you end up using string class.

        public delegate int ActionReply(Int64 Id, [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(UTF8Marshaler))] string nameValueParameters);

or

        [DllImport("mycode.dll",  EntryPoint = "channelIssueActionRequest")]
        public static extern int ChannelIssueActionRequest(Int64 Id,int requestId, [MarshalAs(UnmanagedType.LPStr)] string actionTypeName, [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef=typeof(UTF8Marshaler))] string nameValueParameters);

in This one actionTypeName is from ascii char* to string, and nameValueParameters is from char* utf8 to string So technically in this approach you always have string in C# which might not be desirable for many cases as you lose original data. But also totally makes sense for many other cases.

roozbehid-ic commented 3 years ago

I am wondering why are you caching strings? Could this be used instead https://docs.microsoft.com/en-us/dotnet/api/system.string.intern?view=net-5.0 ?

lithiumtoast commented 3 years ago

Consider the case where a C library uses C strings (char*) for identifiers that are most likely less than 25 characters in length. E.g. "PlayerEntity", "WeaponComponent", "RenderingSystem", etc. These strings are used frequently and usually out of context to the global C functions as part of the API to do something significant such as getting an entity or updating a component, etc, and then returning a callback. I do not want the garbage collector in C# to allocate memory in heap 0 for these strings from the callback. Instead I rather run a quick hash over the C string and use a hashtable to know if that C string's contents has already has an analogous memory allocated in the GC and use that reference instead.

System.String.Intern(string str) would work if it accepted a raw pointer to a C style string. Otherwise, it accepts a string as input but that string instance still has to be allocated first which is not great. Also, according to the docs the memory would be retained until the application exits or restarts which is not great either.

lithiumtoast commented 3 years ago

I use following for char which is potentially utf8 to string (although C# comes with ascii char to string marshaller)

This is interesting, I'll have to try it out sometime and see if it makes more sense. Right now implicit casting makes CString8U and CString16U "behave" and be "accepted" into functions and methods which use string. I do the same technique for CBool and bool.

roozbehid-ic commented 3 years ago

Also FYI, I pasted the code to show custom marshallers could potentially be used (although I don't think they can be used in unmanaged function pointers) New C# already has [MarshalAs(UnmanagedType.LPUTF8Str)] which can convert char* as utf8 to string

What do you think about a special mode of c2cs program that would generate managed C# code and also uses these marshaling?

lithiumtoast commented 3 years ago

a special mode

Sure, it could be an option that starts at the cs program level:

roozbehid-ic commented 3 years ago

I can give it a try and see if I can send a PR for you. Thanks

lithiumtoast commented 3 years ago

Just a note of warning, I have StyleCop ruleset turned on for this project. You can disable it by adding the following to the .csproj in question:

<EnableAnalyzers>false</EnableAnalyzers>

lithiumtoast commented 2 years ago

Closing as the original problem was resolved

lithiumtoast commented 2 years ago

@roozbehid-ic I have just encountered a grave mistake on my part. wchar_t is 2 bytes on Windows but actually 4 bytes on Linux. I'll need to re-think this.

lithiumtoast commented 2 years ago

I'm going to close this actually a create a new issue and link the related issues.

bottlenoselabs / c2cs

does it work with wchar_t? #39