Closed roozbehid-ic closed 2 years ago
Do you have a header / repo I could use to reproduce?
@roozbehid-ic But yes, there is support for wchar_t
. It's likely you have just hit a corner case that wasn't accounted for.
This is the header
#include <stddef.h>
#define WINAPI
#define NSCUBE_EXPORTS
#define NSCUBE_API __declspec(dllexport)
typedef unsigned long nscCHANID;
typedef void * nscHANDLE;
typedef void * nscHSRV;
typedef int nscRESULT;
NSCUBE_API nscRESULT WINAPI nscAddTextW
(
nscHANDLE hTTS,
const wchar_t *pwszInputTxt,
void *pUserText
);
and how I run it and what I get
C2CS.exe ast -i nscube.h -o nscube.json
AbstractSyntaxTreeC: Started.
Started step (1/3) 'Parse C code from disk'
libclang: Parsing 'nscube.h' with the following arguments...
--language=c --std=c11 -Wno-pragma-once-outside-header -fno-blocks -m64 --include-directory= -isystemC:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\ucrt -isystemC:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include
Finished step (1/3) 'Parse C code from disk' in 278.1678 ms
Started step (2/3) 'Extract C abstract syntax tree'
AbstractSyntaxTreeC: Finished unsuccessfully. Review the following diagnostics for reason(s) why:
PANIC: [Panic] String cannot be of zero length. (Parameter 'oldValue')
at System.String.Replace(String oldValue, String newValue)
at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.Location(CXCursor cursor, Nullable`1 type)
at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.VisitTranslationUnit(CXTranslationUnit translationUnit)
at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.AbstractSyntaxTree(CXTranslationUnit translationUnit, Int32 bitness)
at C2CS.UseCases.AbstractSyntaxTreeC.UseCase.Explore(CXTranslationUnit translationUnit, ImmutableArray`1 includeDirectories, ImmutableArray`1 ignoredFiles, ImmutableArray`1 opaqueTypes, ImmutableArray`1 whitelistFunctionNames, Int32 bitness)
at C2CS.UseCases.AbstractSyntaxTreeC.UseCase.Execute(Request request, Response response)
at C2CS.UseCase`2.Execute(TRequest request)
Issue was the directory was not specified for input and output. I always used it like this before:
C2CS.exe ast -i .\nscube.h -o .\nscube.json
I fixed it where if the directory is not specified then the current directory would be used.
Let me know if you have any other issues or things are still persisting.
Thanks I got past that one but getting a new error.
c:\Utils\C2CS.exe ast -i .\nscube.h -o .\nscube.json
AbstractSyntaxTreeC: Started.
Started step (1/3) 'Parse C code from disk'
libclang: Parsing '.\nscube.h' with the following arguments...
--language=c --std=c11 -Wno-pragma-once-outside-header -fno-blocks -m64 --include-directory=. -isystemC:\Program Files (x86)\Windows Kits\10\Include\10.0.17763.0\ucrt -isystemC:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\include
Finished step (1/3) 'Parse C code from disk' in 236.3814 ms
Started step (2/3) 'Extract C abstract syntax tree'
AbstractSyntaxTreeC: Finished unsuccessfully. Review the following diagnostics for reason(s) why:
PANIC: [Panic] Type name was not found for 'FunctionPointer'.
at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.TypeName(String parentTypeName, CKind kind, CXType type, CXCursor cursor)
at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.CreateFunctionPointer(String typeName, CXCursor cursor, Node parentNode, CXType originalType, CXType type, ClangLocation location)
at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.ExploreFunctionPointer(String typeName, CXCursor cursor, CXType type, CXType originalType, ClangLocation location, Node parentNode)
at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.ExploreTypedef(Node node, Node parentNode)
at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.ExploreNode(Node node)
at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.Explore()
at C2CS.UseCases.AbstractSyntaxTreeC.ClangTranslationUnitExplorer.AbstractSyntaxTree(CXTranslationUnit translationUnit, Int32 bitness)
at C2CS.UseCases.AbstractSyntaxTreeC.UseCase.Explore(CXTranslationUnit translationUnit, ImmutableArray`1 includeDirectories, ImmutableArray`1 ignoredFiles, ImmutableArray`1 opaqueTypes, ImmutableArray`1 whitelistFunctionNames, Int32 bitness)
at C2CS.UseCases.AbstractSyntaxTreeC.UseCase.Execute(Request request, Response response)
at C2CS.UseCase`2.Execute(TRequest request)
You can see the header file here. https://gist.github.com/roozbehid-ic/03197d9db0735454ef893132c5b30dd6
Problem was:
typedef int PNSC_FNSPEECH_DATA (const unsigned char *pData, unsigned int cbDataSize, PNSC_SOUND_DATA pSoundData, void *pAppInstanceData);
This is the first time I have encountered a function proto inlined as a typedef in this fashion. Usually I encounter function protos in this fashion:
typedef SDL_HitTestResult (SDLCALL *SDL_HitTest)(SDL_Window *win, const SDL_Point *area, void *data);
Anyways just a small couple line fix in logic: d1c74f023165b0cd8f9eff7a64f8f48cde399f39
Note that the API in question looks to be quite intensive on strings. Please advise that casting CString8U
to string
calls Runtime.CString
, to which that function hashes the string if the pointer representing the string (often the case of pointer to first element) is not in the cache. While djb2
is extremely fast, this is technically a O(n)
where n
is the length of the string; I have not found a more elegant way to deal with strings. Ideas and feedback is much appreciated.
Thank you so much for all the help. Two things:
Would you mind telling how you discovered what went wrong. At least it helps future people to know how to diagnose the problems. Also it seems you are using special classes for unicode, etc. Why not use marshaling that comes with C#? It seems you are targeting .net 5 then it should also have utf8, utf16 and all those marshaling and looks more native to c#.
Why not use marshaling that comes with C#?
Last time I looked into it it it was implementation details: each C string (char*
) when converted to a C# string (string
) would be a new allocation on the GC heap which is not ideal for my use cases. I do it manually so I can have the opportunity to cache the strings using a hashing algorithm or do something else entirely in the future. It also ensures that the bindings are exactly 1-1 in terms of bit layout.
There is a whole GitHub issue on .NET about this problem: https://github.com/dotnet/corefxlab/issues/2350
Would you mind telling how you discovered what went wrong.
I just run the program with the debugger enabled and observe the stack trace. I then unravel the stack and observe the state of variables and other memory which cause the program to enter the particular state. The only magical pieces are libclang
and Rosyln; the rest is 100% just ordinary C# code. However libclang
can be a little hard to debug sometimes and I find myself entering throw away code to get the name of a type / cursor in some cases to help with debugging.
The tests are mostly done via smoke tests which I use the bindings generated by C2CS
in my own libraries. I know immediately if something doesn't work right because I use the bindings all the time.
Makes sense.
I use following for char*
which is potentially utf8 to string
(although C# comes with ascii char*
to string
marshaller)
public class UTF8Marshaler : ICustomMarshaler
{
static UTF8Marshaler static_instance;
public IntPtr MarshalManagedToNative(object managedObj)
{
if (managedObj == null)
return IntPtr.Zero;
if (!(managedObj is string))
throw new MarshalDirectiveException(
"UTF8Marshaler must be used on a string.");
// not null terminated
byte[] strbuf = Encoding.UTF8.GetBytes((string)managedObj);
IntPtr buffer = Marshal.AllocHGlobal(strbuf.Length + 1);
Marshal.Copy(strbuf, 0, buffer, strbuf.Length);
// write the terminating null
Marshal.WriteByte(buffer + strbuf.Length, 0);
return buffer;
}
public unsafe object MarshalNativeToManaged(IntPtr pNativeData)
{
byte* walk = (byte*)pNativeData;
// find the end of the string
while (*walk != 0)
{
walk++;
}
int length = (int)(walk - (byte*)pNativeData);
// should not be null terminated
byte[] strbuf = new byte[length];
// skip the trailing null
Marshal.Copy((IntPtr)pNativeData, strbuf, 0, length);
string data = Encoding.UTF8.GetString(strbuf);
return data;
}
public void CleanUpNativeData(IntPtr pNativeData)
{
Marshal.FreeHGlobal(pNativeData);
}
public void CleanUpManagedData(object managedObj)
{
}
public int GetNativeDataSize()
{
return -1;
}
public static ICustomMarshaler GetInstance(string cookie)
{
if (static_instance == null)
{
return static_instance = new UTF8Marshaler();
}
return static_instance;
}
}
Which probably ends up being similar to what you do.
But on delegates and dllimport it looks more natural and you end up using string
class.
public delegate int ActionReply(Int64 Id, [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef = typeof(UTF8Marshaler))] string nameValueParameters);
or
[DllImport("mycode.dll", EntryPoint = "channelIssueActionRequest")]
public static extern int ChannelIssueActionRequest(Int64 Id,int requestId, [MarshalAs(UnmanagedType.LPStr)] string actionTypeName, [MarshalAs(UnmanagedType.CustomMarshaler, MarshalTypeRef=typeof(UTF8Marshaler))] string nameValueParameters);
in This one actionTypeName
is from ascii char*
to string
, and nameValueParameters
is from char*
utf8 to string
So technically in this approach you always have string in C# which might not be desirable for many cases as you lose original data. But also totally makes sense for many other cases.
I am wondering why are you caching strings? Could this be used instead https://docs.microsoft.com/en-us/dotnet/api/system.string.intern?view=net-5.0 ?
Consider the case where a C library uses C strings (char*
) for identifiers that are most likely less than 25 characters in length. E.g. "PlayerEntity", "WeaponComponent", "RenderingSystem", etc. These strings are used frequently and usually out of context to the global C functions as part of the API to do something significant such as getting an entity or updating a component, etc, and then returning a callback. I do not want the garbage collector in C# to allocate memory in heap 0 for these strings from the callback. Instead I rather run a quick hash over the C string and use a hashtable to know if that C string's contents has already has an analogous memory allocated in the GC and use that reference instead.
System.String.Intern(string str)
would work if it accepted a raw pointer to a C style string. Otherwise, it accepts a string
as input but that string
instance still has to be allocated first which is not great. Also, according to the docs the memory would be retained until the application exits or restarts which is not great either.
I use following for char which is potentially utf8 to string (although C# comes with ascii char to string marshaller)
This is interesting, I'll have to try it out sometime and see if it makes more sense. Right now implicit casting makes CString8U
and CString16U
"behave" and be "accepted" into functions and methods which use string
. I do the same technique for CBool
and bool
.
Also FYI, I pasted the code to show custom marshallers could potentially be used (although I don't think they can be used in unmanaged function pointers)
New C# already has [MarshalAs(UnmanagedType.LPUTF8Str)]
which can convert char*
as utf8 to string
What do you think about a special mode of c2cs program that would generate managed C# code and also uses these marshaling?
a special mode
Sure, it could be an option that starts at the cs
program level:
I can give it a try and see if I can send a PR for you. Thanks
Just a note of warning, I have StyleCop ruleset turned on for this project. You can disable it by adding the following to the .csproj
in question:
<EnableAnalyzers>false</EnableAnalyzers>
Closing as the original problem was resolved
@roozbehid-ic I have just encountered a grave mistake on my part. wchar_t
is 2 bytes on Windows but actually 4 bytes on Linux. I'll need to re-think this.
I'm going to close this actually a create a new issue and link the related issues.
I am getting an error with wchar_t. I also included #include to bypass clang error but I do get following