bytedeco / javacpp

The missing bridge between Java and native C++

Improve Parser: Use the Clang API #51

Open Arcnor opened 8 years ago

Arcnor commented 8 years ago

I'm in the process of doing this right now. Currently, the following issues exist with the approach I'm taking:

saudet commented 2 years ago

I'm not sure I understand what you mean by "bootstrapping", but whatever it is, it's not going to be a bigger problem than supporting C++. Start with getting something working for C++, and if you get that working, the rest isn't going to be a problem.

HGuillemet commented 2 years ago

I mean the problem of "chicken or the egg": You need the LLVM presets to use the parser, and you need the parser to build the LLVM presets.

saudet commented 2 years ago

Didn't you just say that you'd use the one from jextract? Just do that, that's fine.

mcimadamore commented 2 years ago

We will have a bootstrapping problem if we use a JavaCPP preset in the Parser used to build presets, won't we?

I have started to play with the C API of Clang bound by Panama with jextract and it seems to do the job. Preprocessor directives and comments are available. It even parses Doxygen-like syntaxes.

I suggest rewriting the parser using this API, first to reproduce the current behavior of the parser, as a preliminary step toward issue #402. Then we could try to change the parser and generator so that C++ classes are mapped to Java classes that use FMA instead of Pointer.

What do you think of this plan?

@HGuillemet I believe you are suggesting an approach similar to the one jextract uses to, e.g., generate libclang bindings which rely on the foreign function API. That part works well, and, assuming a tool only needs the C clang API, that could be good enough. We did some experiments parsing C++ with the C API and these were not successful, as the C API, at the moment, does not expose enough information re. template instantiation (the information is there under the hood, just not exposed in the C API, unfortunately). These same problems were observed in other projects using the C API as well (I seem to recall Rust's bindgen having several workarounds to make C++ sort of work with that API).

I do hope that, in the future, the clang C API will be improved to add those missing 2-3 functions which will make handling templates much more manageable. At this point in time I cannot recommend using the clang C API to emit bindings for real-world C++ code.

HGuillemet commented 2 years ago

Thank you for this information. Yes, that's what I was suggesting. I'll investigate a bit more to see if recent versions of LLVM provide something good enough with the C API. Otherwise I guess we will have to stick with Samuel's present magic parser.

junlarsen commented 2 years ago

If I'm understanding correctly, the "bootstrap problem" is the problem that we would depend on the libclang implementation to create the libclang implementation, similar to how you need GCC to build GCC.

We already solved that part, as we already have a stage 1 libclang implementation at https://github.com/bytedeco/javacpp-presets/blob/master/llvm/src/gen/java/org/bytedeco/llvm/global/clang.java made with the old parser which would suffice to build the new javacpp parser.

I actually had a go at this some time back, and I seemed to be able to parse some very basic C headers with the libclang API from JavaCPP Presets. If missing Clang C functions is an issue, we can either:

  1. upstream changes and pull them down (very slow process due to us building clang releases) - can be done alongside 2)
  2. add the functions ourselves like we already do in https://github.com/bytedeco/javacpp-presets/blob/master/llvm/src/main/resources/org/bytedeco/llvm/include/TargetStubs.h

mcimadamore commented 2 years ago

Thank you for this information. Yes, that's what I was suggesting. I'll investigate a bit more to see if recent versions of LLVM provide something good enough with the C API. Otherwise I guess we will have to stick with Samuel's present magic parser.

IIRC, one of the main missing bits of functionality was being able to retrieve all template instantiations for a given template method/class (as a binder would need to generate special code for all of these).

saudet commented 2 years ago

@HGuillemet Ah, you were referring to missing functionality from the C API of Clang. We can easily "extend" the C API ourselves, that's not an issue. I thought I mentioned that in this thread, but it's actually in https://github.com/bytedeco/javacpp-presets/issues/475#issuecomment-756045987. So just add anything you need along the way, that's not a problem.

saudet commented 2 years ago

FYI, here's something that looks more useful than Panama since it supports C++ and it's actually able to inline native functions:

@HGuillemet You may want to start looking at that, in addition to Panama.

Thanks to @frankfliu for letting me know about that!

HGuillemet commented 2 years ago

This project is interesting. It aims at providing a full alternative to JavaCPP (and Panama). Like JavaCPP, Java code instrumented with specific annotations is used to generate JNI (and Java) glue code. Two features are worth pointing out, compared to JavaCPP:

However:

saudet commented 2 years ago

This project is interesting. It aims at providing a full alternative to JavaCPP (and Panama).

It doesn't aim to be an alternative to Panama, that one is never going to support C++ or function inlining, it's not part of their goals. Like I explained before, I don't think anyone is going to switch from JNI to Panama, and that project (fastFFI) demonstrates that well. JNI is just fine, it's already fast enough and can be made user-friendly with tools like JavaCPP. However, to increase performance to any meaningful degree, what we need is to bring something like LLVM on the JVM without anything "foreign", which Panama is not willing to do, so in my opinion it's never going to give us anything substantial over JNI.

As for being a "full alternative" to JavaCPP, it's possible, but JavaCPP doesn't use Clang or anything like that, so if that's what they have started to work on, I would consider that an evolution over JavaCPP, and we should probably try to collaborate with them instead of redoing the same thing ourselves. @frankfliu What do you think?

frankfliu commented 2 years ago

@saudet I agree with you. If their architecture is clean and the foundation is solid, improving usability is relatively easy.

HGuillemet commented 2 years ago

Their component (LLVM4JNI), which uses clang to compile the JNI glue code to LLVM bitcode and then translates it to JVM bytecode, seems more or less independent and could probably be applied as is to JavaCPP.

If they do plan to open-source a C++ parser based on clang, with support for generics, I agree that it would be interesting to know more about it before continuing to work on our own.

This project seems quite old in fact. I'd say at least 10 years. They have decided to open-source it now, for reasons that would also be interesting to know, along with their plans and available resources.

shanemikel commented 1 year ago

Aside from java-port/clank, the C#/Mono/Xamarin crowd also have a lot of experience binding and porting C++ class hierarchies.

Both of these projects use the Clang frontend to produce ASTs and port the Clang AST class hierarchy to C# for consumer side codegen APIs:

I think both projects produce their own Clang C bindings and manually port the C++ AST bits they need. They also both have non-trivial C++ code they use to control the Clang frontend.

The Xamarin project has bindings for most Objective-C libraries on Mac and iOS here: xamarin/xamarin-macios. I would love to understand their process. It has to be one of the largest successful bindings projects ever. I'm sure it's largely automated and my guess is they use Clang's Objective-C frontend...

shanemikel commented 1 year ago

SkiaSharp is another example. A large C# binding project for Google's Skia 2D graphics library. They are a mono project used by Microsoft in .NET.

In the binding generator module they are using CppAst.NET, which implements a C++ AST in C#.

CppAst.NET does not use the C++ library, cppast.

They appear to have stolen the name, but cppast claims to expose bits of Clang's AST which are not exposed directly by libclang. If so, that may be useful.


Edit: I was mistaken that CppAst.NET binds cppast. It is merely named after the latter.

HGuillemet commented 1 year ago

What about using clangd? It would remove the chicken-or-the-egg problem mentioned above and allow efficient parsing of files as well as code chunks.

shanemikel commented 1 year ago

I've taken a cursory look at that. I somewhat like the idea.

Clangd depends on understanding the project's build system through compile_commands.json: https://clangd.llvm.org/installation#project-setup. This is fairly easy to produce for CMake projects and there are tools like https://github.com/rizsotto/Bear that can produce it for any build system by intercepting and parsing compiler command arguments. It's kind of a hack that is more acceptable for getting IDE features than it is to reliably produce a build artifact.

Many large bindings projects seem to effectively reproduce parts of the build system, dependency graph, and source file hierarchy of their underlying library anyway. Generating compile_commands.json by hand or ad hoc (e.g. by script) isn't totally out of step. Taking the JavaCPP approach, some of this information could be generated from Java source annotations.
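To make the "by hand or by script" idea concrete, here is a minimal sketch in Java that emits a compile_commands.json with one entry. The `directory`, `file`, and `arguments` keys come from the JSON Compilation Database format; the class name, the path `/work/mylib`, the header `include/mylib.h`, and the flags are all hypothetical and only there for illustration. How JavaCPP annotations would feed this is not shown and is not something the thread settles.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CompileCommandsSketch {

    // Wraps a value in double quotes, escaping backslashes and double quotes only.
    // Control characters are not handled; paths and flags are assumed not to contain them.
    public static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds one compilation database entry: working directory, file, argv-style arguments.
    public static String entry(String directory, String file, List<String> arguments) {
        String args = arguments.stream()
                .map(CompileCommandsSketch::quote)
                .collect(Collectors.joining(", "));
        return "  {\n"
             + "    \"directory\": " + quote(directory) + ",\n"
             + "    \"file\": " + quote(file) + ",\n"
             + "    \"arguments\": [" + args + "]\n"
             + "  }";
    }

    // Wraps the entries in the top-level JSON array that clangd expects.
    public static String database(List<String> entries) {
        return "[\n" + String.join(",\n", entries) + "\n]\n";
    }

    public static void main(String[] args) {
        // Hypothetical library layout and flags, standing in for values a preset would supply.
        String one = entry("/work/mylib", "include/mylib.h",
                List.of("clang++", "-x", "c++-header", "-std=c++17", "-Iinclude", "include/mylib.h"));
        System.out.print(database(List.of(one)));
    }
}
```

Redirecting the output to a file named compile_commands.json in the project root (or a build directory) is what clangd's project setup page describes; everything else here is a sketch under the assumptions above.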

One possible issue: AST access is provided as an LSP protocol extension: https://clangd.llvm.org/extensions#ast. That page features a major caveat:

These extensions may evolve or disappear over time. If you use them, try to recover gracefully if the structures aren’t what’s expected.

There is an LSP implementation for Java here: https://github.com/eclipse/lsp4j. I think the protocol is similar to HTTP, so client implementation shouldn't be too bad.
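For a sense of what a hand-rolled client would put on the wire, here is a minimal sketch that does not use lsp4j. The LSP base protocol frames each JSON-RPC message with a `Content-Length` header (counted in bytes), a blank line, and then the body, which is where the HTTP resemblance comes from. The method name `textDocument/ast` is the one given on the clangd extensions page; the exact parameter shape below (a `textDocument` with a `uri`, and no `range`) is my reading of that page and should be treated as an assumption, as should the class and method names.

```java
import java.nio.charset.StandardCharsets;

public class LspFrameSketch {

    // Frames a JSON-RPC body for the LSP base protocol: header, blank line, body.
    // Content-Length is the UTF-8 byte count of the body, not its character count.
    public static byte[] frame(String jsonBody) {
        byte[] body = jsonBody.getBytes(StandardCharsets.UTF_8);
        byte[] header = ("Content-Length: " + body.length + "\r\n\r\n")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[header.length + body.length];
        System.arraycopy(header, 0, out, 0, header.length);
        System.arraycopy(body, 0, out, header.length, body.length);
        return out;
    }

    // Builds a request for clangd's AST extension.
    // Assumption: params carry only a textDocument identifier; check the extensions page
    // for the clangd version in use. The URI is assumed to need no JSON escaping.
    public static String astRequest(int id, String fileUri) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
             + ",\"method\":\"textDocument/ast\""
             + ",\"params\":{\"textDocument\":{\"uri\":\"" + fileUri + "\"}}}";
    }

    public static void main(String[] args) {
        // Hypothetical file URI; a real client would send "initialize" and open the document first.
        byte[] message = frame(astRequest(1, "file:///work/mylib/include/mylib.h"));
        System.out.print(new String(message, StandardCharsets.UTF_8));
    }
}
```

A real client would write these bytes to clangd's stdin and parse framed responses from its stdout. Given the caveat quoted above, the response handling should tolerate missing or changed fields.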

Other than providing per-file AST access, clangd provides an index which may be marginally helpful: https://clangd.llvm.org/design/indexing

LifeIsStrange commented 1 year ago

Hi @saudet, a small update. For context: in a previous issue I opened about leveraging the foreign linker/foreign memory API, you said:

they haven't been able to get any performance gains over JNI, yet, so it's unclear how it's going to be useful at this point

In reply, mcimadamore outlined some possible scenarios where the foreign linker API could lead to better performance than JNI.

The news: there is a new blog post on Java inside showing that the foreign memory API has seen a considerable performance improvement in JDK 22 and, for native strings, seems to be significantly better than JNI: https://minborgsjavapot.blogspot.com/2023/08/java-22-panama-ffm-provides-massive.html?m=1 The future improvements section also caught my interest:

FFM allows us to use custom allocators and so, if we make several calls, we can reuse memory segments thereby improving performance further. This is not possible with JNI.

It also mentions future internal use of the Vector API.

saudet commented 1 year ago

FMA is unrelated to Clang or JNI; please see issue #402.