asterite opened this 5 years ago
The disadvantage is that this doesn't work when cross-compiling.
once Crystal is available in a target platform there's no need to cross-compile, unless the target platform has limited resources, which might be a point against this proposal
Very good point. The target platforms with the most limited resources are the ones that will need cross-compiling the most. Given the rapid growth of IoT, a binding generator that also works in that case could become more and more important for Crystal to thrive.
If parsing C is the hard part, can't it be partially solved by using libclang, with https://github.com/crystal-lang/clang.cr?
Parsing is not the problem. The problem is automatically mapping things to Crystal, which isn't clear.
And isn't that what crystal_lib is doing?
Another problem is that doing it with clang imposes a dependency on it, and the code is pretty complex. The template way, like what hsc2hs does, has no dependencies other than a C compiler, which is available on basically every dev machine. And the code should be pretty simple too.
I understand the immediate benefit of expanding macros and getting the actual value, but I'm wondering how it works for types and function definitions.
I'd love to see a small POC in a shard.
@ysbaddaden You don't use it for types and function definitions.
What hsc2hs provides, however, is a way to map a C integer type to an equivalent Haskell type. They do it with a macro that I don't understand: https://github.com/haskell/hsc2hs/blob/9056de46495348ea8a8fff419c82a91afda0e7e7/template-hsc.h#L69-L78
It casts a float literal to the type, casts that result to int and back to the type, and compares: if the values are equal, the type is an integer, otherwise it's a floating-point type. Then an overflow check tells unsigned from signed.
Note that cross-compiling is still possible but more complex: it needs a cross-compiler environment to be installed, which allows building the executable directly.
@oprypin there are many, many edge cases in C headers that make interpreting the clang AST and mapping it to Crystal complex.
I believe CCR as explained here is a better solution than crystal_lib: it's much simpler (and very smart). It feels perfect for mapping complex C libraries like libc, where defines and structs are scattered across dozens of files and places.
It's still bothersome to have to write a mapping (though it gives control over the actual mapping), so I still like c2cr, where we can just do c_include "pcre.h" to automagically map everything from that header.
Wait, why "@oprypin"? 😂
I started replying to "And isn't that what crystal_lib is doing?" but drifted away!
My comment was a reply to this comment, not in general
@ysbaddaden c2cr looks useful!
I think the main advantages of doing it like hsc2hs (as in #8336) are:
.cr file, or remember the tool's rules, to know what names to use for structs, fields, constants, etc. The way ccr works, you choose the names and just ask the tool for a little help: constant values, struct sizes, field offsets, etc. Then you don't need to look at the generated file, because all the definitions are there in the template. Also, compilation errors, or even runtime errors, will point to source code that you can read rather than to the generated file: they'll point to the template file.
I agree with all those points. CCR is also much simpler to implement and doesn't require large additional dependencies (libclang).
The drawback is that you must write the mapping, and everyone may have different rules. An automated tool will always use the same set of rules (no surprises), but with manual mapping you have control over the rules and can bend them as needed, to make things look nicer or to avoid potential clashes (never happened to me), and you don't have to guess why a type wasn't mapped because of a c2cr limitation (happens a lot with llvm-c).
One flaw of this approach is complications with cross-compiling. I don't agree that cross-compiling is primarily a tool for compiler development. There are good reasons for cross-compiling applications. I believe this is even especially relevant for Crystal because the compiler has considerable demands for processing power and memory. On less equipped machines (for example embedded devices or single-board computers), compiling even a mid-sized Crystal program can take unreasonably long. Insufficient memory capacity can make a build completely impossible.
I think we must make sure there's a viable solution for cross-compiling.
This might not be too difficult to achieve, though. We'd just need a way to tell the compiler to use pre-existing binding definitions instead of evaluating .ccr on the fly. Maybe this could be as simple as recognizing existing .generated.cr files as an override. Perhaps also with the target triple as part of the file name. These pre-generated bindings could even be checked into VCS, and regenerated only when necessary.
This concept could also work as basis for a cache which could be useful for native compiling. Even if generating bindings might only take a portion of the overall compile time, it still adds up. And it's usually very much unnecessary to regenerate bindings when the compiler already did that in the last build, three minutes ago. Could just use the result from the previous run and be done with it. That's a pretty easy opportunity for improving compiler performance (or rather: avoiding degradation).
Of course, this creates problems with cache invalidation. It would probably be quite hard to recognize when cached bindings would need to be regenerated due to changes in the linked libraries. Maybe not though. The PoC doesn't cover this, but when bindings link against any libraries other than libc, linker arguments for these libraries would need to be known to CCR. This could be enough information to inform about cache status.
It's all just purely theoretical ideas though. I think the next step would probably be enhancing the PoC to support linking specific libraries.
I recently stumbled upon a tool called hsc2hs: it's a template language for Haskell that makes writing C bindings easier.
How it would work
The idea applied to Crystal would be something like this:
You write a .crc file with some directives
Here we just have two of the simplest directives:
include will include a C header file
const will output the value of a C #define or constant
The crc generates a C file that, when executed, produces a Crystal file
The C file generated by the program above will look like this:
We run the C file and pipe the output to the final Crystal file
The generated file will look like this:
We got the 1 directly from the pcre.h header file! :-)
Thoughts
I think the idea of hsc2hs is brilliant: instead of trying to automatically generate bindings from a C header, like we tried to do in crystal_lib, which is very complex, we just get what we really need from C headers.
The things we could get from C headers are:
sizeof structs
alignment of structs (maybe we don't need this)
offsetof struct fields: with just this, we can write and read any value from a struct by casting the struct pointer to a Pointer(UInt8), adding the offset, then casting the pointer to the type of the struct field and fetching the value from it. Of course, it also works if we have a plain struct (not a pointer to a struct) and we use pointerof.
The above means we can bind to any C struct:
UInt8[N], where N is the sizeof of the struct
offsetof, and use pointers and casting
Not as nice as writing a full C struct binding with all the types, but in many cases we are only interested in a couple of fields from the struct.
The benefit of doing this is portability: the template will be compiled to C, and that C program is then compiled and run on the host machine to generate a Crystal file.
The disadvantage is that this doesn't work when cross-compiling. However, I believe cross-compiling is mainly useful for the Crystal compiler itself, to be able to port it to other platforms (once Crystal is available in a target platform there's no need to cross-compile, unless the target platform has limited resources, which might be a point against this proposal).
How to implement this
This can be an external tool that you have to run to generate the Crystal file, and it could be automated by a hook in shards. An alternative is to have this functionality embedded in the compiler. It doesn't sound that hard to implement; in a way, it's similar to ECR.
I'm interested in your feedback! What do you think?