crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

[RFC] C bindings helper #8307

Open asterite opened 5 years ago

asterite commented 5 years ago

I recently stumbled upon a tool called hsc2hs: it's a template language for Haskell that eases writing C bindings.

How it would work

The idea applied to Crystal would be something like this:

You write a .crc file with some directives

# pcre.crc

<%= include "pcre.h" %>

lib LibPCRE
  enum Options
    CASELESS = <%= const PCRE_CASELESS %>
  end
end

Here we have just two of the simplest directives: include, which pulls in a C header, and const, which outputs the value of a C constant.

From the .crc file we generate a C file that, when compiled and executed, produces a Crystal file

The C file generated by the program above will look like this:

// pcre.crc.c
// This line because of our "include" directive
#include "pcre.h"
#include <stdio.h>

int main(int argc, char** argv) {
  printf("lib LibPCRE\n");
  printf("  enum Options\n");
  printf("    CASELESS = ");
  printf("%d", PCRE_CASELESS); // this line because of the "const" directive
  printf("\n");
  printf("  end\n");
  printf("end\n");
  return 0;
}

We compile and run the C file and redirect the output to the final Crystal file

$ clang pcre.crc.c -o some_temp_name
$ ./some_temp_name > pcre.cr

The generated file will look like this:

# pcre.cr
lib LibPCRE
  enum Options
    CASELESS = 1
  end
end

We got the 1 directly from the pcre.h header file! :-)
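The translation step in the walkthrough above (template line in, C statements out) could be sketched roughly as follows. This is only an illustration under assumptions: the helper name translate_line is hypothetical, it handles only the const directive (not include), it casts the constant to int, and it does not escape quotes or backslashes in the template text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn one template line into C statements that print it.
 * Only "<%= const NAME %>" is recognised; any other line is printed verbatim.
 * Returns what snprintf returns (the length the full output needs). */
int translate_line(const char *line, char *out, size_t out_size) {
  static const char open_tag[] = "<%= const ";
  static const char close_tag[] = " %>";
  assert(line != NULL && out != NULL && out_size > 0);

  const char *start = strstr(line, open_tag);
  const char *end = start ? strstr(start, close_tag) : NULL;
  if (start == NULL || end == NULL) {
    /* Plain text: emit a printf that reproduces the line. */
    return snprintf(out, out_size, "printf(\"%%s\\n\", \"%s\");\n", line);
  }

  const char *name = start + strlen(open_tag);
  /* Text before the directive, then the constant's value, then the rest. */
  return snprintf(out, out_size,
                  "printf(\"%%s\", \"%.*s\");\n"
                  "printf(\"%%d\", (int)(%.*s));\n"
                  "printf(\"%%s\\n\", \"%s\");\n",
                  (int)(start - line), line,
                  (int)(end - name), name,
                  end + strlen(close_tag));
}
```

Calling it once per template line and wrapping the results in a main function, together with the headers named by include directives, would give a file shaped like pcre.crc.c above.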

Thoughts

I think the idea of hsc2hs is brilliant: instead of trying to automatically generate bindings from a C header, like we tried to do in crystal_lib, which is very complex, we just get what we really need from C headers.

The things we could get from C headers are:

The above means we can bind to any C struct:

Not as nice as writing a full C struct binding with all the types, but in many cases we are only interested in a couple of fields from the struct.
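As a rough sketch of what a generated C program could print for a struct, assuming the things taken from the header include the struct's size and the offsets of the fields we care about (hsc2hs offers #size and #offset for this). The struct example_stat, the function name, and the emitted constant names are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct standing in for one declared in a library header. */
struct example_stat {
  char tag;
  long size;
  short mode;
};

/* Writes Crystal-style constant lines with layout facts taken from the
 * C compiler itself, so they are correct for the host platform. */
int describe_layout(char *out, size_t out_size) {
  assert(out != NULL && out_size > 0);
  return snprintf(out, out_size,
                  "EXAMPLE_STAT_BYTESIZE = %zu\n"
                  "EXAMPLE_STAT_SIZE_OFFSET = %zu\n"
                  "EXAMPLE_STAT_MODE_OFFSET = %zu\n",
                  sizeof(struct example_stat),
                  offsetof(struct example_stat, size),
                  offsetof(struct example_stat, mode));
}
```

The actual numbers depend on the platform's padding and type sizes, which is exactly why asking the C compiler is more portable than hard-coding them.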

The benefit of doing this is portability: the template will be translated to C, which is then compiled and run on the host machine to generate a Crystal file.

The disadvantage is that this doesn't work when cross-compiling. However, I believe cross-compiling is mainly useful for the Crystal compiler itself, to be able to port it to other platforms (once Crystal is available in a target platform there's no need to cross-compile, unless the target platform has limited resources, which might be a point against this proposal).

How to implement this

This can be an external tool that you have to run to generate the Crystal file, and it could be automated by a hook in shards. An alternative is to have this functionality embedded in the compiler. It doesn't sound that hard to implement; in a way, it's similar to ECR.

I'm interested in your feedback! What do you think?

oprypin commented 5 years ago

Speaking of printf https://github.com/oprypin/crsfml/blob/ce0f5c7aa60b7879ecbfcb174624450b78e96e06/generate.cr#L1884

vlazar commented 5 years ago

The disadvantage is that this doesn't work when cross-compiling.

once Crystal is available in a target platform there's no need to cross-compile, unless the target platform has limited resources, which might be a point against this proposal

Very good point. The target platforms with the most limited resources will need cross-compiling the most. Given the quick growth of IoT, having a binding generator that works for that case too may become more and more important for Crystal to thrive.

j8r commented 5 years ago

If parsing C is the hard part, can't it be partially solved by using libclang, with https://github.com/crystal-lang/clang.cr?

oprypin commented 5 years ago

https://i.kym-cdn.com/entries/icons/original/000/028/596/dsmGaKWMeHXe9QuJtq_ys30PNfTGnMsRuHuo_MUzGCg.jpg

asterite commented 5 years ago

Parsing is not the problem. The problem is automatically mapping things to Crystal, which isn't clear.

oprypin commented 5 years ago

And isn't that what crystal_lib is doing?

asterite commented 5 years ago

Another problem is that doing it with clang imposes a dependency on it, and the code is pretty complex. The template way, like what hsc2hs does, has no dependencies other than a C compiler, which is available on basically every dev machine. And the code should be pretty simple too.

ysbaddaden commented 5 years ago

I understand the immediate benefit for expanding macros and getting the actual value, but I'm wondering how it works for types and function definitions?

I'd love to see a small POC in a shard.

asterite commented 5 years ago

@ysbaddaden You don't use it for types and function definitions

What hsc2hs provides, however, is a way to map a C integer type to an equivalent Haskell type. They do it with a macro that I don't understand: https://github.com/haskell/hsc2hs/blob/9056de46495348ea8a8fff419c82a91afda0e7e7/template-hsc.h#L69-L78

ysbaddaden commented 5 years ago

It casts a float literal to the type, then casts that to int and back; if the two are equal, the type is an integer, otherwise a float. Then there's an overflow check to tell unsigned from signed.
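A minimal sketch of that trick, adapted to print Crystal-style type names. The macro and function names are hypothetical, this is not the hsc2hs macro itself, and long double would produce a name Crystal doesn't have.

```c
#include <assert.h>
#include <stdio.h>

/* An integer type truncates 1.4 to 1, so a round trip through int changes
 * nothing; a float type keeps 1.4, which differs from 1. */
#define IS_INTEGER_TYPE(t) ((t)(int)(t)1.4 == (t)1.4)
/* -1 wraps around to a large positive value in an unsigned type. */
#define IS_SIGNED_TYPE(t) ((t)(-1) < (t)0)

/* Formats e.g. "Int32", "UInt8" or "Float64" from the detected properties. */
static int write_type_name(char *buf, size_t n, int is_integer, int is_signed,
                           size_t bytes) {
  assert(buf != NULL && n > 0);
  if (is_integer)
    return snprintf(buf, n, "%s%zu", is_signed ? "Int" : "UInt", bytes * 8);
  return snprintf(buf, n, "Float%zu", bytes * 8);
}

#define CRYSTAL_TYPE_NAME(t, buf, n) \
  write_type_name((buf), (n), IS_INTEGER_TYPE(t), IS_SIGNED_TYPE(t), sizeof(t))
```

For example, CRYSTAL_TYPE_NAME(size_t, buf, sizeof buf) writes UInt64 on a typical 64-bit platform without the template author having to know what size_t is there.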

Note that cross-compiling is still possible but more complex: it needs a cross-compiler environment to be installed, which allows building the executable directly.

ysbaddaden commented 5 years ago

@oprypin there are many many edge cases with C headers that make interpretation of the clang AST and mapping to Crystal complex.

I believe CCR as explained here is a better solution than crystal_lib: it's much simpler (and very smart). It feels perfect for mapping complex C libraries, like the libc, where defines and structs are scattered in dozens of files and places.

It's still bothersome to have to write a mapping (though it gives control over the actual mapping), so I still like c2cr, where we can just do c_include "pcre.h" to automagically map everything from that header.

oprypin commented 5 years ago

Wait, why "@oprypin"? 😂

ysbaddaden commented 5 years ago

I started replying to "And isn't that what crystal_lib is doing?" but drifted away!

oprypin commented 5 years ago

My comment was a reply to this comment, not in general

asterite commented 5 years ago

@ysbaddaden c2cr looks useful!

I think the main advantages of doing it like hsc2hs, so like in #8336, are:

ysbaddaden commented 5 years ago

I agree with all those points. CCR is also much simpler to implement and doesn't require additional, large dependencies (libclang).

The drawback is that you must write the mapping, and everyone may have different rules. An automated tool will always use a fixed set of rules (no surprises), but with manual mapping you control the rules and can bend them as needed, to look nicer or to avoid potential clashes (never happened to me), and you don't have to guess why a type wasn't mapped because of a c2cr limitation (happens a lot with llvm-c).

straight-shoota commented 2 years ago

One flaw of this approach is complications with cross-compiling. I don't agree that cross-compiling is primarily a tool for compiler development. There are good reasons for cross-compiling applications. I believe this is even especially relevant for Crystal because the compiler has considerable demands for processing power and memory. On less equipped machines (for example embedded devices or single-board computers), compiling even a mid-sized Crystal program can take unreasonably long. Insufficient memory capacity can make a build completely impossible.

I think we must make sure there's a viable solution for cross-compiling.

This might not be too difficult to achieve, though. We'd just need a way to tell the compiler to use pre-existing binding definitions, instead of evaluating .ccr on the fly. Maybe this could be as simple as recognizing existing .generated.cr files as an override. Perhaps also with the target triple as part of the file name. These pre-generated bindings could even be checked into VCS, and regenerated only when necessary.
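To make the naming idea concrete, here is a small sketch of how an override file name could be derived. The exact scheme, `<base>.<triple>.generated.cr` with a fallback to `<base>.generated.cr`, is only an assumption for illustration, as is the function name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical naming scheme for pre-generated bindings:
 *   with a target triple:  "<base>.<triple>.generated.cr"
 *   without one:           "<base>.generated.cr"
 * Returns what snprintf returns (the length the full name needs). */
int generated_binding_path(const char *base, const char *triple,
                           char *out, size_t out_size) {
  assert(base != NULL && out != NULL && out_size > 0);
  if (triple != NULL && triple[0] != '\0')
    return snprintf(out, out_size, "%s.%s.generated.cr", base, triple);
  return snprintf(out, out_size, "%s.generated.cr", base);
}
```

A compiler following this idea might look for the triple-specific file first, then the generic one, and only evaluate the template when neither exists.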

This concept could also work as basis for a cache which could be useful for native compiling. Even if generating bindings might only take a portion of the overall compile time, it still adds up. And it's usually very much unnecessary to regenerate bindings when the compiler already did that in the last build, three minutes ago. Could just use the result from the previous run and be done with it. That's a pretty easy opportunity for improving compiler performance (or rather: avoiding degradation).

Of course, this creates problems with cache invalidation. It would probably be quite hard to recognize when cached bindings would need to be regenerated due to changes in the linked libraries. Maybe not though. The PoC doesn't cover this, but when bindings link against any libraries other than libc, linker arguments for these libraries would need to be known to CCR. This could be enough information to inform about cache status.
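One way the linker arguments could feed into cache status is as input to a cache key. The sketch below hashes them with 64-bit FNV-1a. The choice of hash and the function name are assumptions, and this alone would not notice a library being upgraded in place under the same arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over a list of linker arguments. The terminating 0 byte of
 * each argument is hashed too, so {"ab", "c"} and {"a", "bc"} differ. */
uint64_t cache_key(const char *const *args, size_t count) {
  uint64_t hash = 14695981039346656037ULL; /* FNV-1a 64-bit offset basis */
  for (size_t i = 0; i < count; i++) {
    assert(args[i] != NULL);
    const unsigned char *p = (const unsigned char *)args[i];
    do {
      hash ^= *p;
      hash *= 1099511628211ULL; /* FNV-1a 64-bit prime */
    } while (*p++ != '\0');
  }
  return hash;
}
```

Cached bindings would then be reused only while the key stored next to them matches the key computed for the current build.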

It's all just purely theoretical ideas though. I think the next step would probably be enhancing the PoC to support linking specific libraries.