Clozure / ccl

Clozure Common Lisp
http://ccl.clozure.com
Apache License 2.0

Update or replace ffigen4 #13

Open xrme opened 7 years ago

xrme commented 7 years ago

The interface databases that CCL uses are generated by a program called ffigen4. It is a set of patches to gcc-4.0.0 (see http://svn.clozure.com/publicsvn/ffigen4/)

These patches should be brought up-to-date. Alternatively, it might be an option to replace ffigen4 with some other tool. https://github.com/rpav/c2ffi might be suitable.

eschaton commented 7 years ago

What about just using CFFI? Or would that be insufficient?

xrme commented 7 years ago

The interface databases make the #_ and #$ reader macros work. These reader macros are used extensively in the implementation of CCL itself.

I'm not a CFFI user, so I'm not really qualified to say whether it is nicer than CCL's native FFI, but I can say that I think that CCL's native FFI is a great feature.

ailisp commented 7 years ago

As I understand it, CFFI is only a portability layer, much like what bordeaux-threads does for CCL's multiprocessing. So it may not be appropriate to use CFFI here, since we don't need to use the interface database in other CL implementations, and Clozure's FFI also provides more functionality. I suggest updating ffigen, because a new backend for c2ffi would probably only be of interest for developing CCL itself; library authors usually use CFFI rather than platform-specific FFIs.

ghost commented 7 years ago

I agree that CCL's native FFI is a great feature. Unfortunately, few projects build on it; they build on CFFI instead. So I made a little project, ccl-cffi, that offers the same function interface as CFFI but implements it on top of CCL more efficiently. This way my programs run more efficiently, and I can still use other packages that depend on CFFI.

ailisp commented 6 years ago

I would like to work on this. Today I took a look at ffigen and CCL's FFI documentation. Always having to patch gcc to build ffigen does not look good. Using https://github.com/rpav/c2ffi is a good idea. I can either

  1. add a driver to c2ffi that generates the ffi format, or
  2. replace lib/parse-ffi.lisp with a new Lisp program that takes c2ffi's sexp output as input and outputs cdb.

@xrme Which way do you prefer? Approach 2 uses Lisp, which is of course more fun than C++ :) Thanks!

xrme commented 6 years ago

You are welcome to work on this if you want to, but I worry that it is a rather big project.

I agree that we should try out https://github.com/rpav/c2ffi. If we can parse a simple header file with c2ffi and convince ourselves that the output matches (well, is isomorphic to) the current ffigen output, then that will give us some confidence that we can make it work.

I don't think we can completely replace parse-ffi.lisp. But I see no problem with writing (in Lisp or whatever) some program that will reformat c2ffi's output (either json or sexp) into the s-expression style ffigen format that parse-ffi.lisp knows how to process. If we find that c2ffi is working for us, we can consider writing a c2ffi driver in C++ at a later time.

My only reservation about c2ffi is that it uses an unstable (if not private) API to clang. There is a library called libclang. It provides a stable, C-based API. When I last looked at it, I didn't see how libclang dealt with C preprocessor content.

It would be great if we could use libclang for the interface translator, but maybe this is either not possible, or too much work.

If you are feeling up to the task of investigating this, then great! Thank you and good luck to you. I'll help you any way I can. If you spend some time on it and decide that it is too much trouble, I will certainly understand that, too.

ailisp commented 6 years ago

@xrme Thanks for the detailed observations. To find out whether c2ffi's output is isomorphic to ffigen's, I first need to get ffigen4 working and then compare the two outputs. It's a bit difficult to get a working gcc 4.0 in a current environment, although it's easier in an old VM. I will first try to patch a current gcc (7.2); if that works, we will at least have a modern ffigen4 whose output can be compared with c2ffi's.

c2ffi looks relatively stable: it has only been updated for new LLVM versions, and the example JSON output in its README hasn't changed for 4 years: https://github.com/rpav/c2ffi/blame/llvm-5.0.0/README.md But I'll contact Ryan Pavlik to ask whether its API is stable (after making sure the output is isomorphic).

As for libclang, I did some searching, and there's a new flag for exposing C preprocessor content: https://stackoverflow.com/questions/13881506/retrieve-information-about-pre-processor-directives I agree with you that using libclang would be a lot of work. libclang is interesting and I would like to learn it, but that will take some time.

xrme commented 6 years ago

If you haven't already, it may be helpful to consult https://trac.clozure.com/ccl/wiki/BuildFFIGEN and also https://trac.clozure.com/ccl/wiki/CustomFramework

In particular, there's a Mac-specific ffigen branch. I don't know if it builds on an up-to-date system. I have an ffigen binary that works.

Also see http://svn.clozure.com/publicsvn/ffigen4/ (in particular the branches/ directory)

ailisp commented 6 years ago

Thanks for these guides. Today I tried to build it on Arch Linux, but gcc-4.0's makefile doesn't work with gcc-7.2. So I built it in a Fedora 4 VM, which has exactly gcc-4.0.0. The build is almost automatic, except that I needed to tell it where objc-act.c is so that it could be patched. I then compared its output with c2ffi's. Input:

#define FOO (1 << 2)

const int BAR = FOO + 10;

typedef struct my_point {
    int x;
    int y;
    int odd_value[BAR + 1];
} my_point_t;

enum some_values {
    a_value,
    another_value,
    yet_another_value
};

void do_something(my_point_t *p, int x, int y);

c2ffi's output

[
{ "tag": "const", "name": "BAR", "location": "/home/rpav/test.h:3:11", "type": { "tag": ":int" }, "value": 14 },
{ "tag": "struct", "name": "my_point", "id": 0, "location": "/home/rpav/test.h:5:16", "bit-size": 544, "bit-alignment": 32, "fields": [{ "tag": "field", "name": "x", "bit-offset": 0, "bit-size": 32, "bit-alignment": 32, "type": { "tag": ":int" } }, { "tag": "field", "name": "y", "bit-offset": 32, "bit-size": 32, "bit-alignment": 32, "type": { "tag": ":int" } }, { "tag": "field", "name": "odd_value", "bit-offset": 64, "bit-size": 480, "bit-alignment": 32, "type": { "tag": ":array", "type": { "tag": ":int" }, "size": 15 } }] },
{ "tag": "typedef", "name": "my_point_t", "location": "/home/rpav/test.h:9:3", "type": { "tag": ":struct", "name": "my_point", "id": 0 } },
{ "tag": "enum", "name": "some_values", "id": 0, "location": "/home/rpav/test.h:11:6", "fields": [{ "tag": "field", "name": "a_value", "value": 0 }, { "tag": "field", "name": "another_value", "value": 1 }, { "tag": "field", "name": "yet_another_value", "value": 2 }] },
{ "tag": "function", "name": "do_something", "location": "/home/rpav/test.h:17:6", "variadic": false, "parameters": [{ "tag": "parameter", "name": "p", "type": { "tag": ":pointer", "type": { "tag": "my_point_t" } } }, { "tag": "parameter", "name": "x", "type": { "tag": ":int" } }, { "tag": "parameter", "name": "y", "type": { "tag": ":int" } }], "return-type": { "tag": ":void" } }
]

ffigen's output. I modified the struct definition slightly to make it valid ANSI C; otherwise ffigen complains "struct size is variant". A lot of (macro ...) lines are also omitted here.

(macro ("test.h" 1) "FOO" "(1 << 2)")
(var ("test.h" 3)
 "BAR"
 (int ()) (static))
(struct ("" 0)
 "my_point"
 (("x" (field (int ()) 0 4))
  ("y" (field (int ()) 4 4))
  ("odd_value" (field (array 5 (int ())) 8 20))))
(type ("test.h" 9)
 "my_point_t"
 (struct-ref "my_point"))
(enum ("" 0)
 "some_values"(("a_value" 0)("another_value" 1)("yet_another_value" 2)))
(enum-ident ("" 0)
 "a_value" 0)
(enum-ident ("" 0)
 "another_value" 1)
(enum-ident ("" 0)
 "yet_another_value" 2)
(function ("test.h" 17)
 "do_something"
 (function
  ((pointer (typedef "my_point_t")) (int ()) (int ()) )
  (void ())) (extern))
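
Comparing the two outputs above, c2ffi reports field offsets and sizes in bits ("bit-offset": 32, "bit-size": 32), while the ffigen struct form gives the same field in bytes (4 4). The following is a minimal sketch of that one conversion, not code from either tool. The struct `bit_field_info` and the function `emit_field` are hypothetical names, and the field's type is assumed to have been translated to ffigen notation already.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one struct field as c2ffi reports it:
   offset and size are given in bits. */
struct bit_field_info {
    const char *name;
    const char *ffigen_type;  /* already-translated type, e.g. "(int ())" */
    unsigned bit_offset;
    unsigned bit_size;
};

/* Write one ffigen-style field entry, converting bits to bytes.
   Returns the number of characters the entry needs, as snprintf does. */
int emit_field(char *out, size_t n, const struct bit_field_info *f)
{
    assert(out != NULL && f != NULL);
    /* This sketch only handles byte-aligned fields. Bitfields use a
       separate (bitfield ...) form and are not covered here. */
    assert(f->bit_offset % 8 == 0 && f->bit_size % 8 == 0);
    return snprintf(out, n, "(\"%s\" (field %s %u %u))",
                    f->name, f->ffigen_type,
                    f->bit_offset / 8, f->bit_size / 8);
}
```

Called with the values c2ffi reports for y (bit offset 32, bit size 32) and the type string "(int ())", this writes the same entry that ffigen printed for y.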

For top-level variable, struct, typedef, enum, and function definitions, c2ffi's output contains enough information to build an ffi definition. What ffigen has but c2ffi doesn't is macro definitions, though c2ffi has an option, -M, to dump macro definitions to a separate file:

const long __c2ffi_FOO = FOO;

It doesn't really parse the macro definition, but this is a clever workaround: clang compiles this snippet, so c2ffi then knows the value of FOO and can convert it into a defconst. But to generate an ffigen-style (macro ("test.h" 1) "FOO" "(1 << 2)") line I would need to patch c2ffi :-( I also wondered how c2ffi would deal with macros like #define max(a,b) ((a)>(b)?(a):(b)), so I tried that too. c2ffi simply outputs nothing for it, while ffigen leaves a raw (macro ...) line, as expected. I also found that c2ffi needs to be updated for each new version of clang. So, based on your suggestions, my final plan is:

  1. Maintain ffigen's patches for the current gcc, and maybe for future gcc versions;
  2. Study libclang and c2ffi's source and build a slightly different version that includes raw macro lines and outputs the ffi format, preferably also using libclang's new support for C preprocessor content.

xrme commented 6 years ago

Thanks for that research. Your planned approach seems good.

ailisp commented 6 years ago

Hi @xrme. I made a little progress today. Also, who is gb in the svn log? I would like to rebase and use his name in git. Thanks! I made some minor changes to Makefile.in. It can now be built with a recent gcc, but it still needs to download the gcc-4.0.0 source (this is in the gcc-4.0.0 branch). I tested building with gcc version 7.2.1 20171128 (GCC): https://github.com/ailisp/ffigen I am also trying to patch gcc-7.2.0 in the gcc-7.2.0 branch. However, building an unpatched gcc 7.2 took me about 2 hours, so progress is slow. If there is still no progress, I'll study libclang and c2ffi and work on a new ffigen.

xrme commented 6 years ago

@ailisp: gb is Gary Byers gb@clozure.com. He doesn't have a GitHub id.

Don't feel pressured to get this done because I mentioned this issue from that FreeBSD 12 bug. I can always build an ffigen on an older system and copy it to a FreeBSD 12 system if I need to.

ailisp commented 6 years ago

@xrme Thanks. Recent progress: after reading ffigen.c, I found that its structure is difficult to fit to libclang. libclang gives you the AST and you walk it, whereas the current ffigen.c is patched into gcc and runs during gcc's parsing step. Building a new version on libclang will be simpler than working on the current ffigen.c. Sorry about this; the previous work from Gary, Helmut, and others is quite helpful, and I'll attribute most of the contributions to them. Current libclang support for preprocessing information is still incomplete. As we know, even for an empty .h file there are hundreds of lines of predefined macros such as #define __GNUC__ 4, #define __linux__ 1, etc. libclang can only get the name __GNUC__ but not the value 4, and the filename for these macros is NULL. For macros in specific files, such as /usr/include/stdio.h or a foo.h, it's not a problem: libclang can get the start and end locations of the macro definition, and I can read the text from the file myself. So I can get:

(macro ("test.h" 1) "FOO" "(1 << 2)") 

but not:

(macro ("" 1) "__GNUC__" ???)

??? is not accessible (because the file it comes from is unknown). After a long attempt, I'm ashamed to say that I can simply get these from clang -dM -E -x c /dev/null > predefined.h :-) Also, I'm delighted to find that I only need to produce the raw visible macro lines; parse-ffi.lisp takes care of recursive replacement, macros with arguments, and parsing and evaluating C expressions. It's really great work.
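
A minimal sketch of turning the text of one definition into the raw (macro ...) line shown above, assuming the tool has already recovered the file name, the line number, and the directive text that follows "#define ". The function name `emit_macro` is hypothetical. Function-like macros and bodies that would need string escaping are rejected rather than guessed at, because the thread does not show how ffigen writes them.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Emit an ffigen-style raw macro line for an object-like macro.
   `text` is the directive text after "#define ", e.g. "FOO (1 << 2)".
   Returns 0 on success, -1 if the macro is outside this sketch's scope
   or the result does not fit in `out`. */
int emit_macro(char *out, size_t n, const char *file, unsigned line,
               const char *text)
{
    assert(out != NULL && file != NULL && text != NULL);

    /* The macro name is the leading identifier. */
    size_t name_len = 0;
    while (isalnum((unsigned char)text[name_len]) || text[name_len] == '_')
        name_len++;
    if (name_len == 0 || text[name_len] == '(')
        return -1;  /* no name, or a function-like macro */

    /* Skip the whitespace between the name and the replacement text. */
    const char *body = text + name_len;
    while (*body == ' ' || *body == '\t')
        body++;

    /* Escaping rules are not known here, so refuse bodies that would
       need them instead of producing a malformed string. */
    if (strpbrk(body, "\"\\") != NULL)
        return -1;

    int len = snprintf(out, n, "(macro (\"%s\" %u) \"%.*s\" \"%s\")",
                       file, line, (int)name_len, text, body);
    return (len >= 0 && (size_t)len < n) ? 0 : -1;
}
```

The same function could be fed the lines of the predefined.h file produced by the clang -dM -E command, with an empty file name, to fill in the predefined macros.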

ailisp commented 6 years ago

Progress report: I have finished macros, enums, references to primitive types, part of references to pointer types, and definitions of variables of primitive type: https://github.com/ailisp/ffigen5 While testing various kinds of pointer types, I found some very bad news about function pointers. When parsing void (*f)(void);, the original ffigen produces

(var ("test.h" 32)
 "f"
 (pointer (function
  ()
  (void ()))) (static))

With libclang, I can first recognize that f is a pointer, but clang_getPointeeType on that type then returns a CXType_Unexposed, which means that this information (the prototype of the function that f points to) is not exported by libclang. It can only be accessed through clang's C++ library LibTooling (which c2ffi also uses). But its introduction says: https://clang.llvm.org/docs/Tooling.html

Do not use LibTooling when you…: want a stable interface so you don’t need to change your code when the AST API changes

What I can also get from libclang is the raw string of f's type: void (*)(void). I'm thinking about 3 ways to handle this (all have some disadvantages):

  1. Isolate and wrap the required C++ part in a separate small library parallel to libclang, which needs to be updated as clang is updated. This requires additional maintenance in the future but is more general, and it's possible that other features will turn out to be available only in LibTooling.
  2. Although libclang doesn't give access to the function pointer's prototype, it does give access to function declarations. Add a temporary line that replaces (*) with an internal name such as ___g1234_, so that I can produce something like (function () (void ())). This also works if there are parameters. But if there are function pointer parameters, well, it gets a little messy.
  3. Ignore it and simply treat it as a void * pointer. From what I read in parse-ffi.lisp, doing this looks safe (maybe I lose something?). But I want to produce output at least as complete as the original ffigen's, so I don't like this way.

Any good ideas about this? Thanks!

xrme commented 6 years ago

Thank you, @ailisp, for investigating this.

I really want to use the stable libclang interface if we possibly can.

Let's try your approach number 3. The C ABIs don't distinguish between a function pointer and any other generic pointer. Writing something like:

(#_qsort :address base :size_t nel :size_t width :address comp)

where comp is defined via defcallback seems fine to me. There's no way anyone is going to write out the type of the comp function in CCL's FFI notation (even if the notation supports function pointers, which I'm not sure it even does).
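
A minimal sketch of what approach number 3 could look like on the generator side, assuming the translator has the type's spelling string (such as void (*)(void)) and, when available, the already-translated pointee form. The function names and the spelling-based check are hypothetical; a real tool would more likely test libclang type kinds than match strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Heuristic: a spelling such as "void (*)(void)" contains "(*)(".
   A pointer to an array, e.g. "int (*)[3]", does not match. */
int looks_like_function_pointer(const char *spelling)
{
    assert(spelling != NULL);
    return strstr(spelling, "(*)(") != NULL;
}

/* Write the ffigen form for a pointer type. When the pointee is a
   function, or is unavailable (pointee_form == NULL), fall back to a
   generic pointer, i.e. the form used for void *. */
int emit_pointer(char *out, size_t n, const char *spelling,
                 const char *pointee_form)
{
    assert(out != NULL);
    if (pointee_form == NULL || looks_like_function_pointer(spelling))
        return snprintf(out, n, "(pointer (void ()))");
    return snprintf(out, n, "(pointer %s)", pointee_form);
}
```

With this fallback, a qsort-style comparator argument comes out as a plain pointer, which lines up with passing it as :address from Lisp.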

ailisp commented 6 years ago

Thank you! Sounds good since it doesn't affect how we use such callback in lisp. I also prefer a stable interface.

eschaton commented 6 years ago

Please also file a bug against clang if you can, I think they’d want to know that this information isn’t exposed.

ailisp commented 6 years ago

@eschaton Hi, thanks, and sorry for the late response. I was busy with an interview in San Francisco and just got back home. I posted a message to the cfe-dev mailing list: http://lists.llvm.org/pipermail/cfe-dev/2018-January/056566.html. I haven't received any replies, though. @xrme I'm mostly done with type references. I'm having a problem with transparent unions. Does "transparent union" mean something like:

struct {
    int a;
    union {
        int b;
        float c;
    };
};

or gcc extension: __attribute__((__transparent_union__))?

ailisp commented 6 years ago

Today I finished almost all of the C part. What's left is ObjC classes and categories. I have a question about function definitions. At about line 460 of ffigen.c:

      /* struct ffi_typeinfo *arg_type_info; */
        /*
          It seems like functions that take a fixed number of arguments
          have a "void" argument after the fixed arguments, while those
          that take an indefinite number don't.
          That's perfectly sane, but the opposite of what LCC does.
          So, if there's a "void" argument at the end of the arglist,
          don't emit it; if there wasn't, emit one.
          Sheesh.
        */

But what I tested seems to be the opposite. Given:

int af(int a, ...);
int bf(int a);

ffigen gives:

(function ("test.h" 62)
 "af"
 (function
  ((int ()) (void ()))
  (int ())) (extern))
(function ("test.h" 63)
 "bf"
 (function
  ((int ()) )
  (int ())) (extern))

Is this comment obsolete? I am using the same behavior as ffigen's actual output.
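
In the ffigen output above, the variadic af ends its argument list with (void ()) and the fixed-arity bf does not. That is also what the inversion described in the quoted comment would yield, if gcc itself puts the trailing void on fixed-arity functions. A minimal sketch of emitting the function type with that convention follows. The names `append` and `emit_function_type` are hypothetical, the argument forms are assumed to be translated already, and the output is on one line instead of ffigen's multi-line layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append `s` to the NUL-terminated string in `out` (capacity `n`).
   Returns 0 on success, -1 if it does not fit. */
static int append(char *out, size_t n, const char *s)
{
    size_t used = strlen(out);
    size_t need = strlen(s);
    if (need >= n - used)
        return -1;
    memcpy(out + used, s, need + 1);
    return 0;
}

/* Write "(function (ARGS...) RET)". For a variadic function, a
   trailing (void ()) is added after the fixed arguments. */
int emit_function_type(char *out, size_t n,
                       const char *const *arg_forms, size_t nargs,
                       int variadic, const char *ret_form)
{
    assert(out != NULL && n > 0 && ret_form != NULL);
    out[0] = '\0';
    if (append(out, n, "(function (") != 0)
        return -1;
    for (size_t i = 0; i < nargs; i++) {
        if (i > 0 && append(out, n, " ") != 0)
            return -1;
        if (append(out, n, arg_forms[i]) != 0)
            return -1;
    }
    /* The variadic marker, as seen in ffigen's output for af. */
    if (variadic &&
        append(out, n, nargs > 0 ? " (void ())" : "(void ())") != 0)
        return -1;
    if (append(out, n, ") ") != 0 || append(out, n, ret_form) != 0 ||
        append(out, n, ")") != 0)
        return -1;
    return 0;
}
```

Because the forms are read by the Lisp reader, the whitespace differences from ffigen's own layout should not matter to parse-ffi.lisp, though that is an assumption worth checking against real input.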

xrme commented 6 years ago

@ailisp I've had a chance to experiment with your code, and it looks very promising. It is so helpful that you figured out so much of the libclang API. Thank you very much.

I need to generate a new set of interface databases from FreeBSD 12 header files. I spent part of today hacking on and using (my private fork of) your libclang-based ffigen, and I think it's going to work. I'm planning to spend the next two days on this and see how far I get. Starting Thursday, I'll be away for two weeks and probably won't have a chance to do very much hacking on CCL, but I am hoping that two days will be enough time to get it done.

FreeBSD will be a good start because we won't have to worry about dealing with Objective-C.

ailisp commented 6 years ago

Hi @xrme, I'm so glad you found it useful. I'm not sure whether you forked my most recent version; it now supports system include paths and works with h-to-ffi.sh. I manually compared its output for elf.h with ffigen4's. It mostly looks good, except that some macro definitions become random Latin-1 characters; I guess this is caused by encoding. Another issue is that it's not aware of __attribute__((__transparent_union__)) (I couldn't find anything in libclang to detect that). Neither is ffigen4. Maybe ffigen4/gcc4 uses a different syntax for that? Thanks for continuing to work on it. I plan to add the ObjC part after all of the C part works, so I'll try to do that after your upcoming days of work. Also, sorry for the delay. These days I have been busy preparing for and taking interviews, and I just got my first job after graduation.

GOFAI commented 5 years ago

Any further progress on this?

ailisp commented 5 years ago

Hi @GOFAI, @xrme has a further update in https://github.com/xrme/ffigen5. I'm not sure whether it is fully working.

xrme commented 5 years ago

I have gotten far enough with a new ffigen to be able to generate working headers for FreeBSD. I have been meaning to track down that code and check it in, but I haven't done that yet. I will try to do that soon.

GOFAI commented 5 years ago

I'm particularly interested in generating interface files for the newer macOS frameworks like SceneKit. How complete is the ObjC functionality?

ailisp commented 5 years ago

Unfortunately I don't know much about Obj-C, so the Obj-C part is not even started. You'll probably want to look at https://github.com/rpav/cl-autowrap and https://github.com/rpav/c2ffi, and at ffigen4 if it works.

GOFAI commented 5 years ago

Has anyone gotten ffigen4 to compile on macOS using a recent Xcode? The ObjC blocks version (ffigen-apple-gcc-5646/ffigen4) stops compiling with the following errors:

../../gcc-5646/gcc/toplev.c:564:1: error: redefinition of a 'extern inline'
      function 'floor_log2' is not supported in C99 mode
floor_log2 (unsigned HOST_WIDE_INT x)
^
../../gcc-5646/gcc/toplev.h:174:1: note: previous definition is here
floor_log2 (unsigned HOST_WIDE_INT x)
^
../../gcc-5646/gcc/toplev.c:599:1: error: redefinition of a 'extern inline'
      function 'exact_log2' is not supported in C99 mode
exact_log2 (unsigned HOST_WIDE_INT x)
^
../../gcc-5646/gcc/toplev.h:180:1: note: previous definition is here
exact_log2 (unsigned HOST_WIDE_INT x)
^

I'd try compiling it using the Homebrew formula that provides Apple's gcc 4.2.1-5666.3, but it only works on OS X 10.9 or older.

ailisp commented 5 years ago

There seem to be a lot of errors about “not supported in C99 mode”. What about trying clang with the -std=c89 or -std=gnu89 flag? https://clang.llvm.org/docs/UsersManual.html#differences-between-various-standard-modes

GOFAI commented 5 years ago

I've managed to compile ffigen4 under macOS 10.13 using the gcc provided by the gcc@4.9 brew formula. I'm not sure, however, whether the h-to-ffi.sh is broken or if I need to change something else in the populate.sh file to get it to work. Once I point it at the current SDK, it seems to always choke on the following (many previous lines omitted):

Need to create info for type:
 <real_type 0x1034ec370 NSTimeInterval sizes-gimplified DF
    size <integer_cst 0x141801d80 type <integer_type 0x1418130b0 bit_size_type> constant invariant 64>
    unit size <integer_cst 0x141801db0 type <integer_type 0x141813000 long unsigned int> constant invariant 8>
    align 64 symtab 8117 alias set -1 precision 64>
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/Foundation.framework/Headers/NSDate.h:26: confused by earlier errors, bailing out

Any ideas about how to interpret this? I seem to get stuck on this same error both for newer frameworks like SceneKit as well as older ones like OpenGL that ffigen4 should obviously be able to handle.

xrme commented 5 years ago

Newer SDKs include constructs in the header files that the old ffigen/gcc doesn't understand.

GOFAI commented 5 years ago

I gather that in this instance it's being confused by NS_DESIGNATED_INITIALIZER, which is declared in NSObjCRuntime.h. I presume addressing this isn't going to be as simple as adding an #include <Foundation/NSObjCRuntime.h> to the ffigen source somewhere?

xrme commented 5 years ago

You could try adding -include ${SDK}/path/to/NSObjCRuntime.h to the invocation of the h-to-ffi.sh script in the relevant populate.sh file.

GOFAI commented 5 years ago

Alas, CFLAGS="-m64 -isysroot ${SDK} -fblocks -mmacosx-version-min=10.6 -include ${SDK}/System/Library/Frameworks/Foundation.framework/Headers/NSObjRuntime.h" gives the exact same result.

GOFAI commented 5 years ago

I've been tinkering with cffi bindings to libclang to try and make sense of how it returns information related to ObjC classes. I've hit a snag, though, as even though the parser seems to be running I can't get a proper reference back to the cursor. Here's my code:

(cffi:defcallback visit-func :int ((cursor (:pointer (:struct CXCursor))) 
                                   (parent (:pointer (:struct CXCursor))) 
                                   (client-data :pointer))
  (declare (ignorable client-data parent))
  (print (clang_getCursorSpelling cursor))
  #.(cffi:foreign-enum-value 'CXChildVisitResult :CXChildVisit_Recurse))

(cffi:with-foreign-string (file "/Users/Walrus/hello.c")
  (let* ((index (clang_createIndex 0 1))
         (unit (clang_parseTranslationUnit index 
                                           file 
                                           (cffi:null-pointer) 
                                           0 
                                           (cffi:null-pointer) 
                                           0 
                                           (logior 
                                            (cffi:foreign-enum-value 'CXTranslationUnit_Flags :CXTranslationUnit_DetailedPreprocessingRecord) 
                                            (cffi:foreign-enum-value 'CXTranslationUnit_Flags :CXTranslationUnit_SkipFunctionBodies))))
         (root (clang_getTranslationUnitCursor unit)))
    (clang_visitChildren root (cffi:callback visit-func) (cffi:null-pointer))
    (clang_disposeTranslationUnit unit)
    (clang_disposeIndex index)))

In CCL this crashes with the following error:

? Unhandled exception 11 at 0x251721b9, context->regs at #x700005c62940
Exception occurred while executing foreign code
 at clang_getTranslationUnitCursor + 25
received signal 11; faulting address: 0x112755000

In SBCL, it gives:

CORRUPTION WARNING in SBCL pid 59756(tid 0xb000e000):
Memory fault at 0x0 (pc=0x96c41b9, sp=0x724cca0)
The integrity of this image is possibly compromised.
Continuing with fingers crossed.

Curiously, in the REPL in the CCL native IDE, it's possible to execute each command separately without causing the crash, but it doesn't actually work: (clang_getTranslationUnitCursor unit) returns a pointer pointer-eq to unit, which seems like it can't be right. Running the block of code above results in the same error as in SLIME. Do those of you who have tinkered around with libclang have any sense of what might be going wrong here?

EDIT: It turns out that libclang passes a lot of structs by value, so the old-fashioned cffi bindings generated by swig didn't work properly, because they treated all of them as pointers. By loading cffi-libffi it's possible to pass and return structs by value from Lisp, although at the moment I'm still sorting out how to translate the plists that it returns for the C structs into a native format so they can be passed back into libclang.
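
To illustrate the by-value issue on the C side, here is a minimal sketch using a stand-in struct with the same shape as libclang's CXCursor (a kind, an int, and three data pointers). All names are hypothetical and the kind value is arbitrary; this is not libclang code. It only shows why a binding that declares such arguments and return values as plain pointers cannot work.

```c
#include <assert.h>
#include <string.h>

/* Stand-in with the same shape as a libclang cursor. */
typedef struct {
    int kind;
    int xdata;
    const void *data[3];
} demo_cursor;

/* Returned by value, as clang_getTranslationUnitCursor returns its
   cursor: the caller receives a copy of the whole struct. */
demo_cursor demo_root_cursor(const void *unit)
{
    assert(unit != NULL);
    demo_cursor c;
    memset(&c, 0, sizeof c);
    c.kind = 1;        /* arbitrary value for this sketch */
    c.data[0] = unit;
    return c;
}

/* Taken by value, as the visitor callback and
   clang_getCursorSpelling take their cursor arguments. */
int demo_cursor_kind(demo_cursor c)
{
    return c.kind;
}

/* Nonzero only if the whole struct would fit in one pointer-sized
   slot, which is what a pointer-typed binding implicitly assumes. */
int fits_in_pointer(void)
{
    return sizeof(demo_cursor) <= sizeof(void *);
}
```

Since the struct is several times the size of a pointer, a foreign call declared with :pointer in place of the struct follows the wrong calling convention, which is consistent with the crashes shown above.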

GOFAI commented 5 years ago

OK, I've gotten the Lisp interface to libclang working. As I lack a working version of the historical version of ffigen that is used for CCL and (so far as I can tell) there's no documentation for its Objective-C extensions, could you please provide me with an example of the .ffi files for a macOS framework so I can experiment with extracting the relevant info from the libclang AST?

xrme commented 5 years ago

I am not sure you want to use the FFI to talk to libclang. It's probably better for ffigen to remain a C program.

xrme commented 5 years ago

http://setf.clozure.com/~rme/Cocoa.ffi is the output of the historical ffigen when it processes Cocoa.h.

GOFAI commented 5 years ago

Thanks! The matter of what form ffigen takes is a decision for the CCL maintainers (I have another, unrelated use case for my libclang interface). I'm just trying to do experimental programming to understand what information libclang provides for ObjC, and I'd much rather do that in Lisp than C. I'll attempt to contribute ObjC support to ffigen5 if I sort it out.

xrme commented 5 years ago

Awesome. I've been wanting to know how libclang exposes Objective-C information myself, so I'll be interested to see what you figure out.

GOFAI commented 5 years ago

After some experimentation, it appears that ObjC support will have to wait at least until LLVM 8 is released. The problem is that libclang doesn't expose the fields of ObjC classes, so there isn't a way to ascertain their layout, types, and offsets like the examples in the .ffi file. It only exposes the interfaces, and trying to use clang_Type_visitFields on those doesn't do anything. Fortunately, contributions in the next LLVM release are going to add additional functions to libclang for extracting information from ObjC classes. I'm not absolutely sure that these will be sufficient, but hopefully they will be--I'll check when LLVM 8 comes out in a few months.

xrme commented 5 years ago

I don't think that the Objective-C interface in CCL, at least, needs to access the underlying struct layout of an Objective-C object. That stuff is generally (if not always) off-limits to the programmer.

GOFAI commented 5 years ago

The Cocoa.ffi file contains entries like this:

(struct ("" 0)
 "__entityMappingFlags"
 (("_isInUse" (bitfield (unsigned ()) 0 1))
  ("_reservedEntityMapping" (bitfield (unsigned ()) 1 31))))
(struct ("" 0)
 "NSEntityMapping"
 (("isa" (field (typedef "Class") 0 8))
  ("_reserved" (field (pointer (void ())) 8 8))
  ("_reserved1" (field (pointer (void ())) 16 8))
  ("_mappingsByName" (field (pointer (struct-ref "NSDictionary")) 24 8))
  ("_name" (field (pointer (struct-ref "NSString")) 32 8))
  ("_mappingType" (field (typedef "NSEntityMappingType") 40 8))
  ("_sourceEntityName" (field (pointer (struct-ref "NSString")) 48 8))
  ("_sourceEntityVersionHash" (field (pointer (struct-ref "NSData")) 56 8))
  ("_destinationEntityName" (field (pointer (struct-ref "NSString")) 64 8))
  ("_destinationEntityVersionHash" (field (pointer (struct-ref "NSData")) 72 8))
  ("_sourceExpression" (field (pointer (struct-ref "NSExpression")) 80 8))
  ("_userInfo" (field (pointer (struct-ref "NSDictionary")) 88 8))
  ("_entityMigrationPolicyClassName" (field (pointer (struct-ref "NSString")) 96 8))
  ("_attributeMappings" (field (pointer (struct-ref "NSMutableArray")) 104 8))
  ("_relationshipMappings" (field (pointer (struct-ref "NSMutableArray")) 112 8))
  ("_entityMappingFlags" (field (struct-ref "__entityMappingFlags") 120 4))))
(objc-class ("" 0)
 "NSEntityMapping"
 ("NSObject")
 ()
 ())

The first two contain info that can't be accessed from the current libclang. Do you think that they are currently ignored by parse-ffi.lisp?

EDIT: Looking at parse-ffi.lisp, my impression is that these struct forms for class fields are processed, but that they probably aren't used by the CCL ObjC bridge. The big sticking point is the base class, which is needed by the objc-class form: LLVM 8's libclang will provide a clang_Type_getObjCObjectBaseType() function that will hopefully take care of that.

eschaton commented 5 years ago

If by “the fields of ObjC classes” you mean their instance variables, it should be fine that libclang doesn’t expose them since nothing should manipulate them; they must not be treated as API.

That NSEntityMapping example in particular contains a whole bunch of stuff that’s marked @private for everything but 32-bit Intel and PowerPC, and which should be treated as such by developers on those platforms as well.

GOFAI commented 5 years ago

LLVM 8 is now released with additional ObjC functionality available in libclang, so it should now be possible to experiment as to whether everything needed to add ObjC support to ffigen5 is now exposed. Am I correct in my understanding that all that need to be generated for ObjC are the objc-class, objc-class-method, and objc-instance-method forms, without the struct forms mentioned above? Or do those struct forms that old ffigen provided with the instance variables for ObjC classes need to be present, even if empty?

GOFAI commented 5 years ago

I've been tinkering around with using LLVM 8's libclang to explore the ObjC headers in Cocoa.h. The good news is that it seems pretty straightforward to access most of the stuff that parse-ffi.lisp uses, including the superclass. One exception is the offsets for instance variables, but it's not clear to me that those actually do anything under ObjC 2.0 anyway. Whenever the offset is needed, isn't it retrieved from the ObjC runtime? That's my impression reading through objc-clos.lisp, but I'm not 100% clear on the way the interface database generated by parse-ffi.lisp is used by the ObjC bridge.

eschaton commented 5 years ago

I don’t know how CCL does it, but you should always get the offset of an Objective-C ivar from the runtime. While they need to be known at compile time for certain architectures using the old runtime (thus causing a fragile base class problem), the modern ObjC 2 runtime as used by almost all platforms can dynamically determine where to place ivars for best performance, so you should ask the runtime any time that information is needed.

Also nothing should actually touch an Objective-C object’s ivars other than that object’s class, so there should absolutely never be a need to parse ivar offsets from system header files.

If CCL is going to generate ObjC classes at runtime, then it should be getting any ivar offsets it needs to use as part of that process from the runtime at the point it’s creating the class.

GOFAI commented 5 years ago

The relevant code in objc-clos.lisp (which is repeated a few times) seems to be:

(with-macptrs ((ivar (#_class_getInstanceVariable class name)))
            (unless (%null-ptr-p ivar)
              (let* ((offset (#_ivar_getOffset ivar)))

But there's also alternative legacy code for when ObjC 2.0 isn't available. Are there still any CCL users who need compatibility with legacy ObjC runtimes?

If we never need to parse ivar offsets from the header files, does it make sense to omit those from the ffigen output and tweak parse-ffi.lisp so that it no longer expects them? As it's currently written it expects an integer for the ivar offset and bitshifts it. My impression is that the value is currently ignored, so just including an arbitrary integer for the offset will avoid issues, but that seems like a crude solution.

eschaton commented 5 years ago

Even on Mac OS X on PowerPC and OPENSTEP/Mach on 68K NeXT hardware, you can get the ivar layout and names from the runtime, and have to tell the runtime about yours when creating classes dynamically.

Omitting ivars from the ffigen output sounds like a reasonable solution to me.

GOFAI commented 5 years ago

My personal inclination is to try to reproduce everything in the old ffigen output format that is easy to extract from the header files using libclang. The ivar names and type sizes look easy, so it'll be trivial to retain them. If the ivar offsets even exist in the source tree generated by libclang, I can't find them--which makes sense given that they apparently aren't supposed to be there!

xrme commented 5 years ago

AFAIK, we don't use the ivar offsets at all.

Don't worry about maintaining compatibility with the legacy Objective-C runtime.