delta-incubator / delta-kernel-rs

A native Delta implementation for integration with any query engine
Apache License 2.0
144 stars 39 forks source link

Add workaround for MSVC specific C++ issue #451

Open samansmink opened 4 days ago

samansmink commented 4 days ago

Please describe why this is necessary.

To be able to compile the DuckDB Delta extension, a C++ based project linking against the delta kernel through the FFI, I currently need to manually patch the delta_kernel_ffi.hpp header with the following struct:

struct im_an_unused_struct_that_tricks_msvc_into_compilation {
    ExternResult<KernelBoolSlice> field;
    ExternResult<bool> field2;
    ExternResult<EngineBuilder*> field3;
    ExternResult<Handle<SharedExternEngine>> field4;
    ExternResult<Handle<SharedSnapshot>> field5;
    ExternResult<uintptr_t> field6;
    ExternResult<ArrowFFIData*> field7;
    ExternResult<Handle<SharedScanDataIterator>> field8;
    ExternResult<Handle<SharedScan>> field9;
    ExternResult<Handle<ExclusiveFileReadResultIterator>> field10;
    ExternResult<KernelRowIndexArray> field11;
};

I've taken this trickery from https://github.com/mozilla/cbindgen/issues/402#issuecomment-578680163

While this seems to work fine, this is a little impractical because it requires manually regenerating and patching the ffi header on every update. Furthermore, it makes it a little finnicky to set up a CI job in the kernel which tests kernel against the DuckDB Delta extension automatically.

First of all, this doesn't appear to be a compiler bug:

Linkage from C++ to objects defined in other languages and to objects defined in C++ from other languages is implementation-defined and language-dependent. Only where the object layout strategies of two language implementations are similar enough can such linkage be achieved (see https://stackoverflow.com/questions/8948443/interface-to-c-objects-through-extern-c-functions)

Note that while msvc throw an error, clang is also not super happy and throws a warning

There seem to be two solutions:

I've chosen the latter because it seemed to cause less potential side effects. Explicit instantiation would be allowed to include the header once in the entire program.

Tbh I think explicit instantiation might work but is probably not really user-friendly compared to the struct approach

Describe the functionality you are proposing.

I think the best approach is to let the kernel generate this workaround automatically. It seems pretty harmless compared to explicit instantiation.

Additional context

No response

scovich commented 3 days ago

I notice the list includes only ExternResult -- does this issue only affect function return types?

scovich commented 3 days ago

I proposed a fix upstream with https://github.com/mozilla/cbindgen/pull/1024. Hopefully it gets accepted!

scovich commented 3 days ago

With a local checkout of the above fix and the following local changes:

diff --git a/ffi/Cargo.toml b/ffi/Cargo.toml
index 1bd69e7..f1a6634 100644
--- a/ffi/Cargo.toml
+++ b/ffi/Cargo.toml
@@ -30,7 +30,7 @@ arrow-data = { version = "53.0", default-features = false, features = [
 arrow-array = { version = "53.0", default-features = false, optional = true }

 [build-dependencies]
-cbindgen = "0.27.0"
+cbindgen = { path = "../../../cbindgen" }
 libc = "0.2.158"

 [dev-dependencies]
diff --git a/ffi/cbindgen.toml b/ffi/cbindgen.toml
index 491333a..b5951b4 100644
--- a/ffi/cbindgen.toml
+++ b/ffi/cbindgen.toml
@@ -13,6 +13,9 @@ namespace = "ffi"
 "feature = default-engine" = "DEFINE_DEFAULT_ENGINE"
 "feature = sync-engine" = "DEFINE_SYNC_ENGINE"

+[export]
+instantiate_return_value_monomorphs = true
+
 [export.mangle]
 remove_underscores = true

... the generated delta_kernel_ffi.hpp now includes this:

/// Dummy struct emitted by cbindgen to avoid compiler warnings/errors about
/// return type C linkage for template types returned by value from functions
struct __cbindgen_return_value_monomorphs {
  ExternResult<ArrowFFIData*> field0;
  ExternResult<EngineBuilder*> field1;
  ExternResult<Handle<ExclusiveFileReadResultIterator>> field2;
  ExternResult<Handle<SharedExternEngine>> field3;
  ExternResult<Handle<SharedScan>> field4;
  ExternResult<Handle<SharedScanDataIterator>> field5;
  ExternResult<Handle<SharedSnapshot>> field6;
  ExternResult<KernelBoolSlice> field7;
  ExternResult<KernelRowIndexArray> field8;
  ExternResult<bool> field9;
  ExternResult<uintptr_t> field10;
};

... and the overall header compiles cleanly if I copy it to tmp.cpp (other than lacking a main function):

$ clang++ -Wall -Werror -Wno-pragma-once-outside-header -std=gnu++17 -DDEFINE_DEFAULT_ENGINE tmp.cpp
ld: Undefined symbols:
  _main, referenced from:
      <initial-undefines>
clang: error: linker command failed with exit code 1 (use -v to see invocation)
scovich commented 2 days ago

Ah, it wasn't respecting conditional compilation before... now it does:

/// Dummy struct emitted by cbindgen to avoid compiler warnings/errors about
/// return type C linkage for template types returned by value from functions
struct __cbindgen_return_value_monomorphs {
#if defined(DEFINE_DEFAULT_ENGINE)
  ExternResult<ArrowFFIData*> field0
#endif
  ;
#if defined(DEFINE_DEFAULT_ENGINE)
  ExternResult<EngineBuilder*> field1
#endif
  ;
  ExternResult<Handle<ExclusiveFileReadResultIterator>> field2;
#if defined(DEFINE_DEFAULT_ENGINE)
  ExternResult<Handle<SharedExternEngine>> field3
#endif
  ;
#if defined(DEFINE_SYNC_ENGINE)
  ExternResult<Handle<SharedExternEngine>> field4
#endif
  ;
  ExternResult<Handle<SharedScan>> field5;
  ExternResult<Handle<SharedScanDataIterator>> field6;
  ExternResult<Handle<SharedSnapshot>> field7;
  ExternResult<KernelBoolSlice> field8;
  ExternResult<KernelRowIndexArray> field9;
  ExternResult<bool> field10;
  ExternResult<uintptr_t> field11;
};