dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.27k stars 1.58k forks source link

[vm/ffi] Remove workaround for type feedback missing in FFI trampolines #44454

Open dcharkes opened 3 years ago

dcharkes commented 3 years ago

Update 2021-01-05: The issue is that the JIT trampolines do not report type feedback that Struct. _addressOf contains TypedData on return values and arguments.

This could be addressed by making the VM know the layout of Struct and its subtypes. However, that makes the tree shaker no longer understand those types and their constructor calls. So, we might not want to make those types and their constructors opaque to the CFE&TFA.

We have a workaround, so this does not crash.

===============================================================================

In https://dart-review.googlesource.com/c/sdk/+/169221 we add nested structs and copying of nested structs.

Edit: the CQ also caught this: https://logs.chromium.org/logs/dart/buildbucket/cr-buildbucket.appspot.com/8861204090372525392/+/steps/test_results/0/logs/new_test_failures__logs_/0

Test succeeds:

  reset();
  print(typedDataBackedStruct.a1);
  typedDataBackedStruct.a1 = typedDataBackedStruct.a0;
  Expect.equals(5, typedDataBackedStruct.a1.a0);
  Expect.equals(6, typedDataBackedStruct.a1.a1);

Test fails:

  reset();
  typedDataBackedStruct.a1 = typedDataBackedStruct.a0; // This assignment does not work.
  Expect.equals(5, typedDataBackedStruct.a1.a0);
  Expect.equals(6, typedDataBackedStruct.a1.a1);
$ tools/build.py --no-start-goma -m debug -a x64 runtime ffi_test_functions dart_precompiled_runtime && tools/test.py -a x64 -c dartk,dartkp ffi_2
Done. Made 331 targets from 93 files in 400ms
ninja -C out/DebugX64 -j1000 -l64 runtime ffi_test_functions dart_precompiled_runtime
ninja: Entering directory `out/DebugX64'
ninja: no work to do.
ninja -C out/DebugX64 -j1000 -l64 runtime ffi_test_functions dart_precompiled_runtime done.
The build took 0.714 seconds
Warning: combination of compiler 'dartkp' and runtime 'vm' is invalid. Skipping this combination.
Warning: combination of compiler 'dartk' and runtime 'dart_precompiled' is invalid. Skipping this combination.
Test configurations:
    custom configuration(architecture: x64, compiler: dartk, mode: debug, runtime: vm, system: linux)
Suites tested: ffi_2
    custom configuration(architecture: x64, compiler: dartkp, mode: debug, runtime: dart_precompiled, system: linux)
Suites tested: ffi_2
[00:09 |  47% | +  213 | -    0]

FAILED: dartk-vm debug_x64 ffi_2/function_callbacks_structs_by_value_test
Expected: Pass
Actual: RuntimeError

--- Command "vm" (took 07.000348s):
DART_CONFIGURATION=DebugX64 out/DebugX64/dart --deterministic --optimization-counter-threshold=5 --use-slow-path --stacktrace-every=100 --ignore-unrecognized-flags --packages=/usr/local/google/home/dacoharkes/dart-sdk/sdk/.packages /usr/local/google/home/dacoharkes/dart-sdk/sdk/tests/ffi_2/function_callbacks_structs_by_value_test.dart

exit code:
255

stdout:
callbackPassStructRecurisive(10, (1, 2, 3, 4, 5))
PassStruct20BytesHomogeneousInt32x10(9, (2, 2, 3, 4, 5), 0x7fd435817000)
callbackPassStructRecurisive(8, (3, 2, 3, 4, 5))
PassStruct20BytesHomogeneousInt32x10(7, (4, 2, 3, 4, 5), 0x7fd435817000)
callbackPassStructRecurisive(6, (5, 2, 3, 4, 5))
PassStruct20BytesHomogeneousInt32x10(5, (6, 2, 3, 4, 5), 0x7fd435817000)
callbackPassStructRecurisive(4, (7, 2, 3, 4, 5))
PassStruct20BytesHomogeneousInt32x10(3, (8, 2, 3, 4, 5), 0x7fd435817000)
callbackPassStructRecurisive(2, (9, 2, 3, 4, 5))
PassStruct20BytesHomogeneousInt32x10(1, (10, 2, 3, 4, 5), 0x7fd435817000)
callbackPassStructRecurisive(0, (11, 2, 3, 4, 5))
returning
callbackPassStructRecurisive(11, (1, 2, 3, 4, 5))
PassStruct20BytesHomogeneousInt32x10(10, (2, 2, 3, 4, 5), 0x7fd435817000)
callbackPassStructRecurisive(9, (3, 2, 3, 4, 5))
PassStruct20BytesHomogeneousInt32x10(8, (4, 2, 3, 4, 5), 0x7fd435817000)
callbackPassStructRecurisive(7, (5, 2, 3, 4, 5))
PassStruct20BytesHomogeneousInt32x10(6, (6, 2, 3, 4, 5), 0x7fd435817000)
callbackPassStructRecurisive(5, (7, 2, 3, 4, 5))
PassStruct20BytesHomogeneousInt32x10(4, (8, 2, 3, 4, 5), 0x7fd435817000)
callbackPassStructRecurisive(3, (9, 2, 3, 4, 5))
PassStruct20BytesHomogeneousInt32x10(2, (10, 2, 3, 4, 5), 0x7fd435817000)
callbackPassStructRecurisive(1, (11, 2, 3, 4, 5))
PassStruct20BytesHomogeneousInt32x10(0, (12, 2, 3, 4, 5), 0x7fd435817000)
CallbackWithStruct(0x7fd43581700a)

stderr:
Unhandled exception:
Expect.equals(expected: <5>, actual: <-3085>) fails.
#0      Expect._fail (package:expect/expect.dart:702:5)
#1      Expect.equals (package:expect/expect.dart:132:5)
#2      testCopyLogic (file:///usr/local/google/home/dacoharkes/dart-sdk/sdk/tests/ffi_2/function_callbacks_structs_by_value_test.dart:130:10)
#3      main (file:///usr/local/google/home/dacoharkes/dart-sdk/sdk/tests/ffi_2/function_callbacks_structs_by_value_test.dart:21:5)
#4      _delayEntrypointInvocation.<anonymous closure> (dart:isolate-patch/isolate_patch.dart:283:19)
#5      _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:184:12)

--- Re-run this test:
python tools/test.py -c dartk,dartkp ffi_2/function_callbacks_structs_by_value_test
[01:12 | 100% | +  447 | -    1]
dcharkes commented 3 years ago

When the test fails, a nested struct accessed on a larger struct which has a typed data as backing store, is returned as a struct with a pointer as backing store, rather than a typed data view. This happens because _addressOf is inferred to have the type Pointer (which it has not in this case, it's a TypedData). This might be speculative optimization.

==== file:///usr/local/google/home/dacoharkes/dart-sdk/sdk/tests/ffi_2/function_structs_by_value_generated_test.dart_Struct8BytesNestedInt_get_a0 (GetterFunction)
B0[graph]:0 {
      v0 <- Constant(#null)
      v1 <- Constant(#<optimized out>)
      v7 <- Constant(#Type: Pointer*) T{_Type}
      v10 <- Constant(#true) T{bool}
      v19 <- Constant(#_ImmutableList len:3) T{_ImmutableList}
      v26 <- Constant(#_ImmutableList len:3) T{_ImmutableList}
      v35 <- Constant(#TypeArguments: (H1d7e78bd) [Type: Struct4BytesHomogeneousInt16*]) T{TypeArguments}
}
B1[function entry]:2 {
      v2 <- Parameter(0) T{Struct8BytesNestedInt}
}
    CheckStackOverflow:8(stack=0, loop=0)
    v3 <- AllocateObject(Struct4BytesHomogeneousInt16) T{Struct4BytesHomogeneousInt16}
    v5 <- InstanceCall:10( get:_addressOf@8050071<0>, v2 IC[0: ]) T{Object?}
    v8 <- InstanceCall:12( _simpleInstanceOf@0150898<0>, v5, v7 IC[0: ]) T{*?}

[...]

After ApplyClassIds
==== file:///usr/local/google/home/dacoharkes/dart-sdk/sdk/tests/ffi_2/function_structs_by_value_generated_test.dart_Struct8BytesNestedInt_get_a0 (GetterFunction)
B0[graph]:0 {
      v0 <- Constant(#null)
      v1 <- Constant(#<optimized out>)
      v7 <- Constant(#Type: Pointer*) T{_Type}
      v10 <- Constant(#true) T{bool}
      v19 <- Constant(#_ImmutableList len:3) T{_ImmutableList}
      v26 <- Constant(#_ImmutableList len:3) T{_ImmutableList}
      v35 <- Constant(#TypeArguments: (H1d7e78bd) [Type: Struct4BytesHomogeneousInt16*]) T{TypeArguments}
}
B1[function entry]:2 {
      v2 <- Parameter(0) T{Struct8BytesNestedInt}
}
    CheckStackOverflow:8(stack=0, loop=0)
    v3 <- AllocateObject(Struct4BytesHomogeneousInt16) T{Struct4BytesHomogeneousInt16}
    CheckClass:10(v2 Cids[1: Struct8BytesNestedInt etc.  cid 1132])
    v5 <- LoadField(v2 . _addressOf@8050071 {final}) T{Pointer}

On failing runs with --trace-field-guards --optimization-counter-threshold=5

Store Field <Struct._addressOf@8050071>: final <?> <- Pointer<Struct20BytesHomogeneousInt32>: address=0x55b86aabf9c0
    => <not-nullable Pointer>

On succeeding runs with --trace-field-guards --optimization-counter-threshold=20

Store Field <Struct._addressOf@8050071>: final <?> <- Pointer<Struct20BytesHomogeneousInt32>: address=0x562806815980
    => <not-nullable Pointer>
Store Field <Struct._addressOf@8050071>: final <not-nullable Pointer> <- TypedDataView(cid: 104)
    => <*>

This looks like speculative optimization failing to deoptimize when assumptions no longer hold.

dcharkes commented 3 years ago

Working theory:

Fragment FlowGraphBuilder::WrapTypedDataBaseInStruct(
    const AbstractType& struct_type) {
  const auto& struct_sub_class = Class::ZoneHandle(Z, struct_type.type_class());
  struct_sub_class.EnsureIsFinalized(thread_);
  const auto& lib_ffi = Library::Handle(Z, Library::FfiLibrary());
  const auto& struct_class =
      Class::Handle(Z, lib_ffi.LookupClass(Symbols::Struct()));
  const auto& struct_addressof = Field::ZoneHandle(
      Z, struct_class.LookupInstanceFieldAllowPrivate(Symbols::_addressOf()));
  ASSERT(!struct_addressof.IsNull());

  Fragment body;
  LocalVariable* typed_data = MakeTemporary("typed_data_base");
  body += AllocateObject(TokenPosition::kNoSource, struct_sub_class, 0);
  body += LoadLocal(MakeTemporary("struct"));  // Duplicate Struct.
  body += LoadLocal(typed_data);
  body += StoreInstanceField(struct_addressof,
                             StoreInstanceFieldInstr::Kind::kInitializing);
  body += DropTempsPreserveTop(1);  // Drop TypedData.
  return body;
}

runtime/vm/compiler/frontend/kernel_to_il.cc

This allocates an object, stores a TypedDataView into it, without telling the JIT that that now Struct._addressOf can contain a TypedDataView. Which makes the JIT rely on that it has only seen Pointers in Struct._addressOf, which makes this test fail.

(Trying to add the guard fails, because FFI trampolines are force optimized.)

  body += StoreInstanceFieldGuarded(
      struct_addressof, StoreInstanceFieldInstr::Kind::kInitializing);
dcharkes commented 3 years ago

I'm unable to construct a case of this on master (without the CL):

It fails in the nested struct CL because we branch on Pointer<T extends NativeType> instead of on Pointer<T> in various places where the code branches on whether something is a Pointer or TypedData.

dcharkes commented 3 years ago

Because we have a workaround, this is not crashing.

mkustermann commented 3 years ago

As discussed in today's meeting we should separate allocation from sizeof calculation (similar to C code) for code size / tree shake reasons and performance reasons. We could use an API such as

// In "dart:ffi"
abstract class Allocator {
  Pointer<Void> allocate(int numBytes);
  void free(Pointer<Void> pointer);
}

// In "dart:ffi"
extern Pointer<T> allocate<T extends Struct>(Allocator allocator);

// In "package:ffi/ffi.dart"
class MallocAllocator extends Allocator { ... }
class ZoneAllocator extends Allocator { ... }

final malloc = MallocAllocator();

// User code:
void main() {
  Pointer<Foo> foop = allocate<Foo>(malloc);
  ...
  free<Foo>(foop, malloc);
}
// User code (lowered to kernel):
void main() {
  Pointer<Foo> foop = malloc.allocate(sizeOf<Foo>(malloc) /* <-- or rather it's lowered form */).cast<Foo>();
  ...
  malloc.free(foop.cast<Void>());
}

So we'll

dcharkes commented 9 months ago

fd2e9b9f1af9e2ced5d263956f82e4ef3b583c8a Made trampolines explicit in the CFE.

If we make the FFI trampoline actually return TypedData instead of T extends Struct and wrap the result in the struct constructor, both the TFA and JIT would know about the struct constructor invocations. We'd need to do this for both the closures and the @Native external functions.