dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.06k stars 1.55k forks source link

[vm/ffi] Optimize asTypedList() #39843

Closed dcharkes closed 2 years ago

dcharkes commented 4 years ago

_asExternalTypedData is implemented as a runtime entry, we should make this a recognized method.

Samples: 27K of event 'cycles:uppp', Event count (approx.): 22172477433
  Overhead  Command     Shared Object                  Symbol
-   13.77%  DartWorker  dart                           [.] dart::BootstrapNatives::DN_Ffi_asExternalTypedData                                                ◆
   - 10.70% dart::BootstrapNatives::DN_Ffi_asExternalTypedData                                                                                               ▒
      - 10.48% dart::NativeEntry::BootstrapNativeCallWrapper                                                                                                 ▒
           CallBootstrapNative                                                                                                                               ▒
         + dart.ffi_::__asExternalTypedData@7050071                                                                                                          ▒
   + 1.05% dart::Type::type_class_id                                                                                                                         ▒
   + 0.64% dart::Smi::AsInt64Value                                                                                                                           ▒
+    6.52%  DartWorker  dart                           [.] dart::ExternalTypedData::New                                                                      ▒
+    6.01%  DartWorker  dart                           [.] dart::VMHandles::AllocateHandle                                                                   ▒
+    4.57%  DartWorker  perf-124944.map                [.] *dart.typed_data___ExternalUint64Array&_TypedList&_IntListMixin&_TypedIntListMixin@6027147_setRang▒
+    4.08%  DartWorker  dart                           [.] dart::Object::Allocate                                                                            ▒
+    3.45%  DartWorker  perf-124944.map                [.] *Lz4Lib_decompressFrame                                                                           ▒
vaind commented 3 years ago

Any chance this is going to be addressed? We have had to do various workarounds in pub.dev/packages/objectbox to at least alleviate the issue but there's really no way to completely get around it AFAIK.

And it really is rather slow at about 7.5 million asTypedList() calls per second, taking up a good portion of DB read operations (getting about 5 million per second on the same machine so you get the idea how much of that is asTypedList). See also https://github.com/objectbox/objectbox-dart/blob/main/benchmark/bin/native_pointers.dart

dcharkes commented 3 years ago

Your benchmark results are consistent with the ones in the SDK:

https://github.com/dart-lang/sdk/blob/f820f7ce48a95bd4bc4f3007f811e4be798ff4cf/benchmarks/FfiMemory/dart/FfiMemory.dart#L242-L296

I can't make any statements on prioritization. Currently, we're busy working on other dart:ffi tasks.

Until we get to this, would it be possible to use the Pointer [] and []= operations instead?

vaind commented 3 years ago

Until we get to this, would it be possible to use the Pointer [] and []= operations instead?

We're using it to get a ByteData.view() - any chance there's an implementation working with Pointer<Uint8>?

vaind commented 3 years ago

I've tried a rough, simple (and probably incorrect) dart implementation of ByteData, with its overrides implemented using byte shifts and indexing a Pointer<>, e.g.

  @override
  int getUint32(int byteOffset, [Endian endian = Endian.big]) {
    assert(endian == Endian.little);
    return (_ptr[byteOffset + 3] << 24) |
        (_ptr[byteOffset + 2] << 16) |
        (_ptr[byteOffset + 1] << 8) |
        _ptr[byteOffset];
  }

but it's even slower than using asTypedList for our use case (in our DB - reads go from those 5M/sec to about 1.3M/sec).

blaugold commented 2 years ago

Any updates on this? I would really appreciate this optimization getting implemented.

It would be quite impactful, for many users of dart:ffi, since it currently dominates Utf8Pointer.toDartString and other methods in package:ffi: Screenshot from 2021-11-24 12-19-39

vaind commented 2 years ago

@dcharkes I'm happy to contribute this change if you can point me in the right direction, e.g. what you meant by "make this a recognized method."

dcharkes commented 2 years ago

@dcharkes I'm happy to contribute this change if you can point me in the right direction, e.g. what you meant by "make this a recognized method."

That is a fairly complicated process. It involves writing our intermediate representation graph in the compiler. Here is the same process for making Pointer loads and stores recognized methods.

Furthermore, that intermediate representation then generates machine code, which if it is wrong will lead to segfaults which are good fun debugging.

Unless you're already familiar with (1) building the Dart VM, (2) Dart VM's intermediate representation, and (3) debugging segfaults with GDB/RR, I would not recommend it.

vaind commented 2 years ago

Thanks @dcharkes, the linked MR has convinced me this is not the right time for me to get involved here :)

blaugold commented 2 years ago

I've submitted a PR (#47780) to resolve this issue.

dcharkes commented 2 years ago

I've submitted a PR (#47780) to resolve this issue.

You made my day! I left a review in Gerrit: https://dart-review.googlesource.com/c/sdk/+/221360.

Upon running the tests I saw 1 test started failing. Let me know if you need help debugging it. (The test doesn't fail on main.) cc @sstrickl who is fluent in the type testing stubs and type argument vectors.

blaugold commented 2 years ago

Glad I can help. 😃 I'll try to address the requested changes later today.

blaugold commented 2 years ago

@dcharkes I haven't been able to debug that failing test yet, but it seems to be related to the number of added recognized methods (10). With 8 or less new recognized methods, the TTS test passes. The TTS test fails because it is unable to set up a partial TTS with a TypeTestCache because the TTS respecializes instead. I don't know enough about how those things interact. This line causes the issue.

Update The issue that makes the test fail seems to be unrelated to the PR, it just happens to be triggered by the code change. Instructions::Equals contains a bug because it potentially compares uninitialized memory. The Instructions header is padded to align the following instructions on word boundaries. The overall size of Instructions is further padded to kObjectAlignment. By using memcmp with the full object size, memory with random content is included in the comparison.

Update Memory for Instructions actually is properly initialized in Object::InitializeObject... In any case, in the failing test, Instructions::Equals returns false even though the instructions are the same.

Update Instructions::Equals compares the full memory of two instances, including the header of UntaggedObject. In this case, one instance has been marked by the GC while the other has not.

sstrickl commented 2 years ago

Sorry about the delay in responding, was OOO last week. Thanks for digging further, @blaugold, that sounds right and I can make a quick fix CL if so :)

sstrickl commented 2 years ago

Done in https://dart-review.googlesource.com/c/sdk/+/221467, up for review now.

sstrickl commented 2 years ago

CL has landed, so you should be clear now. Manual running of the tests succeeded on some of the failing build configurations, so if you're still seeing failures in the other CL once rebased, I'd be interested in knowing.

blaugold commented 2 years ago

Thanks for the quick fix @sstrickl.