dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

Add RandomAccessFile.readMapped(Sync) to dart:io library #49810

Open biggs0125 opened 2 years ago

biggs0125 commented 2 years ago

Proposal (draft change)

Add two methods, readMapped and readMappedSync, to the RandomAccessFile interface in the dart:io SDK library. These methods would expose the functionality of memory-mapping files into the virtual address space (mmap). They would function similarly to the read/readSync methods but would avoid allocating a new Uint8List.

For simplicity we would mark the address space as “read protected”. This would be enforced within the Dart world by returning unmodifiable views of the Uint8Lists from the new methods. Here is the proposed file.dart API:

/// Loads the file into a memory mapped [Uint8List].
///
/// The resulting [Uint8List] is externally allocated and memory mapped into
/// the heap virtual address space. The allocated virtual memory space will
/// be marked as 'readonly' and so the Uint8List will be unmodifiable.
Future<Uint8List> readMapped();

/// Synchronously loads the file into a memory mapped [Uint8List].
///
/// The resulting [Uint8List] is externally allocated and memory mapped into
/// the heap virtual address space. The allocated virtual memory space will
/// be marked as 'readonly' and so the Uint8List will be unmodifiable.
///
/// Throws a [FileSystemException] if the operation fails.
Uint8List readMappedSync();

Goal

Reduce memory usage in Dart2JS running in sequential mode. In sequential mode, Dart2JS serializes the results of its first 4 phases (kernel, closed world, global inference, codegen) after computing each of them. Then, at the start of each phase, the results for all the previous phases are deserialized and loaded into memory. By the final phase, the serialized bytes for every phase of the compiler are held in memory. For large applications, these byte arrays account for a significant amount of memory.

Justification

In some large applications we see hundreds of MBs saved in every phase of the Dart2JS compiler using this strategy, peaking at around 700MB saved in the final phase. This also unblocks further memory optimizations.

Breaking change overview

This change to the RandomAccessFile interface would be a breaking change as implementations of the interface would need to implement these new methods.

It is unlikely there are many implementers of this interface beyond dart:io itself. One known implementer is package:file, which has two places that would need to be updated with these methods: MemoryRandomAccessFile and ForwardingRandomAccessFile would each have simple implementations of these methods: draft change

Note: The implementations in package:file are all marked as @override with a lint ignore. dart:io won't have the interface change yet. By landing that change first and setting the min SDK version of package:file to >= 2.19.0-dev we can avoid ever actually breaking package:file. These implementations will be valid overrides once dart:io is updated but before then they are just methods on those subclasses. The lint ignores will keep the analyzer from flagging these methods.

mraleph commented 2 years ago

Any specific reason you don't want to use FFI for this? With Finalizer and NativeFinalizer available you don't really need this directly implemented in dart:io. You might need a small native component (if you want to ensure prompt synchronous finalisation), but there is no problem building one (especially internally).

We have previously discussed this with the dart2js team and provided the code for achieving this without any changes to the VM, e.g. an older version of the code: https://gist.github.com/mraleph/d8ffb32cd419e08e15b8fc245dbbeff7. I would probably rewrite it today to use Finalizer and NativeFinalizer depending on a few factors (e.g. if you know that some blob is alive for the whole duration of the execution you don't need a finalizer at all; alternatively, if you know the precise liveness of some mapped region then you probably want to manually unmap it).

biggs0125 commented 2 years ago

Others have tried adding a dependency on FFI to Dart2JS before. The dependency broke some infrastructure and blocked the SDK from building. There wasn't an easy workaround or fix for the issue, unfortunately.

mraleph commented 2 years ago

I think the main (the only?) issue was that the dart2js appjit snapshot is trained on dart2js itself. One option to resolve the issue is to use conditional imports; another is to use AOT instead of appjit.

sigmundch commented 2 years ago

We already have environments where we use an AOT snapshot for dart2js internally, so that should be quite viable.

Also, it may be easy to provide a separate dart2js entrypoint for training data.

Our entrypoint is already layered, so it won't require much effort. In particular, in the past we didn't allow importing dart:io either, so the compiler definition in dart2js.dart instantiates an IO layer (our SourceFileProvider API) that uses dart:io, but then calls the compiler implementation (from lib/src/compiler.dart), which is agnostic of this source file provider implementation. See https://github.com/dart-lang/sdk/blob/85dee9748c85fe3d3e470c630650a8c17e291c63/pkg/compiler/lib/src/dart2js.dart#L731

For unit tests we don't even use the main dart2js entrypoint, but a "memory compiler", that provides an in-memory SourceFileProvider implementation instead, see: https://github.com/dart-lang/sdk/blob/85dee9748c85fe3d3e470c630650a8c17e291c63/pkg/compiler/test/helpers/memory_compiler.dart#L158

We could apply a similar strategy here to hide the dependency to dart:ffi, and use something like the memory compiler as the input for the appjit training.

sigmundch commented 2 years ago

@mraleph @a-siva @rakudrama @brianquinlan

What are your thoughts on the general direction to take here? Assuming we can address the blocking issues with using ffi (e.g. we address the bootstrapping issue using a different entrypoint for the appjit training), what is your preference in the design space? Would you rather expose this as a public API (I presume we are not the only ones that may be interested in using mmap), or use ffi and keep the library internal to just dart2js for now?

rakudrama commented 2 years ago

@mraleph @a-siva @rakudrama @brianquinlan

What are your thoughts on the general direction to take here? Assuming we can address the blocking issues with using ffi (e.g. we address the bootstrapping issue using a different entrypoint for the appjit training), what is your preference in the design space? Would you rather expose this as a public API (I presume we are not the only ones that may be interested in using mmap), or use ffi and keep the library internal to just dart2js for now?

My 2¢:

The original thread started by @davidmorgan in March 2021 (internal email) wanted to use mapped files for DDC and Analyzer too, since that would reduce total memory for the persistent versions of those tools. So this is broader than dart2js (but I am not sure where or how those tools read binary data).

We are in a better position with either route than we were last year.

I would prefer we add a proper API to provide a read-only memory-mapped view of a file as a Uint8List (implemented as UnmodifiableUint8ListView, which I understand just recently became as efficient as plain Uint8List for reads).

If we do go the route of using ffi, we should call the VM's OS-abstraction layer, otherwise the memory mapping won't work on some operating systems. It is not clear that we can call File::Map as easily as mmap(2).

Prompt finalization might be important for reading codegen shards - dart2js reads these large binary files sequentially and drops them so GC can collect them. I don't really want to complicate the dart2js build with a platform-dependent native component to run a copy of the code that already mostly exists in the VM.

a-siva commented 2 years ago

@mraleph @a-siva @rakudrama @brianquinlan

What are your thoughts on the general direction to take here? Assuming we can address the blocking issues with using ffi (e.g. we address the bootstrapping issue using a different entrypoint for the appjit training), what is your preference in the design space? Would you rather expose this as a public API (I presume we are not the only ones that may be interested in using mmap), or use ffi and keep the library internal to just dart2js for now?

I would prefer adding it to dart:io, as this seems like a useful feature that others would want too. We recently did a breaking change to the File Create API and landed it; this breaking change may be of a similar scale and equally manageable.

mraleph commented 2 years ago

I'd prefer a route that does not increase dart:io API surface area. Additionally there are far too many decision points with this API.

With this API being in dart:io we will have to make decisions and stick to them. We lose a lot of flexibility.

With dart:ffi route users are free to decide all of these themselves.

If we do go the route of using ffi, we should call the VM's OS-abstraction layer otherwise the memory-mapping won't work on some operating systems. It is not clear that we can call File::Map as easily as mmap(2).

Do you really need to support this on other operating systems though? The case you are trying to improve is Linux centric.

a-siva commented 2 years ago

True, the API needs some fine tuning. But the pattern this proposed change introduces is useful not just for this use case but potentially for others too, and hence I feel it would be useful to introduce in dart:io.

With regards to loss of flexibility: a well-defined API would cover a big chunk of the potential use cases. I am sure there would be some outliers with very specific needs, and those could be handled through a custom FFI implementation.

biggs0125 commented 2 years ago

Do you really need to support this on other operating systems though? The case you are trying to improve is Linux centric.

The immediate issue I'm trying to solve is Linux centric, yes. But this would be an improvement across all platforms if we could support them with this change. I would suspect DDC and especially the analyzer stand to gain even more from this being supported across all platforms.

mraleph commented 2 years ago

The pattern this proposed API change is trying to introduce is a pretty useful pattern not just for this use case but potentially other cases too and hence I feel it would be useful to introduce in dart:io

In my opinion, any usage of this API requires a good understanding of the associated issues, tradeoffs, and platform-specific behaviours. For example, on POSIX you can mmap a file and then unlink it: the mmap-ing remains valid and the unlink succeeds. On Windows, MapViewOfFile locks the file, so you can't DeleteFile it. This is especially problematic if we go forward with an API that does not allow explicit unmap-ing and instead relies on a finalizer: on Windows it then becomes impossible to write code that reads a file through a memory mapping and afterwards deletes it, because you can't force-release the mapping. On POSIX you can hit the limit on the number of memory mappings if you are not careful (e.g. if the GC is not aggressive enough). There is a question about GC heuristics as well: should the size of the file be included in the heap size? If it is not included, GCs might not happen fast enough; but if it is included, you get too many GCs.

On POSIX, when you use mmap and read from the mapping, you will receive a SIGBUS whenever the underlying file system has an issue (e.g. a read fails due to a networking error on NFS, the underlying file is truncated by a concurrent process, or the media is ejected/unmounted). You need to be ready for this as a developer.

I feel fairly strongly that we should not include this in dart:io. dart:ffi was explicitly added to give expert developers access to such functionality directly from Dart.

This also goes against our desire to shrink dart:io surface.

jakemac53 commented 2 years ago

Is this something we could reasonably expose through a published package? Are we in a place where we have a reasonable story for publishing the native bits for each platform with a single package? Do we have the infrastructure internally to support this as well?

If yes to all of the above, then I think it would be reasonable to do it as a separate package. It would also be less breaking to the ecosystem (not breaking at all), and it would be easier to evolve the API in the future.

mraleph commented 2 years ago

Is this something we could reasonably expose through a published package?

Yes.

Are we in a place where we have a reasonable story for publishing the native bits for each platform with a single package? Do we have the infrastructure internally to support this as well?

The package does not actually need any native bits. You can write all the necessary code in pure Dart.

The only native bit the package might need is a single C function implementing a native finaliser (if we decide we want GC-driven synchronous finalisation of these mappings), but even that can be achieved by embedding the small necessary piece of native code directly into the Dart code (see https://gist.github.com/mraleph/d8ffb32cd419e08e15b8fc245dbbeff7#file-test-dart-L119-L143, which illustrates how it is done for X64; a similar approach can be ported to ARM/ARM64 if we wish to support those systems).

itsjustkevin commented 2 years ago

@biggs0125 is this breaking change still in the works and ready for the approval process?

biggs0125 commented 2 years ago

I've put this on hold for now. It seems likely we won't go with this approach but I'd like to revisit it once we have a bit more time on the Dart2JS team. I've removed the 'breaking-change-request' tag for now.

a-siva commented 1 year ago

After discussion, the general consensus was that this functionality would first be added as a package. After the package has seen considerable usage and the API has been adjusted, we would reevaluate whether it makes sense to move this functionality into dart:io.

shamblett commented 1 year ago

I've recently come across another use case for memory-mapped file support: the new language models such as Llama, Falcon, etc. are quite large; a quantized Falcon 40B model file, for instance, is 24GB in size. These have to be loaded and parsed by your instruction/chat app.

I'm currently doing this using the stdlibc pub package. This works fine, however, going forward it may be better to bake this functionality into dart:io rather than depend on pub packages.

isoos commented 3 days ago

@mraleph: Would it be possible to introduce this feature in a new Dart SDK library? IIRC Java has io and nio, and the latter includes memory-mapped file support, possibly because they had similar concerns about keeping the io surface small. I think simple embedded databases would love to use this feature, but it seems like a big hassle to include ffi in the package.

mraleph commented 1 day ago

I really don't see any reason why this should not just be a package. It is more aligned with our vision for the future of Dart than expanding dart:io.

sigmundch commented 1 day ago

For reference, after our discussions and consensus above, we created an internal package for this in the Dart SDK that we now use within dart2js. The API ended up being pretty natural and worked well for our purposes.

You can find the current implementation at package:mmap. We didn't publish this on pub because we only implemented what was needed for dart2js at the moment (for us, that was only Linux support, which is the only environment where the multi-stage compilation mode of dart2js is in use). For this to be a well-supported package, it needs to be generalized to support other OSs.