dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

dart2js retains large buffers after loading binary files in final stage #44478

Open rakudrama opened 3 years ago

rakudrama commented 3 years ago

This is the final state of modular dart2js with two code shards. The files are m.dill, m.dill.data, m.code0 and m.code1. The four binary input buffers total 1.7GB out of a total heap of ~10GB. This is the heap just before printing sizes.

Class CompilerSourceFileProvider
Shallow size 48B
fields (5){  ⊟  
  bool isWindows false
  Uri cwd_Uri {  ⊞  }
  Map<Uri, Input<dynamic>> utf8SourceFiles _InternalLinkedHashMap (1) {  ⊞  }
  Map<Uri, Input<dynamic>> binarySourceFiles _InternalLinkedHashMap (4) {  ⊟  
    [ _Uri {  ⊞  } ] : Binary {  ⊟  
      final Uri uri = _Uri {  ⊞  }
      final List<int> data = _Uint8List (776370600) {  ⊞  }
    }
    [ _SimpleUri {  ⊞  } ] : Binary {  ⊟  
      final Uri uri = _SimpleUri {  ⊞  }
      final List<int> data = _Uint8List (314383771) {  ⊞  }
    }
    [ _SimpleUri {  ⊞  } ] : Binary {  ⊟  
      final Uri uri = _SimpleUri {  ⊞  }
      final List<int> data = _Uint8List (300354678) {  ⊞  }
    }
    [ _SimpleUri {  ⊞  } ] : Binary {  ⊟  
      final Uri uri = _SimpleUri {  ⊞  }
      final List<int> data = _Uint8List (305224933) {  ⊞  }
    }
  }
  int dartCharactersRead 1696348718
}
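As a quick check of the 1.7GB figure, the four buffer lengths in the snapshot can be summed. The Python snippet below is only arithmetic on the numbers shown above; it is not part of dart2js, and the reading of the small remainder is an assumption.

```python
# Lengths of the four _Uint8List buffers reported in the snapshot above.
buffer_lengths = [776370600, 314383771, 300354678, 305224933]

total_bytes = sum(buffer_lengths)
print(total_bytes)                   # 1696333982
print(round(total_bytes / 1e9, 2))   # 1.7 (decimal GB)

# dartCharactersRead from the same snapshot. The difference is input that is
# counted there but not held in these four buffers (assumption: most likely
# the single utf8SourceFiles entry).
dart_characters_read = 1696348718
print(dart_characters_read - total_bytes)  # 14736
```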

/cc @johnniwinther I wonder if the Kernel deserialization could be made to work on a stream or list of blocks, so consumed bytes can be discarded, and lazy-parsed regions block-copied to smaller buffers.

johnniwinther commented 3 years ago

cc @jensjoha

jensjoha commented 3 years ago

I think I'm misunderstanding something (or confused or something)...

From what I see in the above (and from looking at the source), dart2js has a class called CompilerSourceFileProvider (pkg/compiler/lib/src/source_file_provider.dart), which is a SourceFileProvider (same file). That class has an explicit map, Map<Uri, api.Input> binarySourceFiles, where api.Input comes from pkg/compiler/lib/compiler_new.dart and has a field T get data; which (from the above) is the _Uint8List with lots of data in it. All of it is actively retained by dart2js itself and has nothing to do with kernel or the front_end..?

Anyway, to be more concrete on the questions:

rakudrama commented 3 years ago

Yes, dart2js needs to release this structure, but doing so will not help because there are other references to the buffer. This is why I am adding the front-end to the issue.

This is the back-end of dart2js, where we do not need very much of the Kernel representation. Exactly how much is unclear, because dart2js also has lazy access that falls back on lazy access to the .dill. We load too much of it, but not the bodies of resolution-time tree-shaken methods, which make up a large proportion of the methods. I also see a lot of closures retaining the 770MB buffer.
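The retention pattern described here can be illustrated outside Dart. The Python sketch below is an analogy only, with hypothetical function names: a lazy reader closure that captures the whole input buffer keeps that buffer alive, while a reader that block-copies its region up front holds only the small copy.

```python
def make_lazy_body_reader(buffer, start, end):
    """Hypothetical lazy reader: decodes on demand, captures the whole buffer."""
    def read_body():
        return bytes(buffer[start:end])
    return read_body


def make_copying_body_reader(buffer, start, end):
    """Hypothetical alternative: block-copy the region now, capture only the copy."""
    region = bytes(buffer[start:end])
    def read_body():
        return region
    return read_body


big = bytearray(b"x" * 1000)   # stands in for the large .dill buffer
big[100:104] = b"body"

retaining = make_lazy_body_reader(big, 100, 104)
copying = make_copying_body_reader(big, 100, 104)

# Inspect what each closure captured.
retained_cells = [cell.cell_contents for cell in retaining.__closure__]
copied_cells = [cell.cell_contents for cell in copying.__closure__]

print(any(c is big for c in retained_cells))  # True: the closure pins `big`
print(any(c is big for c in copied_cells))    # False: only the 4-byte copy
print(retaining() == copying() == b"body")    # True: same result either way
```

The trade-off is that the copying variant pays for the copy even if the body is never read, so it only helps when the retained regions are small relative to the buffer.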

What I meant by a 'stream or list of blocks': if the input were segmented in some way (a list of blocks), could a block be removed from the list (i.e. the list entry nulled out) once it has been converted to data? For example, the source file byte buffers are in memory twice: once as part of the sequence of .dill bytes and once as their own bytes. If the file were segmented into blocks, these could be moved to a list of blocks for the source, somewhat like transferring ownership of the blocks that lie completely within the region. Blocks could be fixed-size, or the .dill could contain a directory of fortuitous breaks that would allow subsequences of large strings and source data to be moved without fragmentation.
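A minimal sketch of the fixed-size variant of this idea, written in Python for illustration. The class and method names are hypothetical, the block size is tiny so the behaviour is visible, and nothing here reflects how the Kernel reader is actually structured.

```python
from typing import List, Optional


class BlockList:
    """Input held as fixed-size blocks; consumed blocks can be nulled out."""

    def __init__(self, data: bytes, block_size: int = 4):
        self.block_size = block_size
        # Each slice is an independent copy, so `data` itself need not be kept.
        self.blocks: List[Optional[bytes]] = [
            data[i:i + block_size] for i in range(0, len(data), block_size)
        ]

    def read(self, start: int, end: int) -> bytes:
        """Copy bytes [start, end) into a new buffer, crossing blocks as needed."""
        out = bytearray()
        pos = start
        while pos < end:
            index, offset = divmod(pos, self.block_size)
            block = self.blocks[index]
            if block is None:
                raise ValueError(f"block {index} was already released")
            take = min(end - pos, self.block_size - offset)
            out += block[offset:offset + take]
            pos += take
        return bytes(out)

    def release_before(self, position: int) -> None:
        """Null out every block that lies entirely before `position`."""
        for index in range(position // self.block_size):
            self.blocks[index] = None

    def live_bytes(self) -> int:
        return sum(len(b) for b in self.blocks if b is not None)


blocks = BlockList(b"headerSTRINGSbody")   # 17 bytes in 4-byte blocks
header = blocks.read(0, 6)
blocks.release_before(6)                   # only block 0 is fully consumed
strings = blocks.read(6, 13)
blocks.release_before(13)                  # blocks 0..2 are now released
print(header, strings, blocks.live_bytes())  # b'header' b'STRINGS' 5
```

A block that straddles a consumed and an unconsumed region stays alive until the reader moves past it, which is the fragmentation cost that a directory of natural break points in the file would avoid.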

I'm open to other approaches that reduce the heap size, the address space size, or both (both are constraints on our build).

rakudrama commented 3 years ago

I instrumented readStringTable for my example 770MB .dill file, and it shows that the string table is nearly 200MB. It would be great if the string table could be freed after conversion to strings.

StringTable 572078172..764765399 = 192687227
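The reported size follows directly from the two offsets, and the "free after conversion" idea can be sketched in a few lines. The Python below is illustrative only: the layout (a list of end offsets plus a UTF-8 byte region) is a simplified assumption for the example, and `decode_string_table` is a hypothetical helper, not the front-end's readStringTable.

```python
def decode_string_table(region: bytes, end_offsets):
    """Decode consecutive UTF-8 strings; each result owns its own storage."""
    strings = []
    start = 0
    for end in end_offsets:
        strings.append(region[start:end].decode("utf-8"))
        start = end
    return strings


# Size of the region reported by the instrumentation above.
string_table_size = 764765399 - 572078172
print(string_table_size)  # 192687227

# Toy region standing in for the string table bytes.
region = b"dart:coremainfoo"
strings = decode_string_table(region, [9, 13, 16])

# The decoded strings do not reference `region`, so once no other object
# holds the bytes, dropping this reference lets them be reclaimed.
region = None
print(strings)  # ['dart:core', 'main', 'foo']
```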