dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.07k stars 1.56k forks source link

Default Map implementations take to too much storage. #16692

Open rakudrama opened 10 years ago

rakudrama commented 10 years ago

VM and dart2js implementations of LinkedHashMap seem to use an order of magnitude more storage than JavaScript.

-------- test.js: function main() {   var a = [];   while (true) {     a.push({'u': 1, 'v': 2, 'w': 3, 'x': 4, 'y': 5});     if (a.length % 100000 == 0) print(a.length);   } }

main();

-------- test.dart: main() {   var a = [];   while (true) {     a.add({'u': 1, 'v': 2, 'w': 3, 'x': 4, 'y': 5});     if (a.length % 100000 == 0) print(a.length);   } }


The highest numbers printed before an out-of-memory error by running this program on the 32 bit VMs are:

"sdk/bin/dart test.dart" 2200000 (2.2M)   "d8 test.js" 22100000 (22.1M) "d8 test.dart.js" 1000000 (1M)

"jsshell test.js" 41800000 (41.8M) "jsshell test.dart.js" 8400000 (8.4M)

From this it looks like Dart LinkedHashMaps take 5x-20x the storage of JavaScript Objects.

floitschG commented 10 years ago

This really isn't a fair comparison. One is allocating new objects, the other one is allocating hashtables. The JS object does not even contain the names anymore (which are stored in their map), they don't have excess memory (removed during GC or during move to old-space or during compacting. Don't remember).

We should be able to improve our hashtable implementation, but we will not be able to reach JS object performance.

lrhn commented 10 years ago

Do you know how big the heap can grow in d8? Using --trace-gc it seems to be topping out at ~800 MB total. I can see that the VM has a 512 Mb old space. It doesn't explain the difference (setting old-space to 756 MB only brings dart to 3.2M)

floitschG commented 10 years ago

The v8 version will bring the map size down to 5 pointers + header. (Maybe they need one more pointer).

The dart implementation needs: a header, a size field, a pointer to a list, a list that is ~ twice the needed size, containing both the field-name and the value. The map probably has even other fields. The list needs a header, a size field, ...

Again: I don't think this is a fair comparison

rakudrama commented 10 years ago

I think it is a fair comparison. It is the root cause of limitations to the size of JSON that can be processed in Dart. johnmccutchan@ and I could not process profiling data JSON dumped by the VM until he changed the format to use fewer maps.

Converting a string to JSON instead of a map literal (attached) doesn't change much other than d8/js is reduced:

"sdk/bin/dart bug6f8j.dart" 3400000 (3.4M) // --old_gen_heap_size=800   "d8 bug6f8j_js.js" 14800000 (14.8M) "d8 bug6f8j.dart.js" 1000000 (1M)

(Dropping jsshell - it is unchanged; the high number is most likely due to bigger heap)

The Dart VM seems to benefit from the single-character keys. If they are extended to three characters, the capacity drops to 2.6M for JSON (but no drop for object literals), presumably because the VM interns single character strings and JSON parser does not intern keys. d8 does not exhibit any difference for 3 char keys.

LinkedHashMaps take too much space and that impacts all code that uses them. Since LinkedHaspMap is the default representation, we would gain the most leverage improving LinkedHashMap rather than the numerous clients.


Attachments: bug6f8j.dart (329 Bytes) bug6f8j_js.js (191 Bytes)


cc @johnmccutchan.

kodandersson commented 9 years ago

The new implementation CompactLinkedHash{Map,Set} uses significantly less memory.


Set owner to @kodandersson. Added Fixed label.

rakudrama commented 9 years ago

Reopening because of dart2js.

PLAIN:

"sdk/bin/dart test.dart" 4500000 (4.5M) --old_gen_heap_size=800

V8 3.27.34.16 "d8 test.js" 22400000 (22.4M) (--trace-gc shows max of 849MB) "d8 test.dart.js" 1100000 (1.1M) (--trace-gc shows max of 717MB)

V8 4.4.0 "d8 test.js" 20300000 (20.3M) (--trace-gc shows max of 736MB) "d8 test.dart.js" 2200000 (2.2M) (--trace-gc shows max of 738MB)

"jsshell test.js" 44000000 (44.0M) "jsshell test.dart.js" 8800000 (8.8M)

JSON:

"sdk/bin/dart bug6f8j.dart" 3100000 (3.1M) // --old_gen_heap_size=800   V8 3.27.34.16 "d8 bug6f8j_js.js" 14900000 (14.9M) "d8 bug6f8j.dart.js" 8000000 (8M)

V8 4.4.0 "d8 bug6f8j_js.js" 14100000 (14.1M) "d8 bug6f8j.dart.js" 7000000 (7M)

[1] It will be interesting to see if ES6 Maps can help here.


Added Triaged label.

kodandersson commented 8 years ago

This issue is now specifically about dart2js, which uses its own map implementation.