Memory mapped files for parsing storage (proposal for comment)

eclipsewebmaster commented 6 months ago

| --- | --- | | Bugzilla Link | 572512 | | Status | NEW | | Importance | P3 normal | | Reported | Apr 01, 2021 01:18 EDT | | Modified | May 19, 2022 12:37 EDT | | Version | 1.11 | | See also | Gerrit change 180543, Git commit 039199c8, Gerrit change 180553, Git commit 542be78a | | Reporter | Jason Koch |

Description

Created attachment 286016\ mmap_index_writer_base.patch

Current parsing of files requires reading large arrays into heap storage, which provides a minimum bound on the footprint of Pass1/Pass2 parsing.

This patch moves the parsing to off-heap storage, specifically to a file-backed region. This allows parsing larger files within a significantly smaller heap footprint.

Observations:

There is a choice here between on-heap and off-heap. On-heap is convenient in that the JVM is clearly bounded, but the downside of course is that it is clearly bounded. Off-heap simplifies this, especially with file-backed, as it allows the OS to spill pages as-needed to disk and expand to handle vastly larger files with a given heap size.
Nice feature with off-heap is that sparse region/arrays do not take up any physical space. For example if an array is allocated that is N entries long but only 1/4 of those are used, a sparse file FS (most modern OS) will only allocate physical resources for what is used in the application. For anonymous regions, only the pages that are used will be mapped. This is a distinct difference to on-heap, where the memory is allocated/paged in even if it is not required.
There is also a choice between off-heap anonymous (using available RAM, spilled via swap), and off-heap named file (using available FS cache or paged to a file on file system). The current implementation picks a named file, but chooses Java's default /tmp. There is no guarantee this is fast or appropriate for the user's device.
Following comments on list, I assume that most users are using SSD which supports fast random storage. Spinning disk performance likely suffers if any spill is required -> in this case we should switch to anonymous off-heap which guarantees that storage is in memory.

Comments:

I haven't performed any speed-of-parsing performance testing, though I think the difference between "cannot parse due to not enough ram" vs "can parse" is easy, it is less obvious to me whether there will be a penalty in this model for heaps that fit in RAM. This warrants testing further if the approach looks sensible to others.
I have specifically written this so that, I think, it should be easy to swap in different implementations, such as off heap file, off heap anonymous, on-heap, etc, and stick to interface. It's possible we could provide this as an option to the user or even switch implementations dynamically?

:notepad_spiral: mmap_index_writer_base.patch

eclipsewebmaster commented 6 months ago

By Andrew Johnson on Apr 09, 2021 13:15

Interesting.\ Do we have some idea of how the phases compare in memory usage:\ Pass 1\ Pass 2\ Garbage collection\ Dominator tree - this can be quite memory intensive, but operates after unreachable objects have been discarded

eclipsewebmaster commented 6 months ago

By Jason Koch on Apr 10, 2021 13:30

Interesting. Do we have some idea of how the phases compare in memory usage:

In my experience this depends on the 'shape' of the heap in analysis. I'll add example calcs for 1billion object count, very rough analysis. The primary determinant of memory usage is the object count, and to a smaller extent the tree structure from GC roots. Much less important is the size of the hprof.

Pass 1

For 1 billion objects, 20GB during generation, falling to 8GB steady state.

Bulk of memory here is the 'Identifier' store which is a long[] at least as long as the count of objects. While generating, up to newCapacity is allocated as copy/scratch space (1.5x), so have to assume 2.5x8BN objects.

Pass 2

Multiple indexes generated here, assuming 1bn objects. Likely to use 16-28GB, plus scratch space; majority retained.

(1) object id -> outbound id references log [IntArray1NWriter, flush to PosIndexStreamer/IntIndexStreamer] (depends on shape of ref tree, int[] 4GB for header, byte[] 1GB for header2, plus IntIndexStreamer underneath, which uses int[page size] during generation, plus background compression. +12GB.\ (2) object id -> class id array [IntIndexCollector] (paged ArrayIntCompressed, compression based on max class id bits) int[], something less than 4GB\ (3) object id -> position in file index [LongIndexCollector] (paged ArrayLongCompressed, compression based on max class id bits), long[] something less than 8GB\ (4) object id -> array size index (for arrays only) [SizeIndexCollectorUncompressed] (int->int array), int[] much less than 4GB

Garbage collection

Approx 25GB here during processing. Most memory dropped after complete.

Over and above whatever is stored previously, for 1billion objects:\ boolean[] reachable (1B * object count) = 1GB, dropped after complete.\ IntArray1NSortedWriter for completed outbound storage = int[]4GB for header, byte[]1GB for header2. +12GB; dropped after complete.\ InboundWriter for collating the inbound references, int[]4GB for header, byte[]1GB for header2, long[]segmentSizes (<1GB), writes occur directly to file then are reprocessed. +12GB; dropped after complete.

Dominator tree - this can be quite memory intensive, but operates after unreachable objects have been discarded

Assuming a full 1bn heap, approx 28GB during processing, dropped after complete; but, up to 16GB of index is unload()ed prior to starting which mitigates this quite a bit.

a2size, o2address, o2class are unload()ed here, so effective requirement is somewhere less than 28GB, we can assume somewhere 8-16GB is saved before starting, depending on xxCompressed page effectiveness. Then, allocates 7x int[] for number of objects in the heap. For a 1bn object file, this is 7x4Bx1BN = 28GB. As you point out, if GarbageCleaner has reduced the footprint, DominatorTree requirements are scaled down respectively.

======

So, Identifier and Size certainly have an immediate impact.

If we think this strategy is sensible we can extend the memory mapped strategy to all index types and save significant amounts of memory during processing. I picked the two that were least coupled in the code to begin with, and that would have an impact, to demonstrate the basic idea.

Lots of the existing code relies on the class hierarchy and cross-references between reader and writer, so I wanted some validation of the idea before pursuing further.

There are additional benefits too if we take this further. For example, rather than reading a whole chunk of the file in SoftReference and unpacking it, the ArrayLongCompressed could be a wrapper/layer over the underlying store, and then the OS manages caching. This sort of thing I think can simplify the code quite a bit as we would no longer need to manage concurrent caching to achieve performance.

eclipsewebmaster commented 6 months ago

By Jason Koch on Apr 10, 2021 17:35

To show the impact I ran a quick test against a modest sized heap. It is a ~25% reduction in memory required to process the heap.

Sample heap:

20.800 MByte heap (20GB)
153 million objects
65million removed during GarbageCleaner

Xmx-Xms settings required to pass heap:

upstream/master 5400m OK, 5200m fails at GarbageCleaner createHistogramOfUnreachableObjects() with a hashmap resize. At this point, Identifier takes 50% of heap w 1.25GB (50%), histogram ArrayList at 520MB (20%), and IntIndexCollector at 510MB (19%). This fails before getting to DominatorTree processing.
origin/netflix 4000m OK, 3800m fails at DominatorTree. Inspecting dump shows 2GB of DominatorTree int[] arrays and SnapshotFactoryImpl.parse():purgedMapping taking 600MB [nb: I'll put up a new patch for this]. These two components take up 98.1% of the active heap.

From this it is possible to see the impact of Identifier & Size indexes moved to mmap/off-heap. Once other generated indexes had been unload()ed, the next biggest use of memory is DominatorTree.

eclipsewebmaster commented 6 months ago

By Andrew Johnson on Apr 28, 2021 09:07

A simple change for IndexWriter.Identifier might be to use the ArrayLongBig inside to collect identifiers with add.

API Compatible changes:\ On sort() it might have to append them to any existing entries, then reset the ArrayLongBig\ get() would need to go to the big array for the first entries (e.g. after a sort) then any current ArrayLongBig

That might limit the size needed to 2n rather than 2.5\ [Though if m entries were added after sorting, then it would need:\ n+m + m for the toArray, clear the ArrayLongBig for -m, then (n+m) for the new array so that is still 2(n+m) ]\ That is still in-memory though.

eclipsewebmaster commented 6 months ago

By Jason Koch on Apr 28, 2021 17:23

Yes, I looked initially at something like this, and it is viable. Specifically, you want to be able to release the heap allocation after the Identifier storage has been loaded. The biggest win would come from an API change or a sneaky trick. My thinking was, after a sort(), you could trigger a flush to disk and then reload the data from mmap'd region thus releasing it from the heap. Or, do it explicitly by exposing a flush in the API to convert it to a xxReader. However, as I built this out, it looked like going to a direct memory-mapped model made for much cleaner code, and allows a future path to simplify quite a bit of the storage and memory tier.

I did not yet pursue it, but I could run an experiment w/ swapping the DominatorTree to off-heap (should be simple to swap to use an IntArray instead of int[], etc) and measure how that impacts overall parsing memory requirements.

eclipsewebmaster commented 6 months ago

By Andrew Johnson on May 10, 2021 15:23

For Garbage Cleaner we have the IOne2LongIndex identifiers = idx.identifiers; index holding old object addresses.\ This is passed in from the parser, and for HPROF and DTFJ is an in-heap IndexWriter.Identifier object. I've recently changed IndexWriter.Identifier to accumulate in an ArrayLongBig, then it gets converted by the sort() to a long[], of the exact size needed.

It is kept around for the unreachable objects phase, then there is the map[] phase where the old to new mapping is built, and a new array of object addresses, index by the new object ID.\ long identifiers[oldN]\ boolean reachable[oldN]\ there is then this part\ int[] map = new int[oldNoOfObjects];\ long[] id2a = new long[newNoOfObjects];\ so you are right, the old identifiers with 8oldN bytes and new with 8newN bytes will use a lot of space.

We can off-load the identifier address with quite a small code change.\ We can write the identifiers to a temporary index file using LongIndexStreamer.writeTo at the begining of the clean.\ This takes a long[], ArrayLong or IteratorLong. We could extract a long[] from the IOne2LongIndex.getNext but that would take a lot of space.\ We can instead create a IteratorLong to read the supplied IOne2LongIndex and write it out with LongIndexStreamer.writeTo and get a new index. We can then close and delete the supplied index to free the long array.\ We could even close this index to free a little more space.\ We then do the GC, and build the map[]\ We then reopen the index file for the identifier addresses.\ We then have another iteratorLong stepping through the map[] skipping unused entries and retrieving the object address from the newly created temporary index and writing them out to the final index.

This should save quite a bit of space at the expense of another file write and read, but with the memory saved the SoftReference caches should work better too.

eclipsewebmaster commented 6 months ago

By Andrew Johnson on May 11, 2021 04:55

Similarly, we could store the old object to class index to disk before the garbage clean.

Example OutOfMemoryError when parsing a dump with 63 million live objects and presumably 63,169,600 total objects.\ Failure allocating map[]

at org.eclipse.mat.parser.internal.GarbageCleaner.clean(Lorg/eclipse/mat/parser/internal/PreliminaryIndexImpl;Lorg/eclipse/mat/parser/internal/SnapshotImplBuilder;Ljava/util/Map;Lorg/eclipse/mat/util/IProgressListener;)[I (GarbageCleaner.java:149)

Object / Stack Frame |Name| Shallow Heap | Retained Heap\ ------------------------------------------------------------------------------------------------------------------\

org.eclipse.mat.parser.internal.PreliminaryIndexImpl @ 0x826fc308 | | 48 | 545,256\ org.eclipse.mat.parser.internal.SnapshotImplBuilder @ 0x82abf540 | | 40 | 40\ java.util.HashMap @ 0x826fc370 | | 48 | 48\ org.eclipse.mat.ui.util.ProgressMonitorWrapper @ 0x826fc338 | | 32 | 32\ org.eclipse.mat.parser.index.IndexManager @ 0x82abf568 | | 48 | 48\ boolean[63169578] @ 0xaa000000 | | 63,169,600 | 63,169,600\ int[3703] @ 0x82abf780 | | 14,832 | 14,832\ org.eclipse.mat.parser.index.IndexWriter$Identifier @ 0x82cce638 | | 24 | 505,356,664\ org.eclipse.mat.parser.index.IndexReader$IntIndex1NReader @ 0x82d7e2d0 | | 32 | 24,112\ org.eclipse.mat.parser.index.IndexWriter$IntIndexCollector @ 0x812f7a60| | 48 | 205,306,888\ org.eclipse.mat.collect.HashMapIntObject @ 0x82d1a800 | | 40 | 828,072\ java.util.HashMap @ 0x82f02800 | | 48 | 4,231,568\ ------------------------------------------------------------------------------------------------------------------ The IntIndexCollector for object to class takes approximately log2(oldN)*oldN / 8 bytes as an in-memory version.

eclipsewebmaster commented 6 months ago

By Andrew Johnson on May 11, 2021 09:11

Created attachment 286362 Free some space during garbage collection.

The parsers can supply the object to address and object to class indexes as in-memory versions. This frees the memory by converting them to disk-backed versions.

:notepad_spiral: GarbageCleaner-writeTo.patch

eclipsewebmaster commented 6 months ago

By Jason Koch on May 11, 2021 09:12

Created ~~attachment 286363~~ (attachment deleted)\ dominator_tree.patch

Sample patch to support using disk-backed storage during DominatorTree construction.

eclipsewebmaster commented 6 months ago

By Jason Koch on May 11, 2021 09:15

Yes, there are a few ways to reduce memory footprint ongoing. For ex the attached dominator tree patch provides reduction of 5x int[] from the DominatorTree construction - for a 2bn object heap, this works out to a saving of 40GB of required heap space.

eclipsewebmaster commented 6 months ago

By Andrew Johnson on May 11, 2021 09:19

I've added a simple patch to free some space during garbage collection. Please give it a try on a large dump to see if it helps.\ I think it could reduce usage at that point by 8nOld (for the identifier index) + 8nOld (for the long[] array) + log2(nOld)*nOld/4 for the object to classId compressed index so up to 19.75Gbyte for 1,000,000,000 objects, though if those indexes were used by GC then some memory would be in the soft reference caches.

eclipsewebmaster commented 6 months ago

May 12, 2021 11:58

New Gerrit change created: https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/180543

eclipsewebmaster commented 6 months ago

May 12, 2021 12:01

Gerrit change https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/180543 was merged to [master].\ Commit: http://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=039199c8beb7ad28761ad988399a4a97fb995966

eclipsewebmaster commented 6 months ago

May 12, 2021 15:53

New Gerrit change created: https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/180553

eclipsewebmaster commented 6 months ago

May 12, 2021 15:56

Gerrit change https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/180553 was merged to [master].\ Commit: http://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=542be78a8001cc5eb106bc2e26bd42a950843695

eclipsewebmaster commented 6 months ago

By Andrew Johnson on May 13, 2021 03:09

I've merged some changes freeing up some space for object id to address and object id to class id during GC.\ It turned out that the DTFJ parser was already converting these indexes into disk versions if they were big enough and memory was short, so I added it always in HPROF, and added a conversion process for in-memory index Identifier or IntIndexCollector.\ I didn't bother unloading the disk indexes as if there is enough memory it is good to use any entries already there, and the soft references will allow them to be freed if memory is short.

eclipsewebmaster commented 6 months ago

By Andrew Johnson on Jun 02, 2021 09:31

Comment on ~~attachment 286363~~ (attachment deleted)\ dominator_tree.patch

Please do not merge this patch.

eclipsewebmaster commented 6 months ago

By Andrew Johnson on Jun 02, 2021 09:31

Please don't take this as a vote against this idea, but I'm going to mark attachment 286363 as obsolete as it looks like includes code which is not original so can't be submitted under the Eclipse Contributor Agreement https://www.eclipse.org/legal/ECA.php as EPL code to the project. If it's obsolete it's a reminder not to merge this.

eclipsewebmaster commented 6 months ago

By Jason Koch on Jun 02, 2021 12:09

+1 agree with your position to obsolete the dom-tree patch

eclipsewebmaster commented 6 months ago

By Jason Koch on Jun 02, 2021 13:16

Created attachment 286508 2_memory_mapped_storage_into_tmpdir.patch

Fix to place files into java tmp location instead of current working directory.

:notepad_spiral: file_572512.txt

eclipsewebmaster commented 6 months ago

By Jason Koch on Jun 02, 2021 13:18

Created attachment 286509 3_explicit_check_for_negative_index

Explicitly check for negative indexes in case of greater than 2bn indexes (overflow)

:notepad_spiral: file_572512.txt

eclipsewebmaster commented 6 months ago

By Jason Koch on Jun 02, 2021 13:21

Created attachment 286510 4_introduce_basic_heap_backed_arrays.patch

Also, this basic heap-backed implementation could smooth any transition / migration work.

:notepad_spiral: file_572512.txt

eclipsewebmaster commented 6 months ago

By Andrew Johnson on May 19, 2022 04:07

See bug 571331 comment 4 for details of the Identifier change which reduces the memory usage to 16nOld in generation, 8nOld in use, so for 1 billion objects, 16GB during generation, falling to 8GB steady state.

DTFJIndexBuilder shows how the write out the Identifier to a temporary index after pass 1. The reader then uses the standard MAT reader so uses SoftReference caching. We should do this for HPROF. We need a fix to bug 579931 to make reverse() thread safe.

object id -> array size index is actually 4*nOld in size\ Although this index is usually sparse some dumps can have more arrays than plain objects, so a HashMap will not always save space.

(3) object id -> position in file index [LongIndexCollector] (paged ArrayLongCompressed, compression based on max class id bits), long[] something less than 8GB

The compression here is based on the maximum position in file, so for 1,000,000,000 objects and perhaps a 80GB file this should be less than 5GB.

eclipsewebmaster commented 6 months ago

By Jason Koch on May 19, 2022 12:37

In my view, putting the arrays behind an abstraction of some sort is the primary benefit. We have two challenges, in my view, to supporting large heaps:

2bn object limit, which will forever be a constraint if we only use int-based-indexing
runtime size for objects

By building a small abstraction over int[] and long[] we can then use our own long-based indexing and grow/shrink the backing store is needed.

Even if the backing store is "backed by an array", then we introduce flexibility to the design, such that users can even make the tradeoff themselves, or even go so far as to offer it as an osgi/eclipse plugin, etc.

eclipse-mat / mat

Memory mapped files for parsing storage (proposal for comment) #33

Description