jam1garner opened this issue 3 years ago
## Minimum Reproduction
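A pattern along these lines triggers the behavior (a sketch: the `0x800000` size is inferred from the "8 MiB of data on disk" figure in Further thoughts below, and `data` is a placeholder name):

```
// ImHex pattern: place an 8 MiB u8 array at the start of any sufficiently large file
u8 data[0x800000] @ 0x00;
```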
On my system this hits roughly 4 GiB of memory; using a larger size (example: 0x10000000) causes too much memory pressure and triggers the OOM killer.
## Further thoughts
It seems like this means approximately 512 bytes of memory per `u8` (8 MiB of data on disk becomes 4 GiB in memory). From some further experimentation, this scales roughly linearly (2 MiB became ~1 GiB). (Link to previous Discord discussion)
## Investigating Root Cause
`PatternDataArray` allocates an `std::vector<PatternData*>`, which results in one pointer slot in the vector plus one separately heap-allocated `PatternData` object per array entry. (All sizes tested for compiler `g++-10 (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0` on x64.)
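To make the per-entry cost concrete, here is a self-contained stand-in (the field set is an assumption for illustration, not ImHex's actual class definition):

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Stand-in for PatternData with a plausible (assumed) field set.
struct PatternDataStandIn {
    virtual ~PatternDataStandIn() = default; // vtable pointer: 8 bytes
    uint64_t    m_offset    = 0;
    size_t      m_size      = 0;
    uint32_t    m_color     = 0;
    bool        m_bigEndian = false;
    std::string m_variableName; // 32 bytes apiece under libstdc++ on x64
    std::string m_typeName;
};

int main() {
    // Cost 1: an 8-byte pointer slot in the vector for every entry.
    std::printf("vector slot: %zu bytes\n", sizeof(PatternDataStandIn*));
    // Cost 2: a separate heap allocation of at least sizeof(PatternDataStandIn),
    // plus allocator bookkeeping, for every entry.
    std::printf("heap object: %zu bytes\n", sizeof(PatternDataStandIn));
    // The current layout: one pointer and one allocation per array element.
    std::vector<PatternDataStandIn*> entries;
    (void)entries;
    return 0;
}
```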
## Possible Solutions/Mitigations

Introduce `PatternDataArrayEntry` and `PatternDataBase`, with the goal of splitting `PatternData` into two classes: `PatternDataBase`, the minimum feature set (i.e. everything that's applicable to both array entries and normal variables), and `PatternData` (which includes everything a normal variable would need that an array entry doesn't). `PatternDataArrayEntry` is a thin subclass of `PatternDataBase` that would pretty much only be used for overriding virtual methods with array-entry-specific behavior. See the graph below for what a potential field split would look like.

[Graph: potential field split between `PatternDataBase` and `PatternData` (image not preserved)]

This would maybe also allow reducing memory fragmentation of arrays by removing the level of indirection in the vector (i.e. `std::vector<PatternData*>` becomes `std::vector<PatternDataArrayEntry>`; notice: entries are stored inline in the vector, not in a separate allocation each).

Edit: some mistakes in the above graph--`m_color` should be in `PatternData`, `m_endian` probably(?) shouldn't (as I would imagine arrays have homogeneous endiannesses, following similar thinking to why `m_typeName` should be removed from it--arrays have homogeneous types).
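A minimal C++ sketch of how that split could look, following the description and the edit note above (any members not named in the issue are assumptions):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Minimum feature set: everything applicable to both array entries and
// normal variables.
class PatternDataBase {
public:
    PatternDataBase(uint64_t offset, size_t size) : m_offset(offset), m_size(size) {}
    virtual ~PatternDataBase() = default;
protected:
    uint64_t m_offset;
    size_t   m_size;
    // m_typeName kept out of the base: arrays have homogeneous types.
    // m_endian likewise omitted here; per the edit note, per-entry endianness
    // is probably unnecessary since arrays have homogeneous endianness.
};

// Everything a normal variable needs that an array entry doesn't.
class PatternData : public PatternDataBase {
public:
    using PatternDataBase::PatternDataBase;
protected:
    std::string m_variableName;
    std::string m_typeName;
    uint32_t    m_color = 0; // per the edit note, m_color belongs here
};

// Thin subclass: exists mostly to override virtuals with
// array-entry-specific behavior.
class PatternDataArrayEntry : public PatternDataBase {
public:
    using PatternDataBase::PatternDataBase;
};

// The array itself then holds its entries inline: one allocation for the
// whole vector rather than a separate heap allocation per entry.
class PatternDataArray : public PatternData {
public:
    using PatternData::PatternData;
    std::vector<PatternDataArrayEntry> m_entries;
};
```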
## Testing above solution

After testing the above to see what the size differences would look like (not a functional implementation, just observing how struct changes would affect memory layout), this would result in a lower bound of a 7-10x reduction in size per entry, with an upper bound of ~20x per entry. However, some form of sparse representation for arrays (i.e. entries are calculated on-the-fly from per-array data, rather than per-entry data being held at all times) would result in greater improvements still; see the sketch below.
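A hypothetical shape for that sparse direction (all names here are illustrative, not from the codebase): the array stores only per-array metadata, so memory is O(1) per array instead of O(n) per entry.

```cpp
#include <cstddef>
#include <cstdint>

// Per-array data only: entries are computed on demand, never stored.
struct SparseArray {
    uint64_t baseOffset; // where the array starts in the file
    size_t   entrySize;  // size of each element (homogeneous type)
    size_t   entryCount;

    struct Entry {
        uint64_t offset;
        size_t   size;
    };

    // Materialize entry `index` from per-array data on the fly.
    Entry entryAt(size_t index) const {
        return Entry{baseOffset + index * entrySize, entrySize};
    }
};
```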
---

My current observations show that an array of your size, applied to a 523'417'318-byte file, results in a memory increase of ~512 MB. This means that we've got ~64 bytes per `u8` now.