Mikolaj / ghc

Mirror of ghc repository. DO NOT SUBMIT PULL REQUESTS HERE
http://www.haskell.org/ghc/

Design and implement heap fragmentation for intervals in TS #12

Closed Mikolaj closed 12 years ago

Mikolaj commented 12 years ago

Perhaps before that, do the cheap thing: just try to mimic the +RTS -s fragmentation figure for the whole runtime and show only that figure, regardless of the interval.

Here is the most recent discussion about fragmentation for intervals:
13:00 @dcoutts mikolaj: ah yes, so I was thinking of changing the static heap info event too
13:00 @dcoutts adding the MBlock size
13:01 @dcoutts so we can interpret the fragmentation in a more helpful way
13:01 @dcoutts since it's only if frag is higher than an MBlock that we care
13:02 @dcoutts and it probably makes sense to round down to an MBlock once it's over an MBlock too
13:09 @mikolaj dcoutts: GHC compiled OK on my side
13:09 @mikolaj dcoutts: you mean the one-time heap event?
13:09 @mikolaj makes sense
13:10 @mikolaj dcoutts: go ahead, I haven't tested nor pushed the ghc-events changes yet, so I'll just need to recompile GHC
13:11 @dcoutts mikolaj: alternatively could just round down to an MBlock in the GC_STATS event itself
13:11 @dcoutts mikolaj: I'm not sure if the extra detail is ever useful
13:12 @dcoutts the MBlock size in the one-time heap event is probably sensible anyway
13:12 @mikolaj dcoutts: np, though so far all the events seem to be raw; perhaps it's useful to keep it that way?
13:13 @dcoutts mikolaj: right that was my initial intuition
13:13 @mikolaj dcoutts: no manipulation, just passing through (converting ints, that's all)
13:13 @dcoutts on the other hand, if it needs extra info to interpret
13:13 @dcoutts and the detail has no use at all
13:14 @mikolaj fair enough, having single events as self-contained as possible is useful too
13:14 @mikolaj perhaps do it on the GHC side, then?
13:14 @dcoutts mikolaj: I guess the extra detail means we can give a range on the fragmentation
13:14 @dcoutts that's about as far as I can see it goes
13:15 @dcoutts e.g. if you have 1.5Mb frag, then the "interesting" fragmentation is in the range 1-1.5
13:15 * mikolaj scratches his head
13:17 @mikolaj dcoutts: you mean, it can be, that it's really 1.5MB fragmentation, lots of empty bits at many blocks, not half a single block empty, right?
13:17 @dcoutts mikolaj: right
13:17 @mikolaj ok, got it now
13:18 @mikolaj then we are really losing info if we truncate
13:18 @dcoutts hmm
13:18 * dcoutts thinks again
13:18 @mikolaj even if most of the time it's accurate
13:19 @mikolaj ok, bad wording, we didn't have full info in the first place
13:19 @mikolaj but we mangle it additionally by truncating
13:20 @mikolaj instead of reporting noisy fragmentation we report exact lower bound on fragmentation
13:20 * dcoutts is now not so sure
13:31 @dcoutts mikolaj: so suppose we had a 10 MB heap and 2.5 Mb of fragmentation
13:31 @dcoutts it's not obvious to me what the upper and lower bounds on the "real" fragmentation is
13:32 @dcoutts mikolaj: oh, hmm, another issue with this is that we're reporting "current" fragmentation
13:33 @dcoutts which isn't really fragmentation, can just be that we've collected lots of garbage, and now have spare MBlocks
13:33 @dcoutts mikolaj: this is why GHC only looks at the peak MBlock and blocks allocated
13:34 @dcoutts ghc will eventually free MBlocks back to the OS, but not immediately
13:36 @dcoutts mm, perhaps I'll just keep the raw frag detail and add MBlock size to static heap info event
13:36 @dcoutts and we can work it out later :-)
13:36 @mikolaj yes
13:36 @mikolaj dcoutts: I think it's 2.0--2.5
13:37 @dcoutts you may be right, but I don't feel confident about it :-)
13:37 @mikolaj dcoutts: agreed about "current", so we are actually reporting "unused allocated memory" or something
13:38 * mikolaj is confused about "peak"
13:38 @dcoutts mikolaj: right, it's simply the difference between allocated blocks and the MBlocks
13:38 @mikolaj dcoutts: perhaps name it so
13:38 @dcoutts mikolaj: peak, the max over the program run
13:40 @mikolaj dcoutts: but our events do not show the peak, though we can compute it
13:40 @dcoutts mikolaj: right, we can compute it, that's just what the RTS does, keeps a running maximum
13:40 @dcoutts the peak is interesting because we know that it's not a moment where the memory used is declining
13:41 @dcoutts so don't have the issue that we could release MBlocks back to the OS, but just haven't done it yet
13:41 @mikolaj if so, then the figure is doubly not fragmentation: does not take the last block possibility into account and is not peak
13:41 @dcoutts right
13:41 @mikolaj I'd definitely rename it and not bother; it's raw, it's exact, there can be other uses
13:41 @dcoutts it's free space
13:41 @dcoutts which may or may not be usable for new allocations
13:42 @dcoutts due to fragmentation
13:45 @mikolaj so in TS I will compute the peak, fetch the block size and report it for an interval "peak fragmentation 2.0--2.5"
13:45 @dcoutts mikolaj: it's worse than that I think,
13:46 @dcoutts not just any local peak will do
13:46 @dcoutts certainly a global peak is ok
13:47 @dcoutts a peak since releasing MBlocks to the OS is also ok
13:47 @mikolaj so for too small intervals it could be wrong? only then?
13:48 @dcoutts mikolaj: actually for most intervals it'll be wrong :-)
13:48 @dcoutts there's only a few times when we can estimate it accurately
13:48 @dcoutts we have to be pushing up against the ceiling of MBlocks allocated from the OS
13:49 @mikolaj ok, I understand now
13:50 @dcoutts imagine inflating a balloon inside an expandable box
13:50 @mikolaj yup, got it, and the box does not have a balloon shape
13:52 @mikolaj so I'll report for an interval: "peak unused allocated heap 2.53", or "fragmentation at most 2.53", because it's still a correct upper bound, isn't it?
14:01 @dcoutts mikolaj: ignore it for now I think
14:01 @dcoutts frag isn't a big issue for users really
14:02 @mikolaj dcoutts: ok, see you after lunch
14:35 @dcoutts mikolaj: I think, if we care about frag at all, that showing only the global peak is ok
14:35 @dcoutts not worth the effort now to do the extra work for other peaks
14:45 @dcoutts mikolaj: indeed slop and frag are the lowest priority of the stats I think
14:47 @mikolaj dcoutts: global peak for intervals is OK, since I can precompute stuff for the max interval once and keep it somewhere, but when we process partial or very big eventlogs, it's not so easy any more
14:48 @mikolaj "fragmentation at most 2.53" is probably not good, because the figure can be huge for some intervals and then it's frightening for the user
14:49 @mikolaj I may do "peak unused allocated heap 2.53" for intervals and change it on the fly to "fragmentation at most 2.53" when I detect it's the max interval
14:49 @dcoutts mikolaj: yes, at that point we'd have to do it properly
14:49 @mikolaj or I can just skip it for now; what do you think?
14:49 @dcoutts mikolaj: honestly I don't think it's worth the development effort at the moment
14:50 @dcoutts there's lots of more useful stats
14:50 @mikolaj k, so skip it or show global fragmentation?
14:50 @dcoutts mikolaj: skip it, we can think about global frag when we're done with the useful ones
14:50 @mikolaj there are probably other global stats there, too, like max_something or peak_something, which are computed in GHC, so they are not per-interval
14:51 @mikolaj dcoutts: k, I will skip it
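
For reference, the two computations floated in the log (a range obtained by rounding the raw figure down to whole MBlocks, and a peak taken over an interval) could be sketched roughly as below. This is only an illustration, not code from GHC, ghc-events or ThreadScope: the `Sample` type, `fragRange` and `peakFree` are hypothetical names, the MBlock size is assumed to be 1 MB (in the discussion it would come from the static heap info event), and the "2.0--2.5" reading of the range was never settled above.

```haskell
import Data.Word (Word64)

-- Hypothetical heap sample: a timestamp and the raw "unused allocated
-- memory" figure in bytes (MBlocks obtained from the OS minus blocks in use).
data Sample = Sample
  { sampleTime :: Word64
  , sampleFree :: Word64
  }

-- Range suggested in the discussion: the lower bound is the raw figure
-- rounded down to a whole number of MBlocks, the upper bound is the raw
-- figure itself. Assumes mblockSize > 0.
fragRange :: Word64 -> Word64 -> (Word64, Word64)
fragRange mblockSize raw = (raw - raw `mod` mblockSize, raw)

-- Peak of the raw figure over the samples whose timestamps fall in
-- [start, end]; Nothing if the interval contains no sample.
peakFree :: Word64 -> Word64 -> [Sample] -> Maybe Word64
peakFree start end samples =
  case [sampleFree s | s <- samples, sampleTime s >= start, sampleTime s <= end] of
    [] -> Nothing
    xs -> Just (maximum xs)

main :: IO ()
main = do
  let mb = 1024 * 1024 :: Word64  -- assumed MBlock size
      samples =
        [ Sample 10 (mb `div` 2)
        , Sample 20 (2 * mb + mb `div` 2)  -- 2.5 MB of free space
        , Sample 30 mb
        ]
  -- Peak over the interval [0, 25], then the MBlock-rounded range for it.
  print (fmap (fragRange mb) (peakFree 0 25 samples))
```

As noted in the log, such an interval peak is only a trustworthy fragmentation estimate when the heap is pushing against the ceiling of MBlocks allocated from the OS, so for most intervals the result is better read as "peak unused allocated heap".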

Mikolaj commented 12 years ago

16:07 @dcoutts mikolaj: I think we don't need #12 actually
16:07 @mikolaj dcoutts: OK, I'll remove it from the display then and close the issue

It's much less work to display fragmentation for the whole program run, but then we'd have to mark it specially, because all other figures are for the selected interval. Also, when/if we display partial eventlogs or eventlogs of running programs, we can't show global fragmentation any more.
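
If this is ever revisited, the special marking could look something like the sketch below. Everything here is hypothetical (the `LogKind` and `Selection` types, the `heapLabel` function and the exact label wording are made up for illustration); it only encodes the rule described above: the "fragmentation" wording is used solely for the whole run of a complete eventlog, and every other case falls back to the neutral "peak unused allocated heap" wording.

```haskell
import Text.Printf (printf)

-- Hypothetical: whether the eventlog covers the entire program run.
data LogKind = CompleteLog | PartialLog

-- Hypothetical: whether the user selected the whole run or a sub-interval.
data Selection = WholeRun | SubInterval

-- Choose the label for the figure. Only a complete log viewed over the
-- whole run gets the "fragmentation" wording, explicitly marked as global.
heapLabel :: LogKind -> Selection -> Double -> String
heapLabel CompleteLog WholeRun x =
  printf "fragmentation at most %.2f (whole run)" x
heapLabel _ _ x =
  printf "peak unused allocated heap %.2f" x

main :: IO ()
main = do
  putStrLn (heapLabel CompleteLog WholeRun 2.53)
  putStrLn (heapLabel CompleteLog SubInterval 2.53)
  putStrLn (heapLabel PartialLog WholeRun 2.53)
```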