Alternative implementation

christian-marie commented 10 years ago

Hi, this is what I meant by using cons. Arguably less readable, but slightly better garbage collection output:

Entirely up to you which one you select, this one avoids the HashMap.map reverse.

Before:

~/tmp/bench$ time ./burst +RTS -s
100000
      62,300,968 bytes allocated in the heap
      14,002,224 bytes copied during GC
       2,275,680 bytes maximum residency (6 sample(s))
         731,568 bytes maximum slop
               6 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       115 colls,     0 par    0.01s    0.01s     0.0001s    0.0005s
  Gen  1         6 colls,     0 par    0.00s    0.00s     0.0007s    0.0019s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.02s  (  0.02s elapsed)
  GC      time    0.01s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.03s  (  0.03s elapsed)

  %GC     time      38.9%  (39.9% elapsed)

  Alloc rate    3,804,130,501 bytes per MUT second

  Productivity  60.8% of total user, 62.5% of total elapsed

real    0m0.030s
user    0m0.020s
sys 0m0.007s

After:

~/tmp/bench$ time ./burst +RTS -s
100000
      61,180,944 bytes allocated in the heap
      11,086,088 bytes copied during GC
       1,798,504 bytes maximum residency (6 sample(s))
         247,664 bytes maximum slop
               5 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       112 colls,     0 par    0.01s    0.01s     0.0001s    0.0003s
  Gen  1         6 colls,     0 par    0.00s    0.00s     0.0008s    0.0018s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.02s  (  0.02s elapsed)
  GC      time    0.01s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.03s  (  0.03s elapsed)

  %GC     time      33.7%  (34.7% elapsed)

  Alloc rate    2,945,317,731 bytes per MUT second

  Productivity  66.0% of total user, 68.2% of total elapsed

real    0m0.034s
user    0m0.033s
sys 0m0.007s

istathar commented 10 years ago

Hey. @christian-marie and I just had a long chat about this. It's not entirely clear whether or not you are preserving the order or the reverse order. The lambda does do a better job than (++) as shown above, no doubt, but we've managed to talk ourselves into confusion about what your original goal was. If I understand correctly, you're trying to preserve the wire order of the repeated elements, right? Have you got a test that establishes that?

AfC

christian-marie commented 10 years ago

To be clear, this does not preserve ordering without a reverse at the end.

If you wish to append to the end of a list efficiently, you will need to use something like Data.Sequence and you get to use the crazy little hat operators: (|>)

NathanHowell commented 10 years ago

I'll take a look tonight. I'm pretty sure the RepeatedField tests verify wire order.. at least that was the intention. On Jan 28, 2014 6:18 PM, "Andrew Cowie" notifications@github.com wrote:

Hey. @christian-marie https://github.com/christian-marie and I just had a long chat about this. It's not entirely clear whether or not you are preserving the order or the reverse order. The lambda does do a better job than (++) as shown above, no doubt, but we've managed to talk ourselves into confusion about what your original goal was. If I understand correctly, you're trying to preserve the wire order of the repeated elements, right? Have you got a test that establishes that?

AfC

Reply to this email directly or view it on GitHubhttps://github.com/alphaHeavy/protobuf/pull/6#issuecomment-33550408 .

NathanHowell commented 10 years ago

But it also seems like this pull request build should have failed, it's possible the test coverage here is lacking...

NathanHowell commented 10 years ago

Looks like that test was only partially implemented. I finished the implementation in d32824baffc9295c0b12fa2d6c28a637202763c3 and now this pull request fails to pass tests, as expected. Can you revise it with the HashMap.reverse and see how the GC times compare to the baseline?

christian-marie commented 10 years ago

I can tomorrow. Yes. In fourteen hours or so.

christian-marie commented 10 years ago

With the reverse back in it's still marginally better perhaps?

~/tmp/bench$ time ./burst +RTS -s
100000
      61,661,000 bytes allocated in the heap
      12,005,216 bytes copied during GC
       1,865,968 bytes maximum residency (6 sample(s))
         325,400 bytes maximum slop
               5 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0       113 colls,     0 par    0.01s    0.01s     0.0000s    0.0004s
  Gen  1         6 colls,     0 par    0.00s    0.00s     0.0006s    0.0015s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    0.02s  (  0.02s elapsed)
  GC      time    0.01s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    0.03s  (  0.03s elapsed)

  %GC     time      34.8%  (35.9% elapsed)

  Alloc rate    3,891,409,105 bytes per MUT second

  Productivity  64.9% of total user, 67.0% of total elapsed

real    0m0.027s
user    0m0.020s
sys 0m0.003s

alphaHeavy / protobuf

Alternative implementation #6