ARM-software / trappy

This repository has moved to https://gitlab.arm.com/tooling/trappy
Apache License 2.0
60 stars 39 forks source link

trappy: Speed up trappy by caching trace parsing #247

Closed joelagnel closed 7 years ago

joelagnel commented 7 years ago

Pandas is extremely fast at parsing csv to data frames. Astonishingly it takes < 1s to serialize/deserialize a 100MB work of traces with 430000 events to/from csv. We leverage this and write out a data frames into a csv file when they are created for the first time. Next time we read it out if it exists. To make sure, the cache isn't stale, we take the md5sum of the trace file and also ensure all CSVs exist before reading from the cache. I get a speed up of 16s to 1s when parsing a 100MB trace.

Signed-off-by: Joel Fernandes joelaf@google.com

joelagnel commented 7 years ago

CC @bjackman

joelagnel commented 7 years ago

All great suggestions. Will fix up.

About pickle, I still prefer csv because it's easier to investigate the cache in clear text and also is more immune to the internal format of pandas objects than binary object representation I think. Also I am also not sure how much faster we can get because we are already so fast with csv.

joelagnel commented 7 years ago

Also from this link it looks like csv is faster than pickle: http://matthewrocklin.com/blog/work/2015/03/16/Fast-Serialization

joelagnel commented 7 years ago

So this is the new version with all review comments addressed.

joelagnel commented 7 years ago

Hopefully this one is merge-ready

bjackman commented 7 years ago

I pushed an extra test, along with some microscopic fixes for stuff I noticed while writing it, to the branch on my repo here: https://github.com/bjackman/trappy/tree/for-trappy-cache

So I guess feel free to cherry-pick/squash that into your branch? Or you can leave it and I'll submit it separately once this is merged (I'm ill this week and cannot think clearly at all so the code may be nonsense).

joelagnel commented 7 years ago

Squashed @bjackman's fixups except his new test case (which IMO can be added after this set is merged, had to drop it because the test case itself had a bug). Pushed a new version. @JaviMerino @bjackman does it look good to you?

bjackman commented 7 years ago

This is good from my point of view, let's see what others think.

bjackman commented 7 years ago

I updated my branch (https://github.com/bjackman/trappy/tree/for-trappy-cache) with the missing file for the test I added. I also added another test for something I was confused about, and an extra check in _check_trace_cache, to avoid a warning if the set of parsed events has changed since the cache was written.

Also, AFAICT this currently breaks the normalize_time and window/abs_window params. Probably the simplest solution is just to add them to the md5sum or something like that?

edit: Does it also break the events parameter? I think this case is actually broken (as @credp suggested):

  1. Parse trace with events=['foo'], cache is created but all .csv files except the one for "foo" are empty)
  2. Parse trace again with events=['bar'], the DataFrame for "bar" is empty.

Maybe it's safest to just store & check the whole parameter list for __init__

joelagnel commented 7 years ago

CC @sinkap

joelagnel commented 7 years ago

Updated with @bjackman 's latest commits. Thanks! Will work on the other corner cases now (new event passed after cache creation). @bjackman while I do that could fix the normalize time issue you mentioned if you had time? I am new to that code so I could use some help there. Thanks!

joelagnel commented 7 years ago

@bjackman so I dropped "ftrace: Add explicit check for presence of each .csv file" and added a new commit which reads the cache partially. Your new test case for dynamic events added after caching passes with that.

bjackman commented 7 years ago

I updated my branch (https://github.com/bjackman/trappy/tree/for-trappy-cache) with

sinkap commented 7 years ago

@bjackman I will review your branch today. Since it seems to have some more fixes. @joelagnel

joelagnel commented 7 years ago

@bjackman I think you didn't update your branch (last update was June 8 to https://github.com/bjackman/trappy/tree/for-trappy-cache and I already picked up those fixes). Also could you rebase on top of master when you push again? Thanks!

bjackman commented 7 years ago

Eek.. you're right I didn't push, and I can't find the commits on my desktop... I will have to check when I get to my laptop :fearful:

sinkap commented 7 years ago

@bjackman feel free to ping when you have the updated commits.

joelagnel commented 7 years ago

@bjackman not having caching is killing me, can't wait to have this goodness in with your fixes ;D

credp commented 7 years ago

I think @joelagnel and @bjackman addressed Javi's change request, so I am going to dismiss that review to remove the big red cross :+1:

joelagnel commented 7 years ago

thanks @credp

credp commented 7 years ago

So where are we with this PR? Looks just about ready pending a push from @bjackman? (cc @joelagnel )

sinkap commented 7 years ago

Yep. we have a sync meeting today. Let's finalize it there.

On Jun 26, 2017 10:27 AM, "Chris Redpath" notifications@github.com wrote:

So where are we with this PR? Looks just about ready pending a push from @bjackman https://github.com/bjackman? (cc @joelagnel https://github.com/joelagnel )

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ARM-software/trappy/pull/247#issuecomment-310995499, or mute the thread https://github.com/notifications/unsubscribe-auth/ACDZbOCEx0OqPp5gwPr5WoH2es9551sQks5sH2t-gaJpZM4Na7y5 .

bjackman commented 7 years ago

Hey guys, really sorry for causing delays here, I forgot about it then went on holiday. Will update tonight when I get home (where the lost commits are on my computer).

sinkap commented 7 years ago

@bjackman Enjoy your holiday! :smile:

I am merging #253. It would be best to rebase this :)

bjackman commented 7 years ago

OK, I've pushed the commits now: https://github.com/bjackman/trappy/tree/for-trappy-cache

Forgotten the details so I'll do a self-review tomorrow :)

joelagnel commented 7 years ago

@bjackman, also could you rebase on top of @credp 's changes which got merged? :) thanks man.

On Tue, Jun 27, 2017 at 11:21 AM, Brendan Jackman notifications@github.com wrote:

OK, I've pushed the commits now: https://github.com/bjackman/ trappy/tree/for-trappy-cache

Forgotten the details so I'll do a self-review tomorrow :)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ARM-software/trappy/pull/247#issuecomment-311442899, or mute the thread https://github.com/notifications/unsubscribe-auth/AACSVD3ZYGhZgSDOTZTSPAjoBmTPZgKbks5sIUgxgaJpZM4Na7y5 .

bjackman commented 7 years ago

OK I updated the branch. The rebase was pretty awkward so I think the patch stack is messed up, but TBH I think we should squash them all anyway instead of having "add X", "fix X" in a single PR.

joelagnel commented 7 years ago

Thanks @bjackman , squashed as many as that made sense to squash. HAPpy SNAPpy TRAPpy!

sinkap commented 7 years ago

Merged!