ARM-software / trappy

This repository has moved to https://gitlab.arm.com/tooling/trappy
Apache License 2.0
60 stars 39 forks source link

trappy: Add support for 'print:' ftrace fallback event #276

Closed bjackman closed 5 years ago

bjackman commented 6 years ago

This event is just like tracing_markwrite but presents a small difficulty in that its name is a Python keyword. This is solved by appending '' to the atribute name of the FTrace object.

bjackman commented 6 years ago

Agh, this is totally broken, I must have been running tests wrong somehow.

derkling commented 6 years ago

First question: do we have trace events named "print" ?!?!

Is there any way to wrap the standard Python's print in a class and then dispatch it properly depending on whatever the use is calling a function (i.e. trace.print()) or an attribute (i.e. trace.print.somethingelse).

Or even better, I cannot think about a usage of trace.print()... thus perhaps we can just override the definition of print?

bjackman commented 6 years ago

First question: do we have trace events named "print" ?!?!

Yep, I've been seeing them on hikey960, they seem to come from where tracing_mark_write usually comes from. I had a look in the kernel code but I decided now wasn't the time to finally learn how tf ftrace works :tongue: so I'm not exactly sure where they come from.

Is there any way to wrap the standard Python's print in a class and then dispatch it properly depending on whatever the use is calling a function (i.e. trace.print()) or an attribute (i.e. trace.print.somethingelse).

Nope, there's no getting around a keyword - trace.print.data_frame is a syntax error. PEP8 recommends the trailing underscore approach. (IIUC this is fixed in Python 3).

Anyway... I came up against a pretty gnarly bug and there still seem to be more.. so let's see if I can get this fixed soon.

bjackman commented 6 years ago

Welp.. that was fun :grin: I think I finally fixed it.

valschneider commented 6 years ago

Trying to make sure I understand this well enough - please correct me if I'm wrong on any point:

When a trace is created with trace = trappy.trace(...), each entry present in class_definitions will lead to a setattr making each underlying dataframe accessible by using trace.<class_definitions key>.

Then there's the whole biz with how trace classes are actually imported. This is done in:

# [2] minus thermal/sched/etc class_definitions.update() done later
self.class_definitions.update(self.dynamic_classes.items())
# [3]
self.__add_events(listify(events))

class_definitions.update() will update class_definitions with keys created from trace class name fields, whereas __add_events() will update class_definitions with keys created from trace class unique_word fields. If that is true (i.e. if I understood things correctly), this means that there is a bug here that has lived comfortably for quite some time but seemingly never bothered us: class_definitions can have two different keys pointing to two different instances of the same trace class. In the end, only the entry pointed by the unique_word key will be used and have its Dataframe populated. I hope I'm right, it took me way too much time and too many open browser tabs to write this part.

To me we should've only used the unique_word field to index these classes, but then we get events like "print" that cause Python syntax woes... And on top of that, indexing Dataframes by unique_word instead of name would be a sudden change, that would've had to be done a while ago. So here I think it's mostly a matter of making sure the golden standard used everywhere within the code is indexing by the name field.

valschneider commented 6 years ago

Hang on, I'm getting a bit confused. According to that:

def __parse_trace_file(self, trace_file):
        """parse the trace and create a pandas DataFrame"""

        # Memoize the unique words to speed up parsing the trace file
        cls_for_unique_word = {}
        for trace_name in self.class_definitions.iterkeys():
            trace_class = getattr(self, trace_name)
            if trace_class.cached:
                continue

            unique_word = trace_class.unique_word
            cls_for_unique_word[unique_word] = trace_class

        if len(cls_for_unique_word) == 0:
            return
        ...

We actually go through every single instance available in class_definitions, BUT we can find two instances of the same trace_class. This means that cls_for_unique_word (which is used during parsing to dump actual trace data in the right place) will point to ONE of these two instances, but with dict key iteration being what it is we aren't really guaranteed which version (created during [2] or [3]) we're going to populate. HOWEVER if this is really true I'm surprised we haven't hit this sooner... Am I just completely way off ?

bjackman commented 6 years ago

Your first comment matches my understanding exactly.

In your second comment:

We actually go through every single instance available in class_definitions, BUT we can find two instances of the same trace_class. This means that cls_for_unique_word (which is used during parsing to dump actual trace data in the right place) will point to ONE of these two instances, but with dict key iteration being what it is we aren't really guaranteed which version (created during [2] or [3]) we're going to populate. HOWEVER if this is really true I'm surprised we haven't hit this sooner... Am I just completely way off ?

Yes, this is exactly right, however (as is only just now becoming fully clear to me), we only actually hit the bug when

The only places where I have seen this happen are

So (replying to your inline comment here)...

Is this change done to make SchedContribScaleF's name field the same as its unique_word field ?

Not quite. Before this PR, if you have 'sched_contrib_scale_f' in your events param then you will have two SchedContribScaleFactor instances, one at trace.sched_contrib_scale_f and one at trace.sched_contrib_scale_factor, and it's undefined which one will be populated. In practice it seems (because the TestLegacySystrace test was passing) that sched_contrib_scale_f would be populated. With this PR one of those keys had to go away - by "default" it would be trace.sched_contrib_scale_f, my intention here was that the trace.sched_contrib_scale_factor should go away instead.

However, on reflection, this should be the other way around - we should change the behaviour for the more specific case (where you specified 'sched_contrib_scale_f' in your events param) rather than the more general one. I will change this, and I will investigate adding a warning for when you try to access the wrong attribute.

There are quite a few classes with different name and unique_word - at least most if not all of the thermal classes do (see temperature or devfreq cooling stuff).

I will look into these too.

Thanks a lot for the careful review, this is turning out to be a gnarly one!

bjackman commented 6 years ago

Well I've added some more sanity checks and fixed another subtle bug which we have probably never encountered.

My main concern now is whether this breaks any LISA code..

valschneider commented 6 years ago

With this PR one of those keys had to go away - by "default" it would be trace.sched_contrib_scale_f, my intention here was that the trace.sched_contrib_scale_factor should go away instead.

However, on reflection, this should be the other way around - we should change the behaviour for the more specific case (where you specified 'sched_contrib_scale_f' in your events param) rather than the more general one. I will change this, and I will investigate adding a warning for when you try to access the wrong attribute.

Keeping the dataframes named after trace_class.name makes the most sense to me, which is why I was a bit confused as to why you were changing the SchedContribScaleFactor class name.

bjackman commented 6 years ago

Keeping the dataframes named after trace_class.name makes the most sense to me, which is why I was a bit confused as to why you were changing the SchedContribScaleFactor class name.

I think I disagree with you on this: using the name of a trace parser does feel semantically "better" but it also can't be determined except by fiddling with TRAPpy - I think having our own name for an event where it isn't necessary (e.g. sched_contrib_scale_factor instead of sched_contrib_scale_f) violates the principle of least surprise.

bjackman commented 6 years ago

OK on further discussion and examination of existing TRAPpy APIs I changed my mind. So to summarize:

e.g. for the sched_contrib_scale_f the name is sched_contrib_scale_factor and the unique_word is sched_contrib_scale_f:. If you construct a trace with events=['sched_contrib_scale_f'] it will parse it for you, but then if you try to access trace.sched_contrib_scale_f you will get an error telling you to access trace.sched_contrib_scale_factor instead.

I think this is probably a reasonable balance of semantic cleanliness and usability. Otherwise you'd have to request the print event with events=['print_'] which is a bit ugly.

We should avoid having events with different name from unique_word wherever possible, because I don't think this can ever not be confusing.

So.. I'm gona go ahead with this as is..

douglas-raillard-arm commented 6 years ago

New PR based on that one: https://github.com/ARM-software/trappy/pull/280

JaviMerino commented 5 years ago

Superseeded by #280, merged in 61d032065ebb