Closed itraviv closed 4 years ago
I like this idea. Do you have code that produces the output you included? If so please include it, possibly as a draft PR just for discussion.
I'd like you to list all the different items that should be dealt with, and how they will be dealt with. For example, timestamps are an easy case, because you just omit them. Memory addresses are a little harder, because you need to choose whether you'll switch the __repr__
method or post-process the output. Absolute paths are another thing where you'll need to decide what you're doing.
Is there any part of this that can't be done by post-processing the output? This is literally what's done in some of the tests.
Post-processing might work, but if you mean that we'll just leave the post-processing to the users, that means that many users are not gonna do it because they won't want to write that post-processing algorithm.
No but I think it makes more sense as a separate function than a flag.
Advantage of having a separate function: Avoids adding more complexity to our main algorithm.
Disadvantages of having a separate function:
So I still support adding this as a flag-- Assuming that Itamar can give good answers to my question above, and write a good implementation.
I'll have everything written (answers to your questions and implementation ideas) in 24h
normalize=True
use an empty string.__repr__
(if he explicitly wanted some kind of __repr__
I wouldn't override it). So basically now I am only overriding the default __repr__
with post-processing and cutting the memory address part (where at 0x... appears).normalize=True
will force thread_info=False
Everything will be documented as detailed as possible (I'll add detailed info to the docs).
Code snippets for discussion:
1.
now_string = pycompat.time_isoformat(now, timespec='microseconds') if not self.normalize else ''
2.
source_path = source_path if not self.normalize else os.path.basename(source_path)
3.
value_repr = utils.normalize_repr(value_repr)
Where normalize_repr is: 4.
def normalize_repr(item_repr):
parts = item_repr.partition(' at')
if parts[1]:
return parts[0] + '>'
return parts[0]
This is a naive implementation for truncating the default __repr__
at the 'at' word. i will gladly improve it.
In general - I tried to use the most efficient logic I could think of since I don't want to increase the runtime.
timestamps - if
normalize=True
use an empty string.
Agreed.
source path - use only the filename (use basename on the full path). thought about calculating some relative path (to some root point for example) but I don't know how to do it without extra knowledge and I don't want the user to give extra input and I found it useful enough this way.
Agreed.
repr - I think it'll be rude to override the user-defined
__repr__
(if he explicitly wanted some kind of__repr__
I wouldn't override it). So basically now I am only overriding the default__repr__
with post-processing and cutting the memory address part (where at 0x... appears).
Agreed.
- thread data - using
normalize=True
will forcethread_info=False
Hmm, instead, when normalize=True
and thread_info=True
raise a NotImplementedError
that explains that using these flags together isn't implemented.
- max variable length - I think we should force some value to always get the same output.
I disagree, people who are savvy enough to specify a different max_variable_length
should know that they need to set the same number on both runs.
Everything will be documented as detailed as possible (I'll add detailed info to the docs).
Code snippets for discussion:
now_string = pycompat.time_isoformat(now, timespec='microseconds') if not self.normalize else ''
source_path = source_path if not self.normalize else os.path.basename(source_path)
value_repr = utils.normalize_repr(value_repr)
Where normalize_repr is: 4.
def normalize_repr(item_repr): parts = item_repr.partition(' at') if parts[1]: return parts[0] + '>' return parts[0]
This is a naive implementation for truncating the default
__repr__
at the 'at' word. i will gladly improve it.In general - I tried to use the most efficient logic I could think of since I don't want to increase the runtime.
Looks good, don't forget adding tests, I'll wait for your PR.
Merged in #171
We came up with a simple but strong usage – diff between traces. We use it to find bugs or regressions between git-branches and code sections. However, we deal with removing all the machine-specific data (time stamps, reprs containing memory addresses, absolute paths, thread data etc.).
I thought about adding a flag called “normalize” that will do it. for example: the trace without changes (
normalize=False
):normalized trace (
normalize=True
):