Add a flag to normalize the output (remove machine-specific data)

itraviv commented 4 years ago

We came up with a simple but strong usage – diff between traces. We use it to find bugs or regressions between git-branches and code sections. However, we deal with removing all the machine-specific data (time stamps, reprs containing memory addresses, absolute paths, thread data etc.).

I thought about adding a flag called “normalize” that will do it. for example: the trace without changes (normalize=False):

Source path:... C:\[my-absolute-path]\PySnooper\tests\test_pysnooper.py
Starting var:.. A = <class 'tests.test_pysnooper.test_see_output.<locals>.A'>
10:02:07.262743 call      1555     def stam():
10:02:07.263744 line      1556         a = A(19)
New var:....... a = <tests.test_pysnooper.test_see_output.<locals>.A object at 0x0000023BEB538710>
10:02:07.263744 line      1557         b = A(22)
New var:....... b = <tests.test_pysnooper.test_see_output.<locals>.A object at 0x0000023BEB538518>
10:02:07.263744 line      1558         res = a.a + b.a
New var:....... res = 41
10:02:07.263744 line      1559         return res
10:02:07.263744 return    1559         return res
Return value:.. 41

normalized trace (normalize=True):

Source path:... test_pysnooper.py
Starting var:.. A = <class 'tests.test_pysnooper.test_see_output.<locals>.A'>
 call      1555     def stam():
 line      1556         a = A(19)
New var:....... a = <tests.test_pysnooper.test_see_output.<locals>.A object>
 line      1557         b = A(22)
Modified var:.. a = <tests.test_pysnooper.test_see_output.<locals>.A object>
New var:....... b = <tests.test_pysnooper.test_see_output.<locals>.A object>
 line      1558         res = a.a + b.a
Modified var:.. a = <tests.test_pysnooper.test_see_output.<locals>.A object>
Modified var:.. b = <tests.test_pysnooper.test_see_output.<locals>.A object>
New var:....... res = 41
 line      1559         return res
Modified var:.. a = <tests.test_pysnooper.test_see_output.<locals>.A object>
Modified var:.. b = <tests.test_pysnooper.test_see_output.<locals>.A object>
 return    1559         return res
Return value:.. 41

cool-RR commented 4 years ago

I like this idea. Do you have code that produces the output you included? If so please include it, possibly as a draft PR just for discussion.

I'd like you to list all the different items that should be dealt with, and how they will be dealt with. For example, timestamps are an easy case, because you just omit them. Memory addresses are a little harder, because you need to choose whether you'll switch the __repr__ method or post-process the output. Absolute paths are another thing where you'll need to decide what you're doing.

alexmojaki commented 4 years ago

Is there any part of this that can't be done by post-processing the output? This is literally what's done in some of the tests.

cool-RR commented 4 years ago

Post-processing might work, but if you mean that we'll just leave the post-processing to the users, that means that many users are not gonna do it because they won't want to write that post-processing algorithm.

alexmojaki commented 4 years ago

No but I think it makes more sense as a separate function than a flag.

cool-RR commented 4 years ago

Advantage of having a separate function: Avoids adding more complexity to our main algorithm.

Disadvantages of having a separate function:

Post-processing logic is more difficult, might not work for all cases and testing might not be able to reveal that.
More difficult for users to discover.

So I still support adding this as a flag-- Assuming that Itamar can give good answers to my question above, and write a good implementation.

itraviv commented 4 years ago

I'll have everything written (answers to your questions and implementation ideas) in 24h

itraviv commented 4 years ago

timestamps - if normalize=True use an empty string.
source path - use only the filename (use basename on the full path). thought about calculating some relative path (to some root point for example) but I don't know how to do it without extra knowledge and I don't want the user to give extra input and I found it useful enough this way.
repr - I think it'll be rude to override the user-defined __repr__ (if he explicitly wanted some kind of __repr__ I wouldn't override it). So basically now I am only overriding the default __repr__ with post-processing and cutting the memory address part (where at 0x... appears).
thread data - using normalize=True will force thread_info=False
max variable length - I think we should force some value to always get the same output.

Everything will be documented as detailed as possible (I'll add detailed info to the docs).

Code snippets for discussion:

1.

now_string = pycompat.time_isoformat(now, timespec='microseconds') if not self.normalize else ''

2.

source_path = source_path if not self.normalize else os.path.basename(source_path)

3.

value_repr = utils.normalize_repr(value_repr)

Where normalize_repr is: 4.

def normalize_repr(item_repr):
    parts = item_repr.partition(' at')
    if parts[1]:
        return parts[0] + '>'
    return parts[0]

This is a naive implementation for truncating the default __repr__ at the 'at' word. i will gladly improve it.

In general - I tried to use the most efficient logic I could think of since I don't want to increase the runtime.

cool-RR commented 4 years ago

timestamps - if normalize=True use an empty string.

Agreed.

source path - use only the filename (use basename on the full path). thought about calculating some relative path (to some root point for example) but I don't know how to do it without extra knowledge and I don't want the user to give extra input and I found it useful enough this way.

Agreed.

repr - I think it'll be rude to override the user-defined __repr__ (if he explicitly wanted some kind of __repr__ I wouldn't override it). So basically now I am only overriding the default __repr__ with post-processing and cutting the memory address part (where at 0x... appears).

Agreed.

thread data - using normalize=True will force thread_info=False

Hmm, instead, when normalize=True and thread_info=True raise a NotImplementedError that explains that using these flags together isn't implemented.

max variable length - I think we should force some value to always get the same output.

I disagree, people who are savvy enough to specify a different max_variable_length should know that they need to set the same number on both runs.

Everything will be documented as detailed as possible (I'll add detailed info to the docs).

Code snippets for discussion:
now_string = pycompat.time_isoformat(now, timespec='microseconds') if not self.normalize else ''
source_path = source_path if not self.normalize else os.path.basename(source_path)
value_repr = utils.normalize_repr(value_repr)
Where normalize_repr is: 4.
def normalize_repr(item_repr):
    parts = item_repr.partition(' at')
    if parts[1]:
        return parts[0] + '>'
    return parts[0]
This is a naive implementation for truncating the default __repr__ at the 'at' word. i will gladly improve it.

In general - I tried to use the most efficient logic I could think of since I don't want to increase the runtime.

Looks good, don't forget adding tests, I'll wait for your PR.

cool-RR commented 4 years ago

Merged in #171

cool-RR / PySnooper

Add a flag to normalize the output (remove machine-specific data) #168