HaveIBeenPwned / EmailAddressExtractor

A project to rapidly extract all email addresses from any files in a given path
BSD 3-Clause "New" or "Revised" License

Implements a performance tracker #43

Closed: GStefanowich closed this 1 year ago

GStefanowich commented 1 year ago

A start on a fix for #42. It seemed like a fun challenge to take a stab at.

Added a --debug CLI flag to enable debug mode. In debug mode, an IPerformanceStack is passed around; it is meant to resemble a stack (e.g., a stack trace).

The interface mainly adds two methods:

// Create a new nested stack
IPerformanceStack CreateStack(string name);

// Add a new measured step in the current stack
void Step(string name);
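To make the call pattern concrete, here is a rough sketch of how an extraction loop might thread a stack through. It is illustrative only: the full interface shape (Log plus IDisposable) is inferred from the members DefaultPerformanceStack implements, the step names are borrowed from the debug output, and RecordingStack, Extractor, ExtractFromLines and the email regex are hypothetical, not code from this PR. Where each call sits in the real loop is a guess.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Interface shape inferred from DefaultPerformanceStack's members.
public interface IPerformanceStack : IDisposable
{
    IPerformanceStack CreateStack(string name); // create a new nested stack
    void Step(string name);                     // add a measured step in the current stack
    void Log();
}

// Hypothetical stub that records calls as strings instead of timing them.
public sealed class RecordingStack : IPerformanceStack
{
    private readonly List<string> calls;
    private readonly string prefix; // path of enclosing stacks, e.g. "Read file/"

    public RecordingStack() : this(new List<string>(), "") { }

    private RecordingStack(List<string> calls, string prefix)
    {
        this.calls = calls;
        this.prefix = prefix;
    }

    public IReadOnlyList<string> Calls => calls;

    public IPerformanceStack CreateStack(string name)
    {
        calls.Add(prefix + "open " + name);
        return new RecordingStack(calls, prefix + name + "/");
    }

    public void Step(string name) => calls.Add(prefix + name);
    public void Log() { }
    public void Dispose() { }
}

// Hypothetical extractor loop; the real project's code and regex differ.
public static class Extractor
{
    private static readonly Regex Email = new Regex(@"[^\s@]+@[^\s@]+\.[^\s@]+");

    public static List<string> ExtractFromLines(IEnumerable<string> lines, IPerformanceStack perf)
    {
        var found = new List<string>();
        using IPerformanceStack file = perf.CreateStack("Read file");
        foreach (string line in lines)
        {
            file.Step("Read line");
            // One nested stack per line, disposed at the end of each iteration.
            using IPerformanceStack regex = file.CreateStack("Run regex");
            MatchCollection matches = Email.Matches(line);
            regex.Step("Capture string");
            foreach (Match m in matches)
                found.Add(m.Value);
        }
        return found;
    }
}
```

Passing a no-op stack instead of RecordingStack would leave the extraction logic unchanged; only the bookkeeping differs.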

When --debug is not enabled, an empty object whose methods do nothing is passed around instead, so a normal run pays essentially nothing beyond a no-op interface call:

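static readonly IPerformanceStack DEFAULT = new DefaultPerformanceStack();

// No-op implementation used when --debug is off.
private sealed class DefaultPerformanceStack : IPerformanceStack {
    // Returning the same instance makes nested stacks no-ops too, with no allocation.
    /// <inheritdoc />
    public IPerformanceStack CreateStack(string name)
        => this;

    /// <inheritdoc />
    public void Step(string name) {}

    /// <inheritdoc />
    public void Log() {}

    // Nothing to release.
    void IDisposable.Dispose() {}
}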
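For the debug side, a timing implementation could look roughly like the sketch below. It is an assumption, not the class in this PR: TimedPerformanceStack and Format are hypothetical names, Step charges the time since the previous step (or since the stack was created) to its name, and a nested stack adds its whole lifetime to its parent entry when disposed. Nested stacks created under the same name share one aggregate, which is how a single line such as "Run regex x356,411" could summarize hundreds of thousands of short-lived stacks. This sketch always prints averages in microseconds.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

public interface IPerformanceStack : IDisposable
{
    IPerformanceStack CreateStack(string name);
    void Step(string name);
    void Log();
}

// Hypothetical timing implementation; not the PR's actual class.
public sealed class TimedPerformanceStack : IPerformanceStack
{
    // Aggregated measurements for one name at one nesting level.
    private sealed class Stat
    {
        public long Count;
        public long Ticks;                          // summed Stopwatch ticks
        public readonly Node Children = new Node(); // stacks created under this name
    }

    // One nesting level: stats keyed by name, kept in first-seen order.
    private sealed class Node
    {
        public readonly List<string> Order = new List<string>();
        public readonly Dictionary<string, Stat> Stats = new Dictionary<string, Stat>();

        public Stat Get(string name)
        {
            if (Stats.TryGetValue(name, out var stat)) return stat;
            Stats[name] = stat = new Stat();
            Order.Add(name);
            return stat;
        }
    }

    private readonly Node node;    // level this stack records its own steps into
    private readonly Stat? owner;  // parent entry updated on Dispose (null for the root)
    private readonly long started;
    private long lastStep;

    public TimedPerformanceStack() : this(new Node(), null) { }

    private TimedPerformanceStack(Node node, Stat? owner)
    {
        this.node = node;
        this.owner = owner;
        started = lastStep = Stopwatch.GetTimestamp();
    }

    public IPerformanceStack CreateStack(string name)
    {
        Stat stat = node.Get(name);
        return new TimedPerformanceStack(stat.Children, stat);
    }

    public void Step(string name)
    {
        // Charge the time since the previous step (or since creation) to this name.
        long now = Stopwatch.GetTimestamp();
        Stat stat = node.Get(name);
        stat.Count++;
        stat.Ticks += now - lastStep;
        lastStep = now;
    }

    public void Dispose()
    {
        // A nested stack adds its whole lifetime to its parent entry; the root does nothing.
        if (owner is null) return;
        owner.Count++;
        owner.Ticks += Stopwatch.GetTimestamp() - started;
    }

    public void Log() => Console.Write(Format());

    // Render the tree, indenting two extra spaces per nesting level.
    public string Format()
    {
        var sb = new StringBuilder();
        Append(sb, node, 0);
        return sb.ToString();
    }

    private static void Append(StringBuilder sb, Node level, int depth)
    {
        foreach (string name in level.Order)
        {
            Stat s = level.Stats[name];
            double ms = s.Ticks * 1000.0 / Stopwatch.Frequency;
            double avgUs = s.Count == 0 ? 0 : ms * 1000.0 / s.Count;
            sb.Append(' ', 1 + 2 * depth)
              .Append($"- {name} x{s.Count:N0} | Took {ms:N0}ms (at ~{avgUs:N0}μs per)")
              .Append('\n');
            Append(sb, s.Children, depth + 1);
        }
    }
}
```

With this design, a step's interval can include time spent in nested stacks opened since the previous step, so sibling totals need not add up to the parent's. That would be one explanation for Read line (2,018ms) and Run regex (1,117ms) together exceeding Read file (2,049ms) in the sample output, though the real implementation may measure differently. Choosing between the two implementations could then be a one-liner such as `IPerformanceStack perf = debug ? new TimedPerformanceStack() : DEFAULT;`, where `debug` is a hypothetical flag parsed from --debug.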

The new logger output with debug on is:

Extraction time: 2,052ms
Addresses extracted: 16
Read lines total: 285,130
Read lines rate: 138,952/s

 - Read file x6 | Took 2,049ms (at ~342ms per)
   - Read line x356,412 | Took 2,018ms (at ~6μs per)
   - Run regex x356,411 | Took 1,117ms (at ~3μs per)
     - Check length    x308,890 | Took 337ms (at ~1μs per)
     - Capture string  x308,890 | Took 73ms (at ~0μs per)
     - Filter invalids x308,890 | Took 127ms (at ~0μs per)
     - Validate domain x285,130 | Took 90ms (at ~0μs per)

Read file and Run regex have nested stacks under them (indented for readability). This shows the total time spent on each action as well as the average time per iteration (in microseconds by default).
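The summary figures follow from simple arithmetic, sketched below with hypothetical helper names (DebugStats, LinesPerSecond, AveragePer). The rate line appears to divide Read lines total (285,130), not the Read line step count (356,412), by the extraction time: 285,130 / 2.052s ≈ 138,952/s. Each per-iteration average is total time divided by count, e.g. 2,049ms / 6 ≈ 342ms. The rounding mode is an assumption.

```csharp
using System;

// Hypothetical helpers reproducing the arithmetic behind the summary figures.
public static class DebugStats
{
    // Lines per second: line count divided by elapsed seconds (elapsed ms / 1000).
    public static long LinesPerSecond(long lines, long elapsedMs)
        => (long)Math.Round(lines * 1000.0 / elapsedMs);

    // Average time per iteration in a chosen unit (1 for ms, 1000 for μs),
    // rounded to the nearest whole unit.
    public static long AveragePer(long totalMs, long count, double unitsPerMs)
        => (long)Math.Round(totalMs * unitsPerMs / count, MidpointRounding.AwayFromZero);
}
```

Fed the rounded figures from the sample output, LinesPerSecond(285_130, 2_052) gives 138,952 and AveragePer(2_049, 6, 1) gives 342, matching the printed lines. The real code presumably works from unrounded timer values, so a recomputed average can be off by one in the last digit.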


Also fixed a bug where the --help and --version flags were actually ---help and ---version.

GStefanowich commented 1 year ago

@troyhunt Merge conflict resolved, if you want to merge.

You can do some in-depth testing of which parts of the code are slowest, since you can throw some data leaks at it.