HaveIBeenPwned / EmailAddressExtractor

A project to rapidly extract all email addresses from any files in a given path
BSD 3-Clause "New" or "Revised" License

Performance tweaks #18

Closed GStefanowich closed 1 year ago

GStefanowich commented 1 year ago

This changes the file reader from File.ReadByLineAsync to a FileStream wrapped in a StreamReader. Opening the FileStream in SequentialScan mode should help read speeds, since we know we never backtrack in the file.
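For anyone who wants to see the shape of the change, here is a minimal sketch of a sequential line reader. It is not the code from this PR: the `SequentialLineReader` class, the `ReadLinesAsync` method name, and the 4096-byte buffer size are assumptions for illustration only. `FileOptions.SequentialScan` is the standard .NET hint that the file will be read front to back.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper for illustration; names are not taken from the project.
public static class SequentialLineReader
{
    public static async IAsyncEnumerable<string> ReadLinesAsync(string path)
    {
        // SequentialScan hints to the OS that the file is read front to back.
        // Asynchronous requests asynchronous I/O for the stream.
        // The 4096-byte buffer is an assumed value, not a tuned one.
        await using var stream = new FileStream(
            path,
            FileMode.Open,
            FileAccess.Read,
            FileShare.Read,
            4096,
            FileOptions.SequentialScan | FileOptions.Asynchronous);
        using var reader = new StreamReader(stream);

        while (true)
        {
            // ReadLineAsync returns null once the end of the file is reached.
            var line = await reader.ReadLineAsync();
            if (line == null)
            {
                yield break;
            }
            yield return line;
        }
    }
}
```

A caller would consume this with `await foreach (var line in SequentialLineReader.ReadLinesAsync(path))`, so the rest of the pipeline can stay line-oriented.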

Reading 69,949,440 lines took 191 seconds (roughly 366,000 lines per second), though the test file was only a bunch of garbage emails listed sequentially.

I also added a Lines count to the Monitor that keeps track of total lines read, and moved the counter increment inside an if-added check so that duplicate emails aren't counted more than once.
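The counting change can be sketched roughly as below. This is an illustration rather than the actual Monitor class: `ScanCounters`, `LineRead`, and `Record` are hypothetical names, the sketch assumes single-threaded use, and it assumes addresses are compared with the default case-sensitive string comparison. The point is that `HashSet<T>.Add` returns `false` for an item already in the set, so the increment only runs for new addresses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the counters described above; not the project's Monitor.
// Assumes single-threaded use; a concurrent version would need synchronization.
public sealed class ScanCounters
{
    private readonly HashSet<string> _seen = new HashSet<string>();

    // Total lines read, counted whether or not the line held an address.
    public long LinesRead { get; private set; }

    // Number of distinct addresses recorded so far.
    public long UniqueAddresses { get; private set; }

    public void LineRead()
    {
        LinesRead++;
    }

    // Returns true only when the address was not seen before.
    public bool Record(string address)
    {
        // Add returns false for duplicates, so the counter is only
        // incremented inside the "if added" branch.
        if (_seen.Add(address))
        {
            UniqueAddresses++;
            return true;
        }
        return false;
    }
}
```

Keeping the increment inside the branch means the unique count always matches the size of the set, while the lines total keeps counting every line read.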