HaveIBeenPwned / EmailAddressExtractor

A project to rapidly extract all email addresses from any files in a given path
BSD 3-Clause "New" or "Revised" License

Created a Reader interface #35

Closed: GStefanowich closed this 1 year ago

GStefanowich commented 1 year ago

Created some new classes and an interface for reading from different file types.

Now that FileExtensionParsing is implemented and we can differentiate or skip files by extension, a different reader can be fetched for each file type. The current implementation was moved into a PlainTextReader.

I've created two (currently blank and unimplemented) samples for PDFs and Documents (.doc, .docx, etc). Reading them may require either an additional implementation or the use of a library. If somebody else wants to investigate, they can; otherwise I may when time permits.
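
As a rough illustration of the idea (not the code in this PR), the sketch below shows one way a reader could be selected per file extension. It is written in Python for brevity. The method name `read_lines`, the `READERS` table, `get_reader`, and the `PdfReader` stub are all hypothetical names chosen for the example; only `PlainTextReader` is named in the description above.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Iterator, Optional


class Reader(ABC):
    """Hypothetical reader interface: yields lines of text from a file."""

    @abstractmethod
    def read_lines(self, path: Path) -> Iterator[str]:
        ...


class PlainTextReader(Reader):
    """Reads a file line by line as text."""

    def read_lines(self, path: Path) -> Iterator[str]:
        # errors="replace" keeps going if a file is not valid UTF-8
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\n")


class PdfReader(Reader):
    """Placeholder, mirroring the blank PDF sample described above."""

    def read_lines(self, path: Path) -> Iterator[str]:
        raise NotImplementedError("PDF reading is not implemented yet")


# Assumed mapping from extension to reader; the real project may differ.
READERS: Dict[str, Reader] = {
    ".txt": PlainTextReader(),
    ".pdf": PdfReader(),
}


def get_reader(path: str) -> Optional[Reader]:
    """Return the reader registered for the file's extension, or None to skip it."""
    return READERS.get(Path(path).suffix.lower())
```

With this layout, supporting a new file type only means adding a class and one entry in the table; files whose extension has no entry (for example `.mp4`) get `None` and can be skipped.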

GStefanowich commented 1 year ago

Took a crack at implementing a proper file size display:

Found 10 files:
- .txt: 7 files : 88 Kb
- .none: 1 files : 0 Bytes, Skipping (Unknown Extension)
- .mp4: 2 files : 0 Bytes, Skipping (Audio/Video files)

It now picks the nearest usable size unit. My .none and .mp4 test files are just 0-byte files I named
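
For anyone curious what "nearest usable size" could look like, here is a minimal sketch in Python. It is an assumption about the approach, not the code from this PR: the function name `format_size`, the 1024 divisor, the rounding to whole numbers, and the unit labels beyond `Bytes` and `Kb` (the two that appear in the output above) are all made up for the example.

```python
# Unit labels; "Bytes" and "Kb" match the sample output, the rest are assumed.
UNITS = ("Bytes", "Kb", "Mb", "Gb", "Tb")


def format_size(num_bytes: int) -> str:
    """Scale a byte count down to the largest unit that keeps the value >= 1."""
    size = float(num_bytes)
    unit = 0
    # Divide by 1024 until the value fits the current unit or units run out
    while size >= 1024 and unit < len(UNITS) - 1:
        size /= 1024
        unit += 1
    return f"{size:.0f} {UNITS[unit]}"


print(format_size(0))          # 0-byte test files
print(format_size(88 * 1024))  # a total comparable to the .txt line above
```

The loop stops at the last unit, so very large totals are still reported rather than running off the end of the table.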