Note that this changes the target framework from .NET 6 to .NET 7, so I'm not sure whether you want to do that.
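.NET 7 introduced the GeneratedRegex attribute, which is meant to help Regex performance. A source generator emits the matching code at compile time, so the pattern no longer has to be parsed and constructed at run time before Regex.Matches can run on each loop iteration. (The static Regex methods do keep a small cache of recently used patterns, so the gain over the original may be modest.)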
I also added RegexOptions.IgnoreCase so the upper-case ranges can be dropped from the character classes: [a-zA-Z0-9] -> [a-z0-9]. IgnoreCase itself got faster in .NET 7, as matching now uses a check like c is 'A' or 'a' instead of paying the overhead of char.ToLower(c) == 'a'. I then replaced the a-z ranges with \p{L}, which matches any Unicode letter rather than only ASCII ones, so an email or a domain containing umlauts or accented characters will match too.
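As a rough sketch of how these two changes fit together, assuming .NET 7 or later: the pattern below is a hypothetical email pattern written for illustration, not the one from your code, and the names EmailPatterns, Email and Demo are made up as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static partial class EmailPatterns
{
    // Hypothetical pattern (an assumption, not the original one):
    // \p{L} matches any Unicode letter, and IgnoreCase means no
    // explicit A-Z ranges are needed anywhere in the pattern.
    // The source generator fills in the body of this partial method.
    [GeneratedRegex(@"[\p{L}0-9._%+-]+@[\p{L}0-9.-]+\.\p{L}{2,}", RegexOptions.IgnoreCase)]
    public static partial Regex Email();
}

public static class Demo
{
    public static void Main()
    {
        const string text = "Contact: Jörg@Müller.de, bob@example.com";

        // Email() hands back the generated Regex; it is not rebuilt per call.
        foreach (Match m in EmailPatterns.Email().Matches(text))
            Console.WriteLine(m.Value);
    }
}
```

The containing type and the method both have to be partial, since the generator supplies the implementation at compile time.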
Lastly, I changed the file read to be asynchronous and line-by-line instead of reading the whole file at once. Since your regular expression doesn't match across lines, there's no point in reading the entire file before starting to search. This saves memory when the files are large.
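The line-by-line read could look roughly like this. It is only a sketch: LineScanner and FindMatchesAsync are hypothetical names, and the Regex is passed in so that any pattern, generated or not, can be used.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class LineScanner
{
    // Collects every match in the file while holding only one line
    // of the file in memory at a time.
    public static async Task<List<string>> FindMatchesAsync(string path, Regex pattern)
    {
        var results = new List<string>();

        using var reader = new StreamReader(path);

        // ReadLineAsync returns null at end of file, which ends the loop.
        while (await reader.ReadLineAsync() is { } line)
        {
            foreach (Match m in pattern.Matches(line))
                results.Add(m.Value);
        }

        return results;
    }
}
```

This only works because no match can span a line break; a pattern that needs to see several lines at once would have to go back to reading larger chunks.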