FGRibreau / mailchecker

:mailbox: Cross-language temporary (disposable/throwaway) email detection library. Covers 55 734+ fake email providers.
https://twitter.com/FGRibreau
MIT License
1.64k stars 258 forks source link

Convert node blacklist to a Set for performance gains #433

Closed husainalzeera closed 6 months ago

husainalzeera commented 1 year ago

The node code is currently going through the blacklist array through an indexOf call. By converting it to a Set and using has there is a significant performance gain.

VasilNikolov commented 12 months ago

We can think of implementing an async version of the functions too, possible performance benefits since it won't block the thread.

lyrixx commented 8 months ago

I understand it's much faster when looking int the set,but initialization is slower, isn't?

Can you share some bench?

thanks

alcore commented 6 months ago

@lyrixx

Can you share some bench?

It's a O(n) change to O(1), with an N of 55k+, where the initialization cost is paid only once, as opposed to the runtime cost. You can expect a difference of at minimum several orders of magnitude (I'd assume a 50-100x speedup). Initialization is O(n) in both cases, the difference will be neglible in comparison.

Do you really need a benchmark for something this obvious?

For the record, the Rust version (possibly others too) suffers from the exact same issue and would massively benefit from a PHF Set/Map instead. With that change, and getting rid of all the entirely unnecessary allocations, I got a 2000x speedup (111.408µs vs 52ns) on a test address composed of 4 test segments ("test.test.test.io"), where the difference becomes even more pronounced the more segments you have.

@VasilNikolov

We can think of implementing an async version of the functions too, possible performance benefits since it won't block the thread.

There is literally nothing "blocking the thread" other than work that must happen anyway. Slapping "async" on code that is purely synchronous will in fact give you not a single benefit in any respect, because no "performance benefits" are "possible" in a single-threaded context that needs to perform those operations anyway.

If anything, async here will decrease performance and add complexity on top.

This is a common bad practice in ecosystems with async/await - please don't propagate it.

VasilNikolov commented 6 months ago

@alcore Yes, async is pointless here, dk what I was thinking.