domainaware / parsedmarc

A Python package and CLI for parsing aggregate and forensic DMARC reports
https://domainaware.github.io/parsedmarc/
Apache License 2.0

Slow base domain check due to repeated instantiation of publicsuffixlist object #473

Open abgoldberg opened 4 months ago

abgoldberg commented 4 months ago

I have found that performance can become quite slow when processing a large number of reports with DNS queries and reverse DNS base domain computation.

It turns out this is due to this line:

https://github.com/domainaware/parsedmarc/blob/7d2b431e5f20bdcdb330c4fbb23ce7df5fb0642f/parsedmarc/utils.py#L95C5-L95C46

Instantiating the psl object on every call to the function means parsing the whole PSL each time, which is quite slow. It would be better to pull the instantiation out of the function and reuse the same instance for every get_base_domain call.
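
A minimal sketch of that idea, not the actual parsedmarc code: the suffix list is built once on first use and then shared by every call. `SlowSuffixList` is a hypothetical stand-in with a toy rule set so the example runs without dependencies; in parsedmarc the shared object would presumably be the `publicsuffixlist.PublicSuffixList` instance created on the linked line. The helper name `_get_psl` is also an assumption.

```python
from functools import lru_cache


class SlowSuffixList:
    """Hypothetical stand-in for a PSL object that is expensive to build."""

    instances = 0  # counts how many times the list was "parsed"

    def __init__(self):
        SlowSuffixList.instances += 1
        # A real PSL object would parse thousands of suffix rules here.
        self._suffixes = {"com", "org", "co.uk"}

    def privatesuffix(self, domain):
        labels = domain.lower().split(".")
        # Scan from the left so the longest matching public suffix wins,
        # then keep one extra label in front of it.
        for i in range(len(labels)):
            if ".".join(labels[i:]) in self._suffixes:
                return ".".join(labels[max(i - 1, 0):])
        return domain.lower()


@lru_cache(maxsize=1)
def _get_psl():
    """Build the suffix list on first use and reuse it afterwards."""
    return SlowSuffixList()


def get_base_domain(domain):
    # No per-call instantiation: every call shares the cached instance.
    return _get_psl().privatesuffix(domain)
```

Building the object lazily rather than at import time keeps module import cheap, and `_get_psl.cache_clear()` would force a rebuild if the list ever needs refreshing.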

Kuzuto commented 2 months ago

I think the problem is more related to the DNS resolver on the PTR lookup than to the get_base_domain call. get_base_domain is done locally on a PSL table. The slow DNS PTR lookup is fixed in the new version, which introduces a DNS cache.
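
For illustration only, a PTR lookup cache could look roughly like the sketch below. This is an assumption about the general technique, not the cache parsedmarc ships: the names `_ptr_cache` and `reverse_lookup` are hypothetical, and a real cache would normally also expire entries.

```python
import socket

# Hypothetical in-process cache: IP address -> hostname, or None on failure
_ptr_cache = {}


def reverse_lookup(ip_address, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for ip_address, caching hits and misses."""
    if ip_address not in _ptr_cache:
        try:
            # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
            _ptr_cache[ip_address] = resolver(ip_address)[0]
        except OSError:
            # socket.herror and socket.gaierror are OSError subclasses
            _ptr_cache[ip_address] = None
    return _ptr_cache[ip_address]
```

Caching failed lookups as well matters here, because addresses without a PTR record would otherwise hit the resolver again for every report row that mentions them.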