Summary
Address throughput issues caused by overloading the AWS DNS server with queries. The team has already investigated and identified that sending the DNS queries over UDP would help.
Motivation and context
Throughput issues occur during BOD 18-01 scans because of AWS limits on DNS queries. AWS recommends having all outgoing traffic go through one interface. However, some of our scan types currently put a heavy load on the DNS server.
There have also been reports of SPF record issues that are likely related to DNS queries as well. The team stated that these are most likely caused by the trustymail lambdas sending too many packets to the link-local address range; in short, they are making too many DNS requests per second.
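One way to keep a scanner under a requests-per-second ceiling is a token bucket placed in front of each DNS lookup. The sketch below is illustrative only: the class name, the rate and capacity values, and the injectable clock are assumptions and not part of trustymail or the existing scan code. It also only limits a single process, so each lambda execution environment would enforce its own budget.

```python
import time


class TokenBucket:
    """Allow at most `rate` operations per second, with bursts up to `capacity`.

    Hypothetical helper: the numbers passed in are placeholders, not AWS limits.
    """

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self):
        """Take one token if available; return False when the caller should wait."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def acquire(self, sleep=time.sleep):
        """Block until a token is available, then take it."""
        while not self.try_acquire():
            # Sleep roughly long enough for the missing fraction of a token to refill.
            sleep((1.0 - self.tokens) / self.rate)
```

A caller would invoke `bucket.acquire()` immediately before each DNS request, so that bursts of lookups are spread out instead of all reaching the resolver at once.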
We asked AWS about this before and they suggested using UDP. We could redesign the BOD 18-01 VPC and distribute DNS load if that proves necessary.
Implementation notes
An investigation has been carried out, but increasing parallelism gave poor results. This may be because the scans were still hitting the AWS DNS server, which may not have been able to resolve domains for them. Next steps are to change the DNS queries to UDP and monitor for improved throughput.
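To make "DNS queries over UDP" concrete, here is a minimal sketch that builds a standard DNS question by hand and sends it in a single UDP datagram using only the Python standard library. It is not how the scanners currently issue queries; the function names, the default TXT query type (the record type that carries SPF), and the timeout are assumptions chosen for illustration. A real change would more likely be a configuration switch in whatever resolver library the scanners already use.

```python
import secrets
import socket
import struct


def build_query(name, qtype=16, txid=0):
    """Build a DNS query packet (qtype 16 = TXT, class IN) for `name`."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, terminated by a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part
        for part in name.rstrip(".").encode("ascii").split(b".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)
    return header + question


def is_truncated(response):
    """Return True if the TC bit is set, meaning the answer did not fit in UDP."""
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & 0x0200)


def query_udp(name, server, qtype=16, timeout=2.0):
    """Send one query over UDP and return the raw response bytes."""
    txid = secrets.randbelow(65536)  # random 16-bit transaction ID
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype, txid), (server, 53))
        response, _ = sock.recvfrom(4096)
    return response
```

Large TXT answers can exceed what fits in a UDP response, in which case the server sets the truncation flag and the client normally retries over TCP. Counting how often `is_truncated` returns True would be one simple signal to watch while monitoring whether the switch to UDP improves throughput.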