NLnetLabs / domain

A DNS library for Rust.
https://nlnetlabs.nl/projects/domain/about/
BSD 3-Clause "New" or "Revised" License
332 stars 56 forks source link

Improved handling of errors while sending TCP responses. #309

Open ximon18 opened 2 months ago

ximon18 commented 2 months ago

This PR contains a couple of fixes for TCP connection error handling:

It also adds a metric for the number of aborted writes that occur.

These issues were discovered when dig disconnected because a large AXFR transfer was slow to start and the default dig 5 second timeout expired.

Interestingly these issues were easier to fix in the WIP xfr branch that I found them on as that is based on the service-layering branch which simplifies the response handling.

partim commented 2 months ago

I’m not sure if considering non-fatal errors with TCP is necessary. Both write_all and read_exact will already cover ErrorKind::Interrupted. ErrorKind::TimedOut should probably treated as fatal – it means the socket has tried re-sending the data a couple times and then has waited.

Retrying when the configured response_write_timeout feels a bit odd, given that this means the actual timeout is response_write_timeout * (response_write_retries + 1), which is a bit unexpected.

Looking up ErrorKind::TimedOut, I now even wonder if having your own write timeout makes sense or whether it is better to just rely on the TCP stack and its configuration.