NLnetLabs / domain

A DNS library for Rust.
https://nlnetlabs.nl/projects/domain/about/
BSD 3-Clause "New" or "Revised" License
341 stars 57 forks source link

Replace Clock with tokio::time. #278

Closed ximon18 closed 6 months ago

ximon18 commented 6 months ago

This PR is a proof-of-concept showing use of #[tokio::test(start_paused = true)] and tokio::time::advance(Duration) instead of using a custom Clock type for this purpose..

All tests pass locally. Removing the call to advance() or not using start_paused = true in a test that uses the TIME_PASSES ELAPSE "Deckard" feature breaks the tests as expected.

Note: Not all tests strictly require start_paused = true, only those that execute a "Deckard" .rpl script that contains TIME_PASSES ELAPSE, but it doesn't break the others and so for ease of making this PR all "Deckard" tests are run with start_paused = true.

ximon18 commented 6 months ago

From the point of view of a validator, how do I get access to fake unix time (SystemTime::now), if we use the tokio mechanism?

This PR doesn't attempt to address the problem of manipulating SystemTime, it only addresses the needs of PR #275.

PR #274 will also need to manipulate SystemTime (because DNS cookies "Stelline" test cases pass in an absolute seconds since the epoch value which is only correct when compared to a mocked system clock). To address the cases that need to manipulate SystemTime I proposed to use a combination of:

Philip-NLnetLabs commented 6 months ago
* Still using the Tokio advance() approach for the simpler cases that don't need `SystemTime` manipulation.

I think we should not do that. This will create a lot of confusion for any test that combines a cache with the validator (or other code that needs absolute time). We should not have two separate mechanisms for faking time.

I think it is also worrying if tests have to run on a single thread.

So at the moment I think the best way forward is to move Stelline in the library. Then we can get rid of time argument in the public interface to the cache code and we can extend my current fake time code with fake unix time. We will have something that supports multi threaded testing and have integrated Instance with SystemTime.

ximon18 commented 6 months ago
* Still using the Tokio advance() approach for the simpler cases that don't need `SystemTime` manipulation.

I think we should not do that. This will create a lot of confusion for any test that combines a cache with the validator (or other code that needs absolute time). We should not have two separate mechanisms for faking time.

I think it is also worrying if tests have to run on a single thread.

So at the moment I think the best way forward is to move Stelline in the library. Then we can get rid of time argument in the public interface to the cache code and we can extend my current fake time code with fake unix time. We will have something that supports multi threaded testing and have integrated Instance with SystemTime.

I concur that multi-threaded testing is worth considering. The Tokio advance() approach only works for "current thread" Tokio runtimes (which until now are all that our tests have used). Note that this is NOT the same as a single threaded runtime, a "current thread" runtime may still use multiple Tokio threads, e.g. to handle I/O events in a separate thread, but the application tasks are handled only on the "current thread".

In my experience most if not all normal testing can be done on a "current thread" runtime. Using multi-threaded runtimes were only needed for certain application code designs and for tests such as load testing, and mocking time was only needed in the non-load testing type scenarios.

So, as long as we only require current thread runtimes for tests we don't need more than Tokio advance.

What I want to avoid is the need to have to pass a mock SystemTime via a public API, as you say moving the tests into the library and using #[cfg(test)] should avoid the need for that. But I also want to avoid passing a mock SystemTime down a deep call tree, even via a private interface. We need something that will work for the Serial case. If Serial is the only problem at present then changing Serial might be an option, however if Serial is not/will not be an isolated case then we will still have the problem.

In production code it can be desirable to keep dependencies to a minimum, but in test code I'd rather avoid reinventing wheels and hence the mock_instant crate reference, I'd rather use something already out there than invent our own if we can avoid it, and mock_instant also already caters for the multi-threaded case.

Philip-NLnetLabs commented 6 months ago

In production code it can be desirable to keep dependencies to a minimum, but in test code I'd rather avoid reinventing wheels and hence the mock_instant crate reference, I'd rather use something already out there than invent our own if we can avoid it, and mock_instant also already caters for the multi-threaded case.

If mock_instant does everything we need, then let's just switch to mock_instant. If it doesn't, then it is better to roll our own then to try to use mock_instant for some parts and other things in other parts.

How does mock_instant guarantee that all test code runs on a single thread? Or is that an accident waiting to happen?

ximon18 commented 6 months ago

In production code it can be desirable to keep dependencies to a minimum, but in test code I'd rather avoid reinventing wheels and hence the mock_instant crate reference, I'd rather use something already out there than invent our own if we can avoid it, and mock_instant also already caters for the multi-threaded case.

If mock_instant does everything we need, then let's just switch to mock_instant. If it doesn't, then it is better to roll our own then to try to use mock_instant for some parts and other things in other parts.

How does mock_instant guarantee that all test code runs on a single thread? Or is that an accident waiting to happen?

That's a good point, perhaps it doesn't. I would need to check. To be clear: I haven't used mock_instant before, it was just the first match I found on crates.io when seeing what prior art exists. It might not be the right prior art to use, perhaps there is not existing good match and then rolling our own is fine, though I would find it hard to believe that everyone else that had the same needs as we do rolled their own... but it could be the case.

The nice thing about Tokio advance() is that it works within a single Tokio Runtime, it is independent of which thread the code is running on.

Philip-NLnetLabs commented 6 months ago

The nice thing about Tokio advance() is that it works within a single Tokio Runtime, it is independent of which thread the code is running on.

It would have been nice if Tokio advance() would have been a complete solution. I don't think we are in a big hurry. We can move Stelline into the library and try mock_instant to see how it works.

Philip-NLnetLabs commented 6 months ago

This PR targets your Client Cache PR #275. It's not clear to me from your last comment how you think we should proceed? My preference would be to not include the Clock type in the public interface and thus not merge your PR #275 as-is - this PR was intended to provide a way forward to resolve that.

I appreciate that this PR does not address the SystemTime use case but it was never intended to as that is not needed by yoru Client Cache PR #275.

I still propose that we merge this PR into yours as, limited or not, the Tokio advance() mechanism provides everything currently needed by your PR without needing the adding the Clock type to the public interface, and then we merge your PR to main and investigate further either as a PR to main or as part of my "Add the net::server module" PR #274.

All of the cache code is behind unstable-client-transport, so I don't see why we can't release with some additional methods. Also, removing the clock code before we know if mock_instant works for us seems premature.

So I suggest we continue investigating if mock_instant works for us. Then decide on what we do.

In the mean time we can move Stelline into the library.

ximon18 commented 6 months ago

Closing this PR as we need to develop our ideas further on how best to tacke the technical issues pertaining to (mock) time based testing and this particular approack has insufficient support at this point.