abyrd opened 3 years ago
Note that some of the slowness in these tests is due to building histograms at every destination, even though the test only looks at one of them. Removing that behavior would probably speed them up significantly, making it more reasonable to spend the saved time on additional MC draws.
Recent tests, including with the frequency-heavy network of Sao Paulo, prompted me to think about this again. Letting users toggle on deterministic seeding would be straightforward to implement (e.g. using a similar approach to the one in the multi-criteria router, at https://github.com/conveyal/r5/blob/v6.9/src/main/java/com/conveyal/r5/profile/McRaptorSuboptimalPathProfileRouter.java#L120-L123) and would help users resolve a common headache when they are doing scenario comparisons. We could still recommend networks with frequency-based routes be analyzed with fully randomized schedules first, to get a sense of the noise/uncertainty.
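For reference, a minimal sketch of what the toggle could look like (the flag and class names here are hypothetical, not actual R5 parameters; the idea is simply a fixed seed behind a user-controlled switch, analogous to the linked McRaptor code):

```java
import java.util.Random;

public class SeedToggleSketch {
    // Hypothetical constant; any fixed value works as long as both
    // scenarios in a comparison use the same one.
    static final long FIXED_SEED = 42L;

    // When deterministic seeding is toggled on, every run (e.g. both sides
    // of a scenario comparison) sees the identical sequence of draws.
    static Random makeRandom(boolean deterministicSeeding) {
        return deterministicSeeding ? new Random(FIXED_SEED) : new Random();
    }

    public static void main(String[] args) {
        Random a = makeRandom(true);
        Random b = makeRandom(true);
        // Two deterministically seeded generators yield identical sequences.
        boolean identical = a.nextInt() == b.nextInt() && a.nextInt() == b.nextInt();
        System.out.println("deterministic sequences identical: " + identical);
    }
}
```

This keeps fully randomized schedules as the default, so the recommendation to first analyze frequency-heavy networks with random schedules is unaffected.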
We discussed this again recently. Results are expected to converge on stable values with an adequate number of MC draws. In the context of these stochastic methods, there does not seem to be a legitimate use for fixed seeds. Any use would amount to an illusion of precision and could lead to inadvertent cherry-picking of results.
If expectations for stable results are not met, there are two main situations to address:
For tests, the solution is probably to increase the number of MC draws until they pass reliably. Test results would remain nondeterministic by nature, but the probability of failure can be lowered until it essentially never happens.
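To put a rough number on "until they pass reliably": by the central limit theorem, the error of a Monte Carlo mean shrinks with the square root of the number of draws, so the draw count needed for a given tolerance can be estimated up front. A sketch (the sigma and tolerance values below are made-up illustrations, not measured from our tests):

```java
public class DrawBudget {
    // Rough CLT-based estimate: number of MC draws needed for the sample
    // mean's error to stay within `tol`, with confidence given by z-score z.
    // sigma is the standard deviation of a single draw's result.
    static long drawsNeeded(double sigma, double tol, double z) {
        return (long) Math.ceil(Math.pow(z * sigma / tol, 2));
    }

    public static void main(String[] args) {
        // Illustrative only: sigma = 300 s of travel-time spread per draw,
        // tolerance = 10 s, z = 3 (roughly 99.7% of runs within tolerance).
        System.out.println(drawsNeeded(300, 10, 3));
    }
}
```

With these illustrative inputs the estimate is 8100 draws; quadrupling the draw count halves the tolerance, which is why failures can be made arbitrarily rare without ever being impossible.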
For regular use we should provide guidance on exact-times, phasing, and number of MC draws, and explain clearly in documentation how and why these stabilize results.
If we want to allow increasing the number of MC draws, we should also test and adjust the socket timeout settings (referenced at https://github.com/conveyal/r5/blob/v6.9/src/main/java/com/conveyal/analysis/controllers/BrokerController.java).
Some of the Simpson Desert tests occasionally fail in GitHub Actions test runs. This is probably because they test the closeness of our Monte Carlo results to theoretical results, and there's always some probability that the MC results will be way off. For reproducible testing we could seed all our random number generators, but arguably that reduces thoroughness, and in any case small changes to routing could still change the order in which numbers are produced and cause the tests to fail again. Maybe we should just use really high numbers of MC draws on these tests.
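The order-sensitivity point above is easy to demonstrate: even with an identical fixed seed, a routing change that consumes one extra draw shifts every subsequent value, so seeded tests would break anyway. A self-contained illustration:

```java
import java.util.Random;

public class OrderSensitivity {
    public static void main(String[] args) {
        // Two generators with the same fixed seed.
        Random a = new Random(123L);
        Random b = new Random(123L);

        // Simulate a small routing change that consumes one extra draw
        // somewhere earlier in the computation.
        a.nextInt();

        // From here on the two sequences are out of step: the same seed
        // no longer yields the same results once the call order changes.
        boolean diverged = a.nextInt() != b.nextInt();
        System.out.println("sequences diverged after an extra draw: " + diverged);
    }
}
```

This is why seeding alone can't make these tests durably reproducible, and why raising the draw count is the more robust option.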