Closed dabreegster closed 1 year ago
To reproduce, the following returns different output:
cargo run --release -- config/England/somerset.txt --rng-seed 0
cp data/output/England/2020/somerset.pb test1.pb
cargo run --release -- config/England/somerset.txt --rng-seed 0
cp data/output/England/2020/somerset.pb test2.pb
diff test1.pb test2.pb
It looks like there are two issues for the non-det in outputs:
commuting.rs
: the Businesses
struct uses a HashMap
that is looped over here with rng
calls in each loop. Converting to a BTreeMap
33cbaef resolves this non-det when the protobuf outputs are converted to JSON with sorted keys (initial test script using jq to sort the JSON keys).Awesome!
For 1, is there any noticeable perf difference? Commuting dominates the runtime. Even if there's a slight slowdown, I'd vote it's worth it for determinism.
It's quite unfortunate about 2... at worst we can switch to an ordered list of tuples, but that's a less nice API for consumers
For 1, is there any noticeable perf difference? Commuting dominates the runtime. Even if there's a slight slowdown, I'd vote it's worth it for determinism.
For Greater London, I get:
main
: creating population (311.88s) with create_commuting_flows (274.32s)52-fix-determinism
: creating population (308.74s) with create_commuting_flows (272.61s)So it seems unaffected but I'll run some others too!
London is a good stress test and those numbers are great, so all good on btreemap!
On Wed, Apr 26, 2023, 5:33 PM Sam Greenbury @.***> wrote:
For 1, is there any noticeable perf difference https://alan-turing-institute.github.io/uatk-spc/performance.html? Commuting dominates the runtime. Even if there's a slight slowdown, I'd vote it's worth it for determinism.
For Greater London, I get:
- main: creating population (311.88s) with create_commuting_flows (274.32s)
- 52-fix-determinism: creating population (308.74s) with create_commuting_flows (272.61s)
So it seems unaffected but I'll run some others too!
— Reply to this email directly, view it on GitHub https://github.com/alan-turing-institute/uatk-spc/issues/52#issuecomment-1523723144, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMWLF6AFZB6KTPLEUVH6U3XDFE6PANCNFSM6AAAAAAXK45UJQ . You are receiving this because you were mentioned.Message ID: @.***>
It's quite unfortunate about 2... at worst we can switch to an ordered list of tuples, but that's a less nice API for consumers
Good news! prost allows config of the build to write the protobuf with BTreeMap
over HashMap
(61491d50b066c2dd59790796b2f02cdb4ea14ea3) and this now produces fully deterministic protobuf output.
To complete this I can have a look at including an integration test to do this determinism check that we can optionally run with e.g. cargo test -- --include-ignored
.
@sgreenbury found that running the pipeline twice on the same input produced a different output file. That's a bug! We should look into it. Easiest way may be to convert the .pb to text and do a diff there. If I had to guess, maybe something in the commuting module is non-det.