Random numbers change from run to run with the same seed!
@adrian-carro Looks like the random number generator instance `rand` is declared as a public variable, which means it is shared with all of the code. Probably makes the most sense to create a single instance of the random number generator and then pass that instance into the constructor of any object that needs a source of randomness.
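A minimal sketch of that pattern (the `Household` class and method names here are hypothetical, not the model's actual ones):

```java
import java.util.Random;

// Hypothetical agent class that receives its source of randomness
// through the constructor instead of reaching for a shared public field.
class Household {
    private final Random rand;

    Household(Random rand) {
        this.rand = rand;
    }

    double drawIncomeShock() {
        return rand.nextGaussian();
    }
}

class DemoModel {
    public static void main(String[] args) {
        // One seeded generator, created once and passed around explicitly,
        // so every run with the same seed sees the same stream of draws.
        Random rand = new Random(42L);
        Household household = new Household(rand);
        System.out.println(household.drawIncomeShock());
    }
}
```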
Also watch out for any operations where you are iterating over `HashSet` or `HashMap` data structures, as the order in which the iteration is done will not be consistent between model runs.
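A toy illustration of why (not code from the model): a class that does not override `hashCode()` inherits the identity hash from `Object`, which changes from one JVM run to the next, so `HashSet` iteration order can differ between runs even when the contents are identical. A `LinkedHashSet` sidesteps this by iterating in insertion order.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical agent with no hashCode() override: it inherits the
// identity hash, which varies across JVM runs.
class Agent {
    final int id;
    Agent(int id) { this.id = id; }
}

class IterationOrderDemo {
    public static void main(String[] args) {
        Set<Agent> hashed = new HashSet<>();        // order depends on identity hashes
        Set<Agent> ordered = new LinkedHashSet<>(); // order is always insertion order
        for (int i = 0; i < 5; i++) {
            Agent a = new Agent(i);
            hashed.add(a);
            ordered.add(a);
        }
        // The first line may change between runs; the second never will.
        hashed.forEach(a -> System.out.print(a.id + " "));
        System.out.println();
        ordered.forEach(a -> System.out.print(a.id + " "));
    }
}
```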
Is there a test case? What is the best way to generate an output file that we can quickly check is the same after two model runs?
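One simple option (a sketch; the file path is hypothetical) is to hash the output file after each run and compare the digests:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;

class OutputChecksum {
    public static void main(String[] args) throws Exception {
        // Substitute the actual results file produced by the run
        byte[] bytes = Files.readAllBytes(Paths.get("Results/Output-run1.csv"));
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(bytes);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex); // identical digests => identical output files
    }
}
```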
With PR #52, any problem with random number generation would only affect branch ML-calibration. I'm thus reducing priority to low, for now, though I'll check this issue towards the end of this week.
@davidrpugh I am re-setting high priority for this issue, as I've just had time to check the random numbers in the main branch and I again get inconsistencies between different runs. For example, if one prints a `prng.nextInt()` after 5 time steps, different runs will lead to a different number being printed.
Just so that I can reproduce this issue myself: are you saying that if you insert a `System.out.print(prng.nextInt())` in the `main` method of the `Model.java` class, then after t=5 time steps, different runs will produce different outputs? Which output file should I look at to see the differences?
So, if you place a `System.out.print(prng.nextInt())` within the `main` method in the `Model` class and take note, for example, of the 5th number printed, you'll see different numbers when you compare it between different runs. That proves something is going wrong with the random number generation. Apart from this, of course, you can compare the output files, which are also different after the first several time steps and in several columns.
Ok. The only way that can happen is if the generator is getting called a different number of times depending on the model run. I have a few ideas about why this is happening, but they will be difficult to diagnose given the current state of the code.
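A toy illustration of this failure mode: two identically seeded generators produce the same stream until one path consumes an extra draw (say, one extra transaction), after which every subsequent value differs.

```java
import java.util.Random;

class StreamDivergenceDemo {
    public static void main(String[] args) {
        Random runA = new Random(1L);
        Random runB = new Random(1L);

        // Run A consumes two draws; run B consumes three,
        // e.g. because one extra transaction happened.
        runA.nextInt();
        runA.nextInt();
        runB.nextInt();
        runB.nextInt();
        runB.nextInt();

        // Same seed, but the streams are now permanently out of step.
        System.out.println(runA.nextInt() == runB.nextInt()); // false (almost surely)
    }
}
```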
Can you please point me to the single most important example of a model output file that is different across model runs using the same seed?
Ok, let's take the `Output-run1.csv` file, row 6, corresponding to Model time = 4 (first column). For two different runs I find different values in the second column, `nNonBTLHomeless`. This is just an easy-to-find example, representative of what is happening. I am guessing any example is fine, as we just need to detect where the problem is stemming from.
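A quick way to locate the first divergence between two runs (a sketch; the paths are hypothetical):

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

class FirstDiff {
    public static void main(String[] args) throws Exception {
        // Hypothetical paths to the same output file from two separate runs
        List<String> run1 = Files.readAllLines(Paths.get("run1/Output-run1.csv"));
        List<String> run2 = Files.readAllLines(Paths.get("run2/Output-run1.csv"));
        int n = Math.min(run1.size(), run2.size());
        for (int i = 0; i < n; i++) {
            if (!run1.get(i).equals(run2.get(i))) {
                System.out.println("First difference at row " + (i + 1));
                System.out.println("run1: " + run1.get(i));
                System.out.println("run2: " + run2.get(i));
                return;
            }
        }
        System.out.println("Files identical over the first " + n + " rows");
    }
}
```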
Is the insertion of the print statement necessary to trigger the bug? I spot-checked the Output-1.csv file when I made my previous changes and the output appeared identical.
Is there any multi-threading in the model?
There is no multi-threading, and no, the print statement is not needed; it was just to get information faster, without any need to compare output files. Could you confirm whether the Output-1.csv files are identical for you? That would be fairly surprising... why would there be system dependencies?
I will try to replicate the issue when I get home. I am running on Windows using some version of oraclejdk 8.
Great, thanks for doing this! I'm on Ubuntu 16.04, and running openjdk version "1.8.0_162".
@adrian-carro I just ran the `main` method of the `Model.java` class for roughly 500 time steps. This generates a directory of output in the `Results` folder. I then re-ran the `main` method (without changing anything) to generate a second directory of output in the `Results` folder. I then compared the resulting `Output-run1.csv` files. They appear to be identical. For example, here is the record for t=103 from both files:
First simulation I ran generated...

```
103, 283, 3, 286, 331, 617, 646, 31, 677, 255, 289, 2, 0, 1549, 1270, 12, 0, 7, 0.26614173228346455, 1.0244697019962505, -0.002299730036721126, 320082.397005936, 250030.7937810174, 252088.47594443636, 0.0, 21.326370650946313, 95, 46, 19, 19, 0, 19, 0.15789473684210525, 0.631578947368421, 1.01157359475851, -0.024098208995406822, 294.161880977776, 876.8550416063504, 624.6606537036129, 5.217391304347826, 270, 30, 23, 0.04411977882667759, 933, 185688.6040452648
```

Second simulation I ran generated...

```
103, 283, 3, 286, 331, 617, 646, 31, 677, 255, 289, 2, 0, 1549, 1270, 12, 0, 7, 0.26614173228346455, 1.0244697019962505, -0.002299730036721126, 320082.397005936, 250030.7937810174, 252088.47594443636, 0.0, 21.326370650946313, 95, 46, 19, 19, 0, 19, 0.15789473684210525, 0.631578947368421, 1.01157359475851, -0.024098208995406822, 294.161880977776, 876.8550416063504, 624.6606537036129, 5.217391304347826, 270, 30, 23, 0.04411977882667759, 933, 185688.6040452648
```
Is this the right experimental setup to replicate your results? Perhaps you are running the simulations from the command line rather than from within an IDE?
@davidrpugh I am using the IDE (IntelliJ IDEA), but I've also tried the command line, with the same results: random numbers are not consistent between runs. I'm fairly surprised by the fact that you do not obtain that inconsistency! I know Donovan (one of our PhD students) is having the same problem, as he flagged it yesterday when he started working with the model, and he uses a Mac. Do you have any clue of what might be happening here?
I am also using IntelliJ. I am running the model on my work computer, which runs Windows.
I want to be exactly clear about what I am doing just to be certain that we are doing the same thing.
Is this what you are doing? Please try to be very specific in your description.
Ouch, ok, I can confirm the problem does not appear when I follow your instructions. I have only seen the problem when:

1. running the code from the command line with `mvn exec:java -Dexec.mainClass="housing.Model"`, or
2. running the Maven goals `clean validate compile exec:java`.

For the rest (your points 2 and 3), I proceed exactly as you do.

@adrian-carro This is useful information! It suggests that the issue arises at compilation time, rather than at runtime. Running multiple simulations using the steps that I provided compiles the model once; doing it your way forces compilation of the code twice.
Do you know if the configuration files are being picked up when the model is compiled or after the model is compiled?
@davidrpugh The config file should be picked up only at execution time, if I am not wrong. However, as I understand it, my option number 1, running the code from the command line with `mvn exec:java -Dexec.mainClass="housing.Model"`, is not compiling the code, but just executing it. At least I had assumed so; please correct me if I am wrong! ;-)
@davidrpugh I finally managed to find the origin of this problem: there was a small bug in the PriorityQueue2D which was leading to some transactions happening or not happening, as a consequence of which the number of random draws used would change, thus leading to the observed inconsistencies. I now obtain consistent random numbers between runs. Also, I have checked that I now obtain the same stream of random numbers whether directly running the class or using the Maven execution goal. I still don't understand why these two ways of running the code were leading to different results before, but I guess they might simply have different ways of handling errors!
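For context, a toy example (not the actual PriorityQueue2D code) of how a small ordering bug can make transactions silently appear or disappear: a comparator that casts the difference of two doubles to `int` truncates fractional differences to zero, so distinct offers compare as duplicates and one is dropped, changing how many random draws are consumed downstream.

```java
import java.util.Comparator;
import java.util.TreeSet;

class BadComparatorDemo {
    public static void main(String[] args) {
        // Buggy comparator: (int)(100.2 - 100.7) == 0, so the two
        // prices compare as equal even though they are distinct.
        Comparator<Double> buggy = (a, b) -> (int) (a - b);

        TreeSet<Double> offers = new TreeSet<>(buggy);
        offers.add(100.2);
        offers.add(100.7); // silently rejected as a "duplicate" of 100.2
        offers.add(101.9);

        // One offer is missing, so one fewer transaction can happen,
        // and one fewer random draw is consumed in later steps.
        System.out.println(offers.size()); // prints 2, not 3
    }
}
```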
@adrian-carro Great! Glad that you found the issue. Sounds super tedious. I don't actually think that the 2D priority queue is necessary at all to implement the matching algorithm. We might want to think about removing it and replacing it with something simpler.