BachiLi / diffvg

Differentiable Vector Graphics Rasterization
https://people.csail.mit.edu/tzumao/diffvg/
Apache License 2.0

non-deterministic behaviour of the renderer in "painterly_rendering.py" #30

Open yael-vinker opened 2 years ago

yael-vinker commented 2 years ago

Hi, thanks a lot for providing this public implementation.

I am trying to achieve deterministic training for reproducibility. In lines 40-41 of "painterly_rendering.py", you set:

random.seed(1234)
torch.manual_seed(1234)
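
For context, PyTorch-side determinism usually involves a few more knobs than these two calls. The following is a sketch of a typical setup (my addition for illustration, not code from the script):

import random

import numpy as np
import torch

def seed_everything(seed: int = 1234) -> None:
    # Seed every RNG a typical PyTorch training loop touches.
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG
    torch.manual_seed(seed)           # PyTorch CPU (and CUDA) RNGs
    torch.cuda.manual_seed_all(seed)  # all CUDA devices explicitly
    # Prefer deterministic kernels where PyTorch offers them.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False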

However, when I run exactly the same command twice, on the same machine and in the same environment, I get different loss values within the first few iterations. The command used:

python painterly_rendering.py imgs/baboon.png --num_iter 3

Output 1:

Scene construction, time: 0.01136 s
Forward pass, time: 0.03906 s
iteration: 0
Scene construction, time: 0.00223 s
Forward pass, time: 0.00845 s
render loss: 0.2781107723712921
Backward pass, time: 0.05038 s
iteration: 1
Scene construction, time: 0.00215 s
Forward pass, time: 0.00980 s
render loss: 0.272503137588501
Backward pass, time: 0.05281 s
iteration: 2
Scene construction, time: 0.00186 s
Forward pass, time: 0.00857 s
render loss: 0.266690731048584
Backward pass, time: 0.07965 s
Scene construction, time: 0.00172 s
Forward pass, time: 0.00384 s

Output 2:

Scene construction, time: 0.02374 s
Forward pass, time: 0.01743 s
iteration: 0
Scene construction, time: 0.00133 s
Forward pass, time: 0.01063 s
render loss: 0.2781107723712921
Backward pass, time: 0.04362 s
iteration: 1
Scene construction, time: 0.00159 s
Forward pass, time: 0.00777 s
render loss: 0.2725030183792114
Backward pass, time: 0.07198 s
iteration: 2
Scene construction, time: 0.00176 s
Forward pass, time: 0.00802 s
render loss: 0.2666889429092407
Backward pass, time: 0.07049 s
Scene construction, time: 0.00244 s
Forward pass, time: 0.00385 s

You can see that in the second iteration the losses already differ: 0.272503137588501 vs. 0.2725030183792114. Is there a way to fix this in order to get consistent results during training?

Thanks

TheDevilWillBeBee commented 2 years ago

The problem arises from a data race between the renderer's threads. Because of limited precision, floating-point addition is not associative, i.e. (a + b) + c != a + (b + c), so the result depends on the order in which the threads' partial sums are combined. Changing the number of threads in the code to one is a temporary fix for this problem.
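
A quick Python illustration of why summation order matters (my example, not diffvg code):

import math
import random

# Floating-point addition is not associative:
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6

# With many terms, any change in reduction order (for example, a
# different thread interleaving) can change the rounded result:
xs = [random.random() for _ in range(100_000)]
s1 = sum(xs)
random.shuffle(xs)  # stand-in for a different summation order
s2 = sum(xs)
print(s1 == s2)     # frequently False

# math.fsum computes a correctly rounded sum, so it is order-independent:
print(math.fsum(xs) == math.fsum(sorted(xs)))  # True

This is why pinning the renderer to a single thread restores run-to-run reproducibility: with one thread the partial sums are always combined in the same order, at the cost of rendering speed.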