Closed. b-remy closed this pull request 3 years ago.
This looks good :-) Here is my question: what happens if you have something like
```python
batch_dim = mtf.Dimension("batch", 2)
nx_dim = mtf.Dimension('nx', 8)
a = mtf.random_uniform(shape=[nx_dim])
mesh_shape = [("row", 2)]
layout_rules = [('batch', 'row')]
```
The nx_dim is not distributed, so we expect each process to end up with the same tensor.
Yeah, it does not output the same tensor on both processes:
```
Final result [0.10086262 0.9701668 0.8487642 0.04828131 0.04852307 0.77747464
 0.844468 0.41707492]
Final result [0.2390374 0.92039955 0.05051243 0.49574447 0.8355223 0.02647042
 0.08811307 0.4566604 ]
```
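To make the behaviour above concrete, here is a minimal plain-TensorFlow analogy (hypothetical, not the mesh_tensorflow code path) of what each process effectively does for a dimension that is not split over the mesh: each process draws the full slice itself, so without a coordinated seed the replicated copies disagree.

```python
import tensorflow as tf

def slice_on_process(shared_seed=None):
    if shared_seed is None:
        # uncoordinated draw: every process samples its own values
        return tf.random.uniform([8])
    # coordinated draw: a stateless RNG with the same seed gives every
    # process an identical copy of the replicated slice
    return tf.random.stateless_uniform([8], seed=[0, shared_seed])

# "process 0" and "process 1" without seed coordination -> different copies
print(slice_on_process().numpy())
print(slice_on_process().numpy())

# with a shared seed -> identical copies on both processes
print(slice_on_process(shared_seed=1234).numpy())
print(slice_on_process(shared_seed=1234).numpy())
```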
The question I ask myself is: how can we get the same seed along the axes that are not distributed, and different ones for the distributed axes?
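One possible way to get there (a sketch only, not mesh_tensorflow API; the function and argument names are made up for illustration) is to derive the per-slice seed from the base seed plus the mesh coordinates of only those axes the tensor is actually split over, so replicated slices share a seed while split slices get distinct ones:

```python
def per_slice_seed(base_seed, mesh_coordinates, split_axes):
    """Fold the mesh coordinate into the seed only for axes the tensor is split over.

    mesh_coordinates: dict mapping mesh axis name -> this process's coordinate.
    split_axes: mesh axes over which the tensor being generated is split.
    """
    seed = base_seed
    for axis in sorted(split_axes):
        seed = seed * 1000003 + mesh_coordinates[axis]
    return seed

# tensor split over "row": the two processes get different seeds
print(per_slice_seed(1234, {"row": 0}, ["row"]))   # 1234 * 1000003 + 0
print(per_slice_seed(1234, {"row": 1}, ["row"]))   # 1234 * 1000003 + 1
# tensor not split over any mesh axis: both processes get the same seed
print(per_slice_seed(1234, {"row": 0}, []))        # 1234
print(per_slice_seed(1234, {"row": 1}, []))        # 1234
```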
In order to solve issue #3, I adapted the random function in mesh_tensorflow/hvd_simd_mesh_impl.py. Since GPUs have seeds enabled (while TPUs do not, see the comment in mesh_tensorflow/simd_mesh_impl.py), I ensured that when a seed is specified, it is split along the mesh slices so that we do not get the same tensor everywhere.

Example: mtf.random_uniform returns:

```
Final result [[23.90374]]
Final result [[10.086262]]
```
Seeds are necessary to make sure that slices that should have the same values actually do have the same values.
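For reference, here is a rough standalone sketch of the seed-splitting idea described above (this is not the actual code added to mesh_tensorflow/hvd_simd_mesh_impl.py; the function and its arguments are made up for illustration): the caller's seed is offset by the process index when the tensor is split over the mesh, and left untouched when it is replicated, which also preserves the property just mentioned.

```python
import tensorflow as tf

def random_slice(slice_shape, base_seed, process_index, is_split):
    """Draw this process's slice of a random tensor.

    If the tensor is split over the mesh, offset the caller's seed by the
    process index so each process gets different values; if it is replicated,
    reuse the base seed so every copy is identical.
    """
    seed = [base_seed, process_index if is_split else 0]
    return tf.random.stateless_uniform(slice_shape, seed=seed)

# split tensor: the two processes produce different slices
print(random_slice([1, 1], base_seed=1234, process_index=0, is_split=True))
print(random_slice([1, 1], base_seed=1234, process_index=1, is_split=True))
# replicated tensor: both processes produce the same slice
print(random_slice([8], base_seed=1234, process_index=0, is_split=False))
print(random_slice([8], base_seed=1234, process_index=1, is_split=False))
```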