Closed weidezhang closed 9 years ago
Hi, @weidezhang. The code samples 1/i of the data and creates a dataset where each node holds the same 1/i of the data. This guarantees that the data is distributed evenly across all i nodes, which is important for measuring the throughput.
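A rough sketch of the idea in plain Python (not the project's actual Spark code; `replicate_sample`, `n_nodes`, and the toy dataset are made up for illustration): sample 1/i of the data once, then give every node the same copy so each holds an identical share.

```python
import random

def replicate_sample(data, n_nodes, seed=0):
    """Sample 1/n_nodes of the data once, then replicate that
    same sample to every node so the load is identical across nodes."""
    rng = random.Random(seed)
    k = len(data) // n_nodes
    sample = rng.sample(data, k)
    # Every node receives the same sample -> evenly distributed data.
    return {node: list(sample) for node in range(n_nodes)}

nodes = replicate_sample(list(range(100)), n_nodes=4)
# Each of the 4 nodes holds the same 25 items.
assert all(nodes[i] == nodes[0] for i in nodes)
assert len(nodes[0]) == 25
```

In Spark itself the analogous effect would come from sampling once on the driver and distributing the same subset, rather than sampling independently per node.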
Thanks, Alexander. It makes sense.
Hi,
I'm reading the Spark version of your ann-benchmark. When you do the following, shouldn't the sampling be done for every node? It seems you only did it once, so every node shares the same sample data.