glatard opened 5 years ago
Consider the call `task_dosomething(file1, file2)`.
The task that produced `file1` was executed on Node 1, and the task that produced `file2` was executed on Node 2.
Since the inputs of `task_dosomething` reside on both nodes, we just pick one at random, say Node 1.
At runtime, it turns out that `file1` is 10 KB and `file2` is 10 GB, so we must transfer 10 GB over the network. With more information (the file sizes), we would have known that scheduling `task_dosomething` on Node 2 was better, since only the 10 KB file would then need to move.
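A minimal sketch of the size-aware placement described above, assuming the scheduler knows each input's location and size at the moment it places the task. The names `pick_node` and `bytes_to_transfer` and the `(node, size)` representation are hypothetical illustrations, not part of any existing scheduler's API.

```python
from typing import List, Tuple

# Hypothetical input representation: (node where the file lives, size in bytes).
Input = Tuple[str, int]


def bytes_to_transfer(node: str, inputs: List[Input]) -> int:
    """Total bytes that must cross the network if the task runs on `node`."""
    return sum(size for location, size in inputs if location != node)


def pick_node(inputs: List[Input]) -> str:
    """Size-aware locality: choose the node that minimizes network transfer.

    Candidates are the nodes already holding at least one input, in order of
    first appearance. min() returns the first minimal candidate, so ties
    (e.g. equal-sized inputs, one per node) are broken arbitrarily.
    """
    candidates = list(dict.fromkeys(location for location, _ in inputs))
    return min(candidates, key=lambda node: bytes_to_transfer(node, inputs))


KB = 10**3
GB = 10**9

# The scenario from the issue: file1 (10 KB) on Node 1, file2 (10 GB) on Node 2.
inputs = [("node1", 10 * KB), ("node2", 10 * GB)]
print(pick_node(inputs))                   # node2
print(bytes_to_transfer("node1", inputs))  # 10000000000
print(bytes_to_transfer("node2", inputs))  # 10000
```

When the inputs are of equal size and spread one per node, every candidate ties, so the choice degenerates to an arbitrary pick. In that case, choosing at random costs nothing extra.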
I guess that if you assume the input files will always be of roughly equal size, then the poor person's implementation is fine.
Hi @ValHayot, two questions about data locality: