HPCE / hpce-2017-cw5


What's going on with Random projection n=1553? #62

dc3315 commented 6 years ago

Hi,

We noticed that the graph of random projection execution times is very interesting. For some reason, when n=1553 the execution is orders of magnitude slower than when n=1550: in our case, going from less than half a second to 6.5 seconds. It seems that this affects literally everyone except one group, the fastest one.

I have two questions: first, why is this the case? And secondly, was it intentional to test that value? Many thanks.

m8pple commented 6 years ago

This behavior occurs pretty much every year (though in different puzzles, for different reasons), but I think you are the first person to ever actually ask why. When there were orals, I would often ask why every submission spikes by orders of magnitude at certain values, and often people had never noticed it or thought about it. So kudos for asking the question.

On whether it is intentional: no, it is not deliberate to test that particular value. The only thing I make sure of is that I'm not testing "nice" values, so I try to avoid too many multiples of 32 or 64 in the scale factors. My assumption (based on experience) is that we will hit "interesting" values eventually.

As to why: I try to avoid looking at implementations during the coursework, so I'm not sure, but I'm guessing (I genuinely have no idea - everything from this point is pure speculation just from looking at the graph) that most implementations are on the GPU to the right-hand side of about n=800 (for the blah results). GPUs have certain architectural properties that encourage certain batch sizes (this is implicitly covered in one of the lectures, though you need to think through the implications). They also have a memory interface, which means there is a certain preferred access granularity there too. The choice of work-group size is going to determine how your iteration space gets mapped onto these hardware features. So if certain scale factors map badly to the underlying hardware, is there anything you can do to make the iteration space map better, while keeping the same scale?
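One concrete possibility along those lines (still speculation, and the names below are illustrative rather than taken from any submission): 1553 is prime, so if a kernel is launched with a global size of exactly n and the local size is left to the runtime, the only work-group sizes that evenly divide the global size - which OpenCL 1.x requires - are 1 and 1553. The latter is larger than most devices allow, so the runtime is forced down to work-groups of size 1, throwing away almost all of the GPU's parallelism. The usual workaround is to pad the global size up to a multiple of a chosen work-group size; a minimal host-side sketch:

```cpp
#include <cstdio>
#include <cstddef>

// Round the global work size up to the next multiple of the work-group
// size, so the launch can use a sensible work-group size even when n
// has no small factors (OpenCL 1.x requires the local size to divide
// the global size evenly).
size_t padGlobalSize(size_t n, size_t workGroupSize)
{
    return ((n + workGroupSize - 1) / workGroupSize) * workGroupSize;
}

int main()
{
    // Both 1550 and 1553 pad up to 1600 with a work-group size of 64;
    // unpadded, the prime 1553 only admits work-groups of size 1 or 1553.
    for (size_t n : {1550, 1553}) {
        std::printf("n=%zu -> padded global size %zu\n",
                    n, padGlobalSize(n, 64));
    }
    return 0;
}
```

The matching kernel would then start with a guard such as `if (get_global_id(0) >= n) return;` so that the padding work-items do nothing. Whether this is what actually separates the fast group from everyone else I can't say, but it would explain why 1553 in particular hurts.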

(Note that this is all speculation, and not deliberately intended to mislead - I may just be wrong in my guesses.)