csarofeen / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
http://pytorch.org

set heuristics to persistent after welford translate #2491

Closed liqiangxl closed 1 year ago

liqiangxl commented 1 year ago

Fixes #2486. The reduction heuristic was being used, while the correct one should be persistent.

naoyam commented 1 year ago

Actually, do we want to do the welford translation if the initial scheduler is not the persistent scheduler? If it's originally just the reduction scheduler, making it persistent may result in lower performance. Can you do some benchmarking?

liqiangxl commented 1 year ago

> Actually, do we want to do the welford translation if the initial scheduler is not the persistent scheduler? If it's originally just the reduction scheduler, making it persistent may result in lower performance. Can you do some benchmarking?

I tested var_mean for this case, and there is no performance change whether we translate or not. In either case, we only need to read the data from gmem once.
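For context, the "welford translation" discussed here rewrites a single-pass Welford reduction (which carries a running mean and M2) into plain sum reductions. A minimal Python sketch (not nvFuser code; names are illustrative) showing the two mathematically equivalent formulations of var_mean:

```python
def welford_var_mean(xs):
    # Welford's single-pass algorithm: one running mean and
    # one running sum of squared deviations (M2).
    mean, m2, n = 0.0, 0.0, 0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / n, mean  # population variance, mean

def two_pass_var_mean(xs):
    # Translated form: two independent sum reductions
    # (sum and sum of squares), combined at the end.
    n = len(xs)
    s = sum(xs)
    sq = sum(x * x for x in xs)
    mean = s / n
    return sq / n - mean * mean, mean
```

Both read each input element once, which is consistent with the observation that translation does not change gmem traffic for this pattern; the difference is mainly in how the reductions are scheduled.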

naoyam commented 1 year ago

> Actually, do we want to do the welford translation if the initial scheduler is not the persistent scheduler? If it's originally just the reduction scheduler, making it persistent may result in lower performance. Can you do some benchmarking?

> I tested var_mean for this case, and there is no performance change whether we translate or not. In either case, we only need to read the data from gmem once.

Thanks for testing. Then let's not do the translation; it would just add lowering overhead.