I note that the paper's abstract says the library supports distributed-memory execution on cloud/supercomputer systems and is available as open source, but I can't find a single actual example of this in the repo. Since the cost of the Hessian is significant, is it practical to use DDP to reduce the per-GPU memory load? Are there any examples? Thank you.
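To be concrete, the pattern I'm wondering about looks roughly like the sketch below: shard the batch across ranks, compute a matrix-free Hessian-vector product locally, and average the result across GPUs. This is only my guess at the setup, not this library's API; all names are placeholders, and I use a manual `all_reduce` rather than wrapping the model in `DistributedDataParallel`, since DDP's gradient hooks are known to be awkward with the double backward an HVP needs.

```python
# Hypothetical sketch (not this repo's API): distributed Hessian-vector
# product, with each rank holding only its own data shard's autograd graph.
# Assumes the process group is already initialized, e.g. via torchrun.
import torch
import torch.distributed as dist

def distributed_hvp(model, loss_fn, inputs, targets, vec):
    """H @ vec on this rank's shard, averaged over all ranks."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(inputs), targets)
    # First backward: keep the graph alive for the second backward.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Second backward: differentiate (grad . vec) to get the HVP.
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    hvp = torch.autograd.grad(dot, params)
    # Average across ranks so the result matches the full-batch Hessian.
    world = dist.get_world_size()
    for h in hvp:
        dist.all_reduce(h, op=dist.ReduceOp.SUM)
        h.div_(world)
    return hvp
```

Is something along these lines supported, or is the intended distributed workflow different?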