A link-time check.
Without this invariant, the `all_host_to_device` call that is part of `sync_run` is badly broken: it only iterates over embedded nodes.
Even with the invariant it is still broken. For constant nodes, #234 fixes it; for non-constant nodes, we also need to rethink `all_host_to_device` and `all_device_to_host` with respect to non-embedded nodes.
Note that the invariant is related to a note I left somewhere in the docs that if two contexts on a single virtual device share a node but the node is not part of their first common ancestor, the situation is undefined.
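To make the "first common ancestor" condition concrete, here is a minimal sketch in Python. It is a toy model, not the project's actual data structures: the `Context` class, its `parent` and `nodes` fields, and the helper names are all hypothetical, and it assumes each context's `nodes` set lists every node visible in that context, inherited ones included.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)
class Context:
    """Hypothetical stand-in for a context on one virtual device."""
    name: str
    parent: Optional["Context"] = None
    # Assumption: all nodes visible in this context, inherited ones included.
    nodes: Set[str] = field(default_factory=set)


def ancestors(ctx: Optional[Context]) -> List[Context]:
    """Return ctx followed by its parents, nearest first."""
    chain = []
    while ctx is not None:
        chain.append(ctx)
        ctx = ctx.parent
    return chain


def first_common_ancestor(a: Context, b: Context) -> Optional[Context]:
    """Return the nearest context that both a and b descend from, if any."""
    seen = {id(c) for c in ancestors(a)}
    for c in ancestors(b):
        if id(c) in seen:
            return c
    return None


def undefined_shared_nodes(a: Context, b: Context) -> Set[str]:
    """Nodes shared by a and b that their first common ancestor lacks."""
    shared = a.nodes & b.nodes
    common = first_common_ancestor(a, b)
    covered = common.nodes if common is not None else set()
    return shared - covered


root = Context("root", nodes={"w"})
left = Context("left", parent=root, nodes={"w", "x"})
right = Context("right", parent=root, nodes={"w", "x"})
offending = undefined_shared_nodes(left, right)
```

Here `w` is fine because it belongs to `root`, while `x` is shared by `left` and `right` without being part of `root`, so it falls under the undefined case described above.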
Since embedded vs. non-embedded subtensors is a tensor-level abstraction, it's hard to do better than offer some more Train-level glue code and have current and future Train code check for the invariant.