facebookresearch / fairscale

PyTorch extensions for high performance and large scale training.

containers:apply_to_tensors fails to return (or test) the application result on PackedSequence #996

Open crutcher opened 2 years ago

crutcher commented 2 years ago

At this point in apply_to_tensors(), the PackedSequence case drops the result tensors (the output of applying the function to the sequence's data), unlike the other cases, which return them: https://github.com/facebookresearch/fairscale/blob/main/fairscale/utils/containers.py#L27

As a result, fully_sharded_data_parallel will fail to capture the tensors for hooks here: https://github.com/facebookresearch/fairscale/blob/main/fairscale/nn/data_parallel/fully_sharded_data_parallel.py#L1545

and will not properly return the casting results here: https://github.com/facebookresearch/fairscale/blob/main/fairscale/nn/data_parallel/fully_sharded_data_parallel.py#L2490
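To illustrate the report, here is a minimal sketch. It assumes a simplified recursive helper and is not the actual fairscale source: apply_buggy and apply_fixed are hypothetical names. apply_buggy mimics the behavior described above (the function is applied to the PackedSequence's data, but the result is discarded and the original object is returned), while apply_fixed rebuilds the PackedSequence around the transformed data. A cast such as .half(), which returns a new tensor, makes the lost result visible.

```python
from typing import Any, Callable

import torch
from torch.nn.utils.rnn import PackedSequence, pack_sequence


def apply_buggy(fn: Callable[[torch.Tensor], torch.Tensor], x: Any) -> Any:
    # Hypothetical helper reproducing the reported pattern.
    if torch.is_tensor(x):
        return fn(x)
    if isinstance(x, PackedSequence):
        apply_buggy(fn, x.data)  # result computed, then thrown away
        return x                 # original, unmodified PackedSequence
    if isinstance(x, (list, tuple)):
        return type(x)(apply_buggy(fn, v) for v in x)
    return x


def apply_fixed(fn: Callable[[torch.Tensor], torch.Tensor], x: Any) -> Any:
    # Hypothetical helper with the PackedSequence result propagated.
    if torch.is_tensor(x):
        return fn(x)
    if isinstance(x, PackedSequence):
        # Rebuild around the transformed data; keep the bookkeeping tensors.
        return PackedSequence(
            apply_fixed(fn, x.data), x.batch_sizes, x.sorted_indices, x.unsorted_indices
        )
    if isinstance(x, (list, tuple)):
        return type(x)(apply_fixed(fn, v) for v in x)
    return x


def to_half(t: torch.Tensor) -> torch.Tensor:
    # Returns a new tensor, so discarding the return value loses the cast.
    return t.half()


# Lengths 3 and 2 are already sorted in decreasing order, as pack_sequence expects by default.
packed = pack_sequence([torch.tensor([1.0, 2.0, 3.0]), torch.tensor([4.0, 5.0])])

print(apply_buggy(to_half, packed).data.dtype)  # torch.float32: cast result lost
print(apply_fixed(to_half, packed).data.dtype)  # torch.float16

out = apply_buggy(to_half, [torch.ones(2), packed])
print(out[0].dtype, out[1].data.dtype)  # torch.float16 torch.float32
```

In this sketch the PackedSequence check comes before the generic tuple branch because PackedSequence is a namedtuple (a tuple subclass); checked the other way round, it would be rebuilt as a plain tuple.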

min-xu-ai commented 2 years ago

This is a case that our unit tests failed to cover, right?