Open crutcher opened 2 years ago
At this point in apply_to_tensors(), the PackedSequence case drops the result tensors, unlike the other cases https://github.com/facebookresearch/fairscale/blob/main/fairscale/utils/containers.py#L27
and thus fully_sharded_data_parallel is going to fail to capture the tensors for hooks here: https://github.com/facebookresearch/fairscale/blob/main/fairscale/nn/data_parallel/fully_sharded_data_parallel.py#L1545
or properly yield casting results here: https://github.com/facebookresearch/fairscale/blob/main/fairscale/nn/data_parallel/fully_sharded_data_parallel.py#L2490
This is a case that our unit tests failed to cover, right?
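A minimal sketch of the behavior described above, written without torch so it runs anywhere. Everything in it is an assumption for illustration: `FakeTensor` and `Packed` are hypothetical stand-ins for `torch.Tensor` and `PackedSequence`, `apply_to_tensors_sketch` is not the fairscale function, and the `keep_packed_result` flag exists only to show both behaviors side by side. Rebuilding the container with `_replace` is one possible fix, not necessarily the one fairscale would adopt.

```python
from collections import namedtuple

# Hypothetical stand-in for PackedSequence: a namedtuple with a `data` field.
Packed = namedtuple("Packed", ["data", "batch_sizes"])


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; it only wraps a value."""

    def __init__(self, value):
        self.value = value


def apply_to_tensors_sketch(fn, container, keep_packed_result):
    """Recursively apply fn to every FakeTensor found in container.

    keep_packed_result=False mimics the reported behavior: fn runs on
    the packed data, but its return value is thrown away.
    """

    def _apply(x):
        if isinstance(x, FakeTensor):
            return fn(x)
        # Check Packed before tuple, since a namedtuple is also a tuple.
        if isinstance(x, Packed):
            new_data = _apply(x.data)
            if keep_packed_result:
                return x._replace(data=new_data)  # result is kept
            return x  # result is dropped, original data comes back
        if isinstance(x, dict):
            return {key: _apply(value) for key, value in x.items()}
        if isinstance(x, (list, tuple, set)):
            return type(x)(_apply(item) for item in x)
        return x

    return _apply(container)


def double(t):
    return FakeTensor(t.value * 2)


packed = Packed(data=FakeTensor(3), batch_sizes=[1])
dropped = apply_to_tensors_sketch(double, {"x": packed}, keep_packed_result=False)
kept = apply_to_tensors_sketch(double, {"x": packed}, keep_packed_result=True)
print(dropped["x"].data.value)  # 3: the doubled tensor was discarded
print(kept["x"].data.value)  # 6: the doubled tensor is returned
```

A unit test along these lines, run against a real PackedSequence and a function that returns a new tensor (such as a dtype cast), would probably have caught the dropped result.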