Closed: SPP3000 closed this issue 10 months ago
I think this is related: https://github.com/huggingface/transformers/commit/f1732e1374a082bf8e43bd0e4aa8a2da21a32a21
@SPP3000 can you please tell us how you installed TF4Rec and the other Merlin libraries? The recommended way is to use the merlin-pytorch:23.06 image, or, if you are doing a pip installation, please make sure you comply with the transformers version in the requirements; for example, try 4.12.
I installed it in a Python virtual environment without using any container images.
pip install transformers4rec
and
pip install cudf-cu11 dask-cudf-cu11 --extra-index-url=https://pypi.nvidia.com
After executing the example code, I got some warnings about missing dependencies, such as tensorflow. I installed those as well over pip.
The problem I was facing is that my installation gave me the newest version of the Hugging Face transformers library, from which the private method _pad_across_processes() has been removed from the Trainer class.
You might want to account for this in your upcoming releases. For now, I copied and pasted the missing code into the Trainer class.
# Copied from Accelerate.
def _pad_across_processes(self, tensor, pad_index=-100):
    """
    Recursively pad the tensors in a nested list/tuple/dictionary of tensors from all devices to the same size so
    they can safely be gathered.
    """
    if isinstance(tensor, (list, tuple)):
        return type(tensor)(self._pad_across_processes(t, pad_index=pad_index) for t in tensor)
    elif isinstance(tensor, dict):
        return type(tensor)({k: self._pad_across_processes(v, pad_index=pad_index) for k, v in tensor.items()})
    elif not isinstance(tensor, torch.Tensor):
        raise TypeError(
            f"Can't pad the values of type {type(tensor)}, only of nested list/tuple/dicts of tensors."
        )

    if len(tensor.shape) < 2:
        return tensor

    # Gather all sizes
    size = torch.tensor(tensor.shape, device=tensor.device)[None]
    sizes = self._nested_gather(size).cpu()

    max_size = max(s[1] for s in sizes)
    # When extracting XLA graphs for compilation, max_size is 0,
    # so use inequality to avoid errors.
    if tensor.shape[1] >= max_size:
        return tensor

    # Then pad to the maximum size
    old_size = tensor.shape
    new_size = list(old_size)
    new_size[1] = max_size
    new_tensor = tensor.new_zeros(tuple(new_size)) + pad_index
    new_tensor[:, : old_size[1]] = tensor
    return new_tensor
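For intuition, the essential step of that method (growing each row along dimension 1 to a shared maximum and filling with pad_index so results can be gathered) can be illustrated with a small torch-free sketch; pad_to_max below is a hypothetical helper for illustration, not part of transformers or Accelerate:

```python
def pad_to_max(rows, max_len, pad_index=-100):
    """Pad every row of a 2-D list out to max_len with pad_index,
    mirroring what _pad_across_processes does along dimension 1
    of a tensor before gathering across processes."""
    return [row + [pad_index] * (max_len - len(row)) for row in rows]

# Rows of different lengths, as different processes might produce
batch = [[1, 2, 3], [4, 5]]
padded = pad_to_max(batch, max_len=4)
# Every row now has length 4; shorter rows end in pad_index
```

The real method additionally gathers the per-process sizes first (via _nested_gather) so every process agrees on the same max_size.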
After executing the example code, I got some warnings about missing dependencies, such as tensorflow. I installed those as well over pip.
You do not need to install tensorflow for a pytorch workflow. Thanks for the code.
Ok good to know!
From my side everything is clear, and the issue can be closed anytime.
Hi, sorry, I'm just starting to use Transformers4Rec. Based on your conclusion, my solution was to install the following. I didn't find a solution anywhere else; thanks for sharing your knowledge.
!pip install --upgrade accelerate
!pip install transformers==4.28.0
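If you prefer to apply the copy-pasted workaround only when it is actually needed, one option is to gate it on the installed transformers version. A minimal sketch follows; the 4.31 cutoff is an assumption based on when the Trainer internals were migrated to Accelerate, so verify it against the linked commit before relying on it:

```python
def needs_pad_workaround(version: str) -> bool:
    """Return True if the Trainer of the given transformers version
    is assumed to lack _pad_across_processes.
    The (4, 31) cutoff is illustrative, not authoritative."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (4, 31)

# With the pin suggested above, no patching would be required
assert needs_pad_workaround("4.28.0") is False
```

At runtime, importlib.metadata.version("transformers") can supply the installed version string to check.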
Thank you for providing an alternative solution that does not require modifying the library code. I can confirm that this solution works as well.
Bug description
Executing the example code of 02-End-to-end-session-based-with-Yoochoose-PyT.ipynb leads to an error in fit_and_evaluate(...), as the trainer cannot find the method '_pad_across_processes'. Is this a bug, and if not, where does this method come from and why is it not found? Training runs without problems, but as soon as the evaluation starts, this error occurs.
Steps/Code to reproduce bug
1. Download 01-ETL-with-NVTabular.ipynb, 02-End-to-end-session-based-with-Yoochoose-PyT.ipynb and the dataset
2. Run 01-ETL-with-NVTabular.ipynb
3. Run 02-End-to-end-session-based-with-Yoochoose-PyT.ipynb, which leads to the reported behavior
Expected behavior
Output similar to the Jupyter notebook as seen on the git repository
Environment details
Additional context