Closed mrcmoresi closed 1 week ago
@mrcmoresi can you please use the 23.08 version of the repos? You can use our Docker image, which is recommended: nvcr.io/nvidia/merlin/merlin-pytorch:23.08
You can then rerun the example. If you still have any issues, you can share them here.
Hi @rnyak, thanks for your answer. I installed Transformers4Rec and NVTabular 23.08, and now I'm getting a different error when I try to run the following cell:
start_time_window_index = start_window_index
final_time_window_index = final_window_index
#Iterating over days of one week
for time_index in range(start_time_window_index, final_time_window_index):
    # Set data
    time_index_train = time_index
    time_index_eval = time_index + 1
    train_paths = glob.glob(os.path.join(OUTPUT_DIR, f"{time_index_train}/train.parquet"))
    eval_paths = glob.glob(os.path.join(OUTPUT_DIR, f"{time_index_eval}/valid.parquet"))
    print(train_paths)

    # Train on day related to time_index
    print('*'*20)
    print("Launch training for day %s are:" %time_index)
    print('*'*20 + '\n')
    trainer.train_dataset_or_path = train_paths
    trainer.reset_lr_scheduler()
    trainer.train()
    trainer.state.global_step +=1
    print('finished')

    # Evaluate on the following day
    trainer.eval_dataset_or_path = eval_paths
    train_metrics = trainer.evaluate(metric_key_prefix='eval')
    print('*'*20)
    print("Eval results for day %s are:\t" %time_index_eval)
    print('\n' + '*'*20 + '\n')
    for key in sorted(train_metrics.keys()):
        print(" %s = %s" % (key, str(train_metrics[key])))
    wipe_memory()
I'm getting the following error:
--------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
File /anaconda/envs/t4rec/lib/python3.10/site-packages/transformers4rec/torch/trainer.py:398, in Trainer._use_cuda_amp(self)
397 try:
--> 398 return self.use_cuda_amp
399 except AttributeError:
AttributeError: 'Trainer' object has no attribute 'use_cuda_amp'
During handling of the above exception, another exception occurred:
AttributeError Traceback (most recent call last)
Cell In[14], line 24
22 # Evaluate on the following day
23 trainer.eval_dataset_or_path = eval_paths
---> 24 train_metrics = trainer.evaluate(metric_key_prefix='eval')
25 print('*'*20)
26 print("Eval results for day %s are:\t" %time_index_eval)
File /anaconda/envs/t4rec/lib/python3.10/site-packages/transformers/trainer.py:3085, in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
3082 start_time = time.time()
3084 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3085 output = eval_loop(
3086 eval_dataloader,
3087 description="Evaluation",
3088 # No point gathering the predictions if there are no metrics, otherwise we defer to
3089 # self.args.prediction_loss_only
3090 prediction_loss_only=True if self.compute_metrics is None else None,
3091 ignore_keys=ignore_keys,
3092 metric_key_prefix=metric_key_prefix,
3093 )
3095 total_batch_size = self.args.eval_batch_size * self.args.world_size
3096 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:
File /anaconda/envs/t4rec/lib/python3.10/site-packages/transformers4rec/torch/trainer.py:502, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
495 if (
496 metric_key_prefix == "train"
497 and self.args.eval_steps_on_train_set > 0
498 and step + 1 > self.args.eval_steps_on_train_set
499 ):
500 break
--> 502 loss, preds, labels, outputs = self.prediction_step(
503 model,
504 inputs,
505 prediction_loss_only,
506 ignore_keys=ignore_keys,
507 testing=testing,
508 )
510 # Updates metrics
511 # TODO: compute metrics each N eval_steps to speedup evaluation
512 metrics_results_detailed = None
File /anaconda/envs/t4rec/lib/python3.10/site-packages/transformers4rec/torch/trainer.py:363, in Trainer.prediction_step(self, model, inputs, prediction_loss_only, ignore_keys, training, testing)
361 inputs, targets = inputs
362 with torch.no_grad():
--> 363 if self._use_cuda_amp:
364 with autocast():
365 outputs = model(inputs, targets=targets, training=training, testing=testing)
File /anaconda/envs/t4rec/lib/python3.10/site-packages/transformers4rec/torch/trainer.py:400, in Trainer._use_cuda_amp(self)
398 return self.use_cuda_amp
399 except AttributeError:
--> 400 return self.use_amp
AttributeError: 'Trainer' object has no attribute 'use_amp'
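For anyone reading the traceback: the chained error comes from the try/except fallback shown at trainer.py lines 397-400, where `use_cuda_amp` is tried first and `use_amp` second. A minimal sketch of that pattern follows. `StandInTrainer` and `amp_lookup_error` are hypothetical names made up for illustration, and this is not the real Transformers4Rec `Trainer`; it only mimics the two attribute lookups to show that the second `AttributeError` is raised when neither attribute is set.

```python
class StandInTrainer:
    """Hypothetical stand-in class, NOT the Transformers4Rec Trainer."""

    @property
    def _use_cuda_amp(self):
        # Same shape as the fallback visible in the traceback:
        # try the first attribute name, then fall back to the second.
        try:
            return self.use_cuda_amp
        except AttributeError:
            return self.use_amp


def amp_lookup_error(obj):
    """Return the AttributeError message from the lookup, or None if it succeeds."""
    try:
        obj._use_cuda_amp
    except AttributeError as exc:
        return str(exc)
    return None


# Neither attribute is set: the fallback itself fails, as in the report.
bare = StandInTrainer()
print(amp_lookup_error(bare))

# Once either attribute exists, the lookup succeeds and no error is raised.
with_flag = StandInTrainer()
with_flag.use_amp = False
print(amp_lookup_error(with_flag))
```

If the real trainer behaves like this sketch, the error suggests that the installed base `Trainer` sets neither attribute, which would be consistent with a version mismatch between the two libraries.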
This might be related to the Transformers version. Please check all the requirements and install the libraries accordingly.
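One way to check what is actually installed is to list the versions from inside the notebook's environment. This is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, and the package list is an assumption based on the libraries mentioned in this thread. Compare the output against the requirements of the Transformers4Rec release you installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed list of relevant distributions; adjust to your environment.
PACKAGES = ["transformers4rec", "nvtabular", "transformers", "torch"]

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Run this in the same kernel as the failing cell, since a notebook can be attached to a different environment than the one where the packages were installed.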
Closing due to low interaction.
Bug description
I'm trying to run the getting-started-session-based example notebook with synthetic data, and I'm getting the following error
Steps/Code to reproduce bug
Any idea what could be wrong?
Environment details
Additional context