NVIDIA-Merlin / Transformers4Rec

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation that works with PyTorch.
https://nvidia-merlin.github.io/Transformers4Rec/main
Apache License 2.0

[BUG] AttributeError: 'list' object has no attribute 'output_node' #768

Closed: mrcmoresi closed this issue 1 week ago

mrcmoresi commented 6 months ago

Bug description

I'm trying to run the getting-started-session-based example notebook with synthetic data, and I'm getting the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[21], line 57
     53 dataset = nvt.Dataset(df)
     55 # Generate statistics for the features and export parquet files
     56 # this step will generate the schema file
---> 57 workflow.fit_transform(dataset).to_parquet(os.path.join(INPUT_DATA_DIR, "processed_nvt"))

File /anaconda/envs/t4rec/lib/python3.10/site-packages/nvtabular/workflow/workflow.py:264, in Workflow.fit_transform(self, dataset)
    244 def fit_transform(self, dataset: Dataset) -> Dataset:
    245     """Convenience method to both fit the workflow and transform the dataset in a single
    246     call. Equivalent to calling ``workflow.fit(dataset)`` followed by
    247     ``workflow.transform(dataset)``
   (...)
    262     transform
    263     """
--> 264     self.fit(dataset)
    265     return self.transform(dataset)

File /anaconda/envs/t4rec/lib/python3.10/site-packages/nvtabular/workflow/workflow.py:228, in Workflow.fit(self, dataset)
    224 if not current_phase:
    225     # this shouldn't happen, but lets not infinite loop just in case
    226     raise RuntimeError("failed to find dependency-free StatOperator to fit")
--> 228 self.executor.fit(ddf, current_phase)
    230 # Remove all the operators we processed in this phase, and remove
    231 # from the dependencies of other ops too
    232 for node in current_phase:

File /anaconda/envs/t4rec/lib/python3.10/site-packages/merlin/dag/executors.py:439, in DaskExecutor.fit(self, dataset, graph, refit)
    437 def fit(self, dataset: Dataset, graph: Graph, refit=True):
    438     if refit:
--> 439         clear_stats(graph)
    441     if not graph.output_schema:
    442         graph.construct_schema(dataset.schema)

File /anaconda/envs/t4rec/lib/python3.10/site-packages/merlin/dag/executors.py:562, in clear_stats(graph)
    555 def clear_stats(graph):
    556     """Removes calculated statistics from each node in the workflow graph
    557 
    558     See Also
    559     --------
    560     nvtabular.ops.stat_operator.StatOperator.clear
    561     """
--> 562     for stat in Graph.get_nodes_by_op_type([graph.output_node], StatOperator):
    563         stat.op.clear()

AttributeError: 'list' object has no attribute 'output_node'

Steps/Code to reproduce bug

  1. I created the environment with conda using this
  2. Cloned the repo and just ran it (the failing cell boils down to the sketch below)
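
For reference, the cell that fails reduces to roughly the following. This is a trimmed-down sketch with a toy DataFrame and a single Categorify op, not the notebook's actual workflow, which builds more features over the synthetic data:

import pandas as pd
import nvtabular as nvt

# Toy interactions frame standing in for the notebook's synthetic data
df = pd.DataFrame({
    "session_id": [1, 1, 2, 2, 3],
    "item_id": [10, 11, 10, 12, 13],
    "timestamp": [1, 2, 1, 2, 1],
})

# Any StatOperator in the graph (Categorify here) triggers the fit path shown in the traceback
features = ["item_id"] >> nvt.ops.Categorify()
workflow = nvt.Workflow(features)

dataset = nvt.Dataset(df)
workflow.fit_transform(dataset).to_parquet("processed_nvt")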

Any idea what could be wrong?

Environment details

Additional context

rnyak commented 6 months ago

@mrcmoresi can you please use the 23.08 version of the repos? You can use our Docker image, which is recommended: nvcr.io/nvidia/merlin/merlin-pytorch:23.08

You can then rerun the example. If you still have any issues, you can share them here.
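
If you want to double-check what is installed in your current environment before switching, something along these lines prints the relevant versions (the package list is indicative, not exhaustive):

# Sketch: list the versions of the Merlin / T4Rec stack in the active environment
from importlib.metadata import PackageNotFoundError, version

for pkg in ["transformers4rec", "nvtabular", "merlin-core", "merlin-dataloader", "transformers", "torch"]:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")

All of these should come from the same release train; mixing versions (for example an older nvtabular with a newer merlin-core) is the kind of mismatch that can produce the output_node error above.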

mrcmoresi commented 6 months ago

Hi @rnyak, thanks for your answer. I installed Transformers4Rec and NVTabular 23.08, and now I'm getting a different error when I try to run the following cell:

start_time_window_index = start_window_index
final_time_window_index = final_window_index
#Iterating over days of one week
for time_index in range(start_time_window_index, final_time_window_index):
    # Set data 
    time_index_train = time_index
    time_index_eval = time_index + 1
    train_paths = glob.glob(os.path.join(OUTPUT_DIR, f"{time_index_train}/train.parquet"))
    eval_paths = glob.glob(os.path.join(OUTPUT_DIR, f"{time_index_eval}/valid.parquet"))
    print(train_paths)

    # Train on day related to time_index 
    print('*'*20)
    print("Launch training for day %s are:" %time_index)
    print('*'*20 + '\n')
    trainer.train_dataset_or_path = train_paths
    trainer.reset_lr_scheduler()
    trainer.train()
    trainer.state.global_step +=1
    print('finished')

    # Evaluate on the following day
    trainer.eval_dataset_or_path = eval_paths
    train_metrics = trainer.evaluate(metric_key_prefix='eval')
    print('*'*20)
    print("Eval results for day %s are:\t" %time_index_eval)
    print('\n' + '*'*20 + '\n')
    for key in sorted(train_metrics.keys()):
        print(" %s = %s" % (key, str(train_metrics[key]))) 
    wipe_memory()

I'm getting the following error:

--------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File /anaconda/envs/t4rec/lib/python3.10/site-packages/transformers4rec/torch/trainer.py:398, in Trainer._use_cuda_amp(self)
    397 try:
--> 398     return self.use_cuda_amp
    399 except AttributeError:

AttributeError: 'Trainer' object has no attribute 'use_cuda_amp'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In[14], line 24
     22 # Evaluate on the following day
     23 trainer.eval_dataset_or_path = eval_paths
---> 24 train_metrics = trainer.evaluate(metric_key_prefix='eval')
     25 print('*'*20)
     26 print("Eval results for day %s are:\t" %time_index_eval)

File /anaconda/envs/t4rec/lib/python3.10/site-packages/transformers/trainer.py:3085, in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
   3082 start_time = time.time()
   3084 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3085 output = eval_loop(
   3086     eval_dataloader,
   3087     description="Evaluation",
   3088     # No point gathering the predictions if there are no metrics, otherwise we defer to
   3089     # self.args.prediction_loss_only
   3090     prediction_loss_only=True if self.compute_metrics is None else None,
   3091     ignore_keys=ignore_keys,
   3092     metric_key_prefix=metric_key_prefix,
   3093 )
   3095 total_batch_size = self.args.eval_batch_size * self.args.world_size
   3096 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:

File /anaconda/envs/t4rec/lib/python3.10/site-packages/transformers4rec/torch/trainer.py:502, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
    495 if (
    496     metric_key_prefix == "train"
    497     and self.args.eval_steps_on_train_set > 0
    498     and step + 1 > self.args.eval_steps_on_train_set
    499 ):
    500     break
--> 502 loss, preds, labels, outputs = self.prediction_step(
    503     model,
    504     inputs,
    505     prediction_loss_only,
    506     ignore_keys=ignore_keys,
    507     testing=testing,
    508 )
    510 # Updates metrics
    511 # TODO: compute metrics each N eval_steps to speedup evaluation
    512 metrics_results_detailed = None

File /anaconda/envs/t4rec/lib/python3.10/site-packages/transformers4rec/torch/trainer.py:363, in Trainer.prediction_step(self, model, inputs, prediction_loss_only, ignore_keys, training, testing)
    361 inputs, targets = inputs
    362 with torch.no_grad():
--> 363     if self._use_cuda_amp:
    364         with autocast():
    365             outputs = model(inputs, targets=targets, training=training, testing=testing)

File /anaconda/envs/t4rec/lib/python3.10/site-packages/transformers4rec/torch/trainer.py:400, in Trainer._use_cuda_amp(self)
    398     return self.use_cuda_amp
    399 except AttributeError:
--> 400     return self.use_amp

AttributeError: 'Trainer' object has no attribute 'use_amp'

rnyak commented 6 months ago

This might be related to the Transformers version. Please check all the requirements and install the libraries accordingly.
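
For example, a quick check of the installed versions usually confirms this. The use_cuda_amp / use_amp attributes in the traceback are internals of older transformers.Trainer releases, so a Transformers version newer than the one the installed transformers4rec was built against can fail exactly like this. The <4.31 pin below is an assumption; take the exact pin from the transformers4rec 23.08 requirements:

# Sketch: check the installed versions before reinstalling anything
from importlib.metadata import version

print("transformers4rec:", version("transformers4rec"))
print("transformers:", version("transformers"))

# If Transformers is too new for the installed transformers4rec, downgrade it, e.g.:
#   pip install "transformers<4.31"   # assumption; check the 23.08 requirements for the exact pin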

rnyak commented 1 week ago

Closing due to low interaction.