Closed (lendle closed this issue 1 year ago)
@lendle thanks for reporting that. we'll take a look shortly.
@lendle I cannot reproduce the first error message you shared from examples/tutorial/03-Session-based-recsys.ipynb, section "3.2.4 Train XLNET with Side Information for Next Item Prediction". Please note that we already fixed the dtype of the product_recency_days_log_norm-list_seq feature created in the prior 02-ETL notebook so that it is float32. We do it this way in the notebook:
price_log = ['price'] >> nvt.ops.LogOp() >> nvt.ops.Normalize(out_dtype=np.float32) >> nvt.ops.Rename(name='price_log_norm')
You might want to use the merlin-pytorch:22.12 docker image to pick up the recent changes, or just fix the line above in your 02-ETL notebook.
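For anyone hitting the same dtype error, here is a rough NumPy-only sketch of why a log-plus-normalize feature ends up as float64 unless it is cast explicitly. It is an analogy, not NVTabular's implementation: the input values are made up, and np.log1p plus a manual standardization only stand in for the LogOp and Normalize steps.

```python
import numpy as np

# Made-up integer recency values; not taken from the tutorial dataset.
days = np.array([1, 7, 30, 365])

# Taking a log of an integer array yields float64 in NumPy.
logged = np.log1p(days)

# Standardizing keeps the float64 dtype.
normed = (logged - logged.mean()) / logged.std()

# An explicit cast is needed to get float32, which is what
# out_dtype=np.float32 requests in the workflow line above.
as_f32 = normed.astype(np.float32)

print(logged.dtype, normed.dtype, as_f32.dtype)
```

The same idea applies to whichever cell defines the feature in the 02-ETL notebook: the cast has to be requested somewhere in the chain, otherwise the wider dtype is carried through to training.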
for the second bug, we'll fix that. thanks.
closing due to lack of activity.
Bug description
Bug 1
In examples/tutorial/03-Session-based-recsys.ipynb, section "3.2.4 Train XLNET with Side Information for Next Item Prediction", the cell that runs training fails.
Log with stack trace
```
***** Running training *****
  Num examples = 112128
  Num Epochs = 3
  Instantaneous batch size per device = 256
  Total train batch size (w. parallel, distributed & accumulation) = 256
  Gradient Accumulation steps = 1
  Total optimization steps = 1314
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File
```
I believe this is because the product_recency_days_log_norm-list_seq feature created in the prior notebook (02-ETL-with-NVTabular) is float64 rather than float32. I was able to get things to run by adding >> nvt.ops.ReduceDtypeSize() to the cell where that feature is defined in the prior notebook, section 5.3. I'm not sure if this is the correct fix though.
Bug 2
The XLNet-MLM with side information accuracy results that get written to results.txt in 03-Session-based-recsys should have the metric name and value separated by : rather than a space. Metrics from the other two models trained in the notebook are written correctly. This causes the call to create_bar_chart('results.txt') to fail. Easy fix: the last line should be
f.write('%s:%s\n' % (key, value.item()))
Steps/Code to reproduce bug
Run the tutorial notebooks.
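Bug 2 can also be illustrated in isolation. The sketch below is only an approximation: parse_metrics is a hypothetical stand-in that assumes the chart helper splits each line on ":", which is what the failure suggests but is not confirmed from the create_bar_chart source. The metric names and values are made up.

```python
import io


def write_metrics(f, metrics, sep):
    """Write one 'name<sep>value' line per metric."""
    for key, value in metrics.items():
        f.write('%s%s%s\n' % (key, sep, value))


def parse_metrics(text):
    """Hypothetical parser: assumes every non-empty line is 'name:value'."""
    parsed = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Unpacking raises ValueError when the line has no ':' separator.
        name, value = line.split(':')
        parsed[name.strip()] = float(value)
    return parsed


# Made-up metric names and values, for illustration only.
metrics = {'metric_a': 0.25, 'metric_b': 0.5}

# Colon-separated lines parse cleanly.
good = io.StringIO()
write_metrics(good, metrics, ':')
parsed_good = parse_metrics(good.getvalue())

# Space-separated lines break a parser that expects ':'.
bad = io.StringIO()
write_metrics(bad, metrics, ' ')
try:
    parse_metrics(bad.getvalue())
    bad_parse_failed = False
except ValueError:
    bad_parse_failed = True

print(parsed_good, bad_parse_failed)
```

If the real helper parses differently, the exact exception may differ, but a file that mixes separators would still be a problem for it.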
Expected behavior
Environment details
Google Cloud Workbench managed notebook with image version
nvcr.io/nvidia/merlin/merlin-pytorch:22.11
Machine info: a2-highgpu-1g (Accelerator Optimized: 1 NVIDIA Tesla A100 GPU, 12 vCPUs, 85GB RAM)
I'm using the version of the example notebooks that is available in the image.
Additional context