Closed sararb closed 1 year ago
https://nvidia-merlin.github.io/Transformers4Rec/review/pr-690
Hi, nice feature! Do you anticipate this PR being merged and made available with your Docker files soon? I would like to test this feature within T4R for this competition. I'm not sure if NVIDIA's leadership team is already using the text embeddings. 😄
Thanks and good job!
I have managed to pass the embeddings to the trainer, but I'm not sure how to map the embedding IDs to the article_ids, since the latter have gone through the Categorify operation and the mapping is not 1-to-1. Any ideas or help? Since I am not very familiar with NVTabular, I could use an OrdinalEncoder outside of it to avoid the Categorify operation.
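The OrdinalEncoder route mentioned above can be sketched as follows. This is a minimal illustration using scikit-learn; the article IDs are made-up toy values:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Toy raw article IDs (hypothetical values for illustration)
article_ids = np.array([[101], [205], [101], [333]])

enc = OrdinalEncoder()
codes = enc.fit_transform(article_ids)  # contiguous 0-based codes

# enc.categories_[0] maps each code back to its raw article_id,
# so a pre-trained embedding matrix can be row-ordered to match.
print(codes.ravel())       # [0. 1. 0. 2.]
print(enc.categories_[0])  # [101 205 333]
```

Because the encoder keeps an explicit `categories_` mapping, recovering the raw article_id from an encoded value is a direct lookup, which sidesteps the Categorify mapping question entirely.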
Here's the code I used to pass the embeddings to the trainer:
I also had to modify some internals in the Transformers4Rec trainer:
@gaceladri I'm glad to hear that you're interested in using the new feature in the Transformers4Rec (T4R) library for the KDD competition. Please note that this feature is currently a work in progress, but we plan to include it in the upcoming release.
Regarding your question on how to map embedding IDs to article IDs after the Categorify operation, we are currently working on aligning the output of Categorify with the pre-trained embeddings matrix.
In the meantime, I prepared a workaround solution based on your code:
I hope this helps! If you have any further questions, feel free to ask.
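For anyone hitting the same mapping question, the general idea of aligning a pre-trained embedding table with Categorify's encoded IDs can be sketched as below. This is a pure-NumPy illustration: the in-memory `unique_articles` list stands in for the category mapping that Categorify persists on disk, and all IDs and vectors are made up:

```python
import numpy as np

# Stand-in for Categorify's category mapping: row index = encoded ID,
# value = raw article_id (low indices are typically reserved for
# padding / out-of-vocabulary entries). Hypothetical values.
unique_articles = [0, 0, 101, 205, 333]

# Pre-trained embeddings keyed by raw article_id (hypothetical, dim=4)
pretrained = {101: np.ones(4), 205: np.full(4, 2.0), 333: np.full(4, 3.0)}

dim = 4
aligned = np.zeros((len(unique_articles), dim), dtype=np.float32)
for encoded_id, raw_id in enumerate(unique_articles):
    vec = pretrained.get(raw_id)
    if vec is not None:
        aligned[encoded_id] = vec

# Row i of `aligned` is now the embedding for Categorify code i;
# unmatched codes (padding/OOV) stay as zero vectors.
print(aligned[3])  # [2. 2. 2. 2.]
```

The resulting `aligned` matrix can then be handed to the model in place of the raw pre-trained table, so no per-batch ID translation is needed at training time.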
Hi, cool feature indeed. Is there an example of how to serve the PyTorch model with pre-trained embeddings on Triton server? I tried to follow the example from your TF repo. Here is my code:
import os

import nvtabular as nvt
import torch
from merlin.systems.dag.ensemble import Ensemble
from merlin.systems.dag.ops.pytorch import PredictPyTorch
from merlin.systems.dag.ops.workflow import TransformWorkflow

# grab one batch from the dataloader to use as the tracing example
model_input_dict_from_dataloader = next(iter(dataloader))[0]

# changes output schema from "next items" to "items scores" and "items ids"
topk = 5
model.top_k = topk
model.eval()
print(model.input_schema)
print(model.output_schema)

traced_model = torch.jit.trace(model, model_input_dict_from_dataloader, strict=True)

ens_model_path = os.environ.get("ens_model_path", "ens_models_standard_workflow_with_pretrained_embeddings")
os.makedirs(ens_model_path, exist_ok=True)

workflow = nvt.Workflow.load(workflow_path)
torch_op = (
    workflow.input_schema.column_names
    >> TransformWorkflow(workflow)
    >> embeddings_op
    >> PredictPyTorch(traced_model, model.input_schema, model.output_schema)
)
ensemble = Ensemble(torch_op, workflow.input_schema)
ens_config, node_configs = ensemble.export(ens_model_path)
print(ens_config)
print(node_configs)
However, I get errors when I start inference with the exported graph. Any hints? Thanks!
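For context, the `torch.jit.trace` call in the snippet above follows standard TorchScript tracing. A self-contained minimal example of the pattern, using a toy module rather than the T4R model:

```python
import torch

# Toy module standing in for the traced model (hypothetical)
class TinyModel(torch.nn.Module):
    def forward(self, x):
        return x * 2 + 1

model = TinyModel().eval()
example_input = torch.ones(2, 3)  # plays the role of the dataloader batch

# strict=True mirrors the call in the snippet above
traced = torch.jit.trace(model, example_input, strict=True)

out = traced(torch.full((2, 3), 4.0))
print(out)  # every element is 4.0 * 2 + 1 = 9.0
```

One thing worth checking when tracing fails downstream: tracing records only the operations executed for the example input, so any data-dependent control flow in the model will be frozen into the exported graph.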
Hi, I'm new with T4Rec and I am trying to use pretrained embeddings for this competition: https://recsys.eb.dk/. Is there a notebook on how to do that?
Would also like a notebook on how to do this. Would be super helpful.
@MitchDonley @ArthurHochedez Hello. We do not have enough bandwidth to add new notebooks or new features at the moment. If you would like to contribute, you are more than welcome to open an example notebook PR.
Fixes #683 Fixes #684 Fixes #685 Fixes #485
Goals :soccer:
- Add PretrainedEmbeddingFeature, which processes pre-trained input features. #485
- Support MerlinDataLoader (which is a wrapper of the merlin torch loader class)