Open Azilyss opened 10 months ago
@Azilyss this is done so that we can train DL models with ragged inputs and then serve them on Triton accordingly. Does using pad=True not set is_ragged to False?
Apologies, the outputs are actually the correct ones.
However, because the inputs are expected to be ragged, the parameters item_id-list_seq__offsets and item_id-list_seq__values are created for the reasons you mentioned. In my current setup, I am running Triton inference one request at a time, not in batches, so I was wondering whether it is possible to keep the input as-is, without having to pad the training dataset before fitting the workflow.
Thank you for your help.
Can I ignore the is_ragged property of the categorical features when exporting the Workflow?
Setup: nvtabular version 23.6.0, merlin-systems version 23.6.0
The NvTabular workflow is defined as follows:
The dataset typically contains sequences of items of varying length, and the workflow slices and pads them to the specified sequence_length.
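As a plain-Python illustration (not the NvTabular implementation), `ListSlice(-max_len, pad=True, pad_value=0)` keeps the last `max_len` items of each row's list and right-pads shorter lists to `max_len` with the pad value:

```python
def slice_and_pad(seq, max_len, pad_value=0):
    """Keep the last max_len items; right-pad shorter sequences.

    Pure-Python sketch of what ListSlice(-max_len, pad=True, pad_value=0)
    does to each row; the real op runs vectorized inside the workflow.
    """
    sliced = seq[-max_len:]
    return sliced + [pad_value] * (max_len - len(sliced))

print(slice_and_pad([28, 12, 44], max_len=5))           # [28, 12, 44, 0, 0]
print(slice_and_pad([1, 2, 3, 4, 5, 6, 7], max_len=5))  # [3, 4, 5, 6, 7]
```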
The workflow is exported as follows:
When exporting the workflow using the Ensemble module, the NvTabular Triton config file creates two parameters for each ragged feature, "feature_name__offsets" and "feature_name__values", for both the inputs and the outputs.
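The offsets/values pair is a standard ragged encoding: every row is concatenated into one flat values array, and an offsets array records where each row starts and ends. A minimal numpy sketch (the helper name is illustrative, not a Merlin internal):

```python
import numpy as np

def to_ragged(rows):
    # Concatenate all rows into one flat values array ...
    values = np.concatenate([np.asarray(r, dtype=np.int64) for r in rows])
    # ... and record cumulative row lengths: offsets[i] is where row i starts,
    # offsets[i + 1] is where it ends.
    offsets = np.cumsum([0] + [len(r) for r in rows]).astype(np.int64)
    return values, offsets

rows = [[28, 12, 44], [12, 28, 73], [24, 35, 6, 12]]
values, offsets = to_ragged(rows)
print(values.tolist())   # [28, 12, 44, 12, 28, 73, 24, 35, 6, 12]
print(offsets.tolist())  # [0, 3, 6, 10]
# Row i can be recovered as values[offsets[i]:offsets[i + 1]]
```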
Is there a solution to avoid creating these new parameters and keep the inputs as-is? Any workaround is appreciated.
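One client-side workaround (a sketch, under the assumption that the exported model keeps its ragged offsets/values inputs) is to build those two arrays from an already-padded request just before sending it to Triton, so the training pipeline stays unchanged:

```python
import numpy as np

def padded_to_ragged(padded_batch, pad_value=0):
    """Convert a fixed-length padded batch into the flat values + offsets
    arrays that a ragged input expects.

    Hypothetical helper for illustration: trailing pad values are stripped
    per row, which assumes pad_value never occurs as a real trailing item.
    """
    rows = []
    for row in padded_batch:
        row = list(row)
        # Strip trailing padding only, preserving any interior pad_value.
        while row and row[-1] == pad_value:
            row.pop()
        rows.append(row)
    values = np.concatenate([np.asarray(r, dtype=np.int64) for r in rows])
    offsets = np.cumsum([0] + [len(r) for r in rows]).astype(np.int64)
    return values, offsets

values, offsets = padded_to_ragged([[28, 12, 44, 0, 0]])
print(values.tolist())   # [28, 12, 44]
print(offsets.tolist())  # [0, 3]
```

The two resulting arrays would then be set as the feature's offsets and values inputs on the Triton request.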
Code to reproduce
```python
import dask.dataframe as dd
import nvtabular as nvt
import pandas as pd
from merlin.schema import Tags
from merlin.systems.dag import Ensemble
from merlin.systems.dag.ops.workflow import TransformWorkflow
from nvtabular import ColumnSelector

tmp_path = "tmp"

d = {
    "item_id-list": [
        [28, 12, 44],
        [12, 28, 73],
        [24, 35, 6, 12],
        [74, 28, 9, 12, 44],
        [101, 102, 103, 104, 105],
    ],
}
df = pd.DataFrame(data=d)
ddf = dd.from_pandas(df, npartitions=1)
train_set = nvt.Dataset(ddf)

input_features = ["item_id-list"]
max_len = 20

cat_features = (
    ColumnSelector(input_features)
    >> nvt.ops.Categorify()
    >> nvt.ops.AddMetadata(tags=[Tags.CATEGORICAL])
)
seq_feats_list = (
    cat_features["item_id-list"]
    >> nvt.ops.ListSlice(-max_len, pad=True, pad_value=0)
    >> nvt.ops.Rename(postfix="_seq")
    >> nvt.ops.AddMetadata(tags=[Tags.LIST])
)
features = seq_feats_list >> nvt.ops.AddMetadata(tags=[Tags.ITEM, Tags.ID])

workflow = nvt.Workflow(features)
workflow.fit(train_set)

transform_workflow_op = workflow.input_schema.column_names >> TransformWorkflow(workflow)
ensemble = Ensemble(transform_workflow_op, workflow.input_schema)
ens_config, node_configs = ensemble.export(tmp_path)
print(ens_config)
```