NVIDIA-Merlin / systems

Merlin Systems provides tools for combining recommendation models with other elements of production recommender systems (like feature stores, nearest neighbor search, and exploration strategies) into end-to-end recommendation pipelines that can be served with Triton Inference Server.
Apache License 2.0

[BUG] getting dtype error when serving an NVT workflow model on TIS with TransformWorkflow op #243

Closed: rnyak closed this issue 1 year ago

rnyak commented 1 year ago

Bug description

When I try to serve an NVT model with list columns, I get the error `Invalid argument: in ensemble ensemble_model, ensemble tensor category-list__lengths: inconsistent data type: TYPE_INT32 is inferred from model ensemble_model while TYPE_INT64 is inferred from model 0_transformworkflow`, due to a dtype mismatch between the config files of `0_transformworkflow` and `ensemble_model`.
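The mismatch looks roughly like this in the two exported `config.pbtxt` files (an illustrative sketch, not the actual files; `dims` and other fields are elided):

```
# ensemble_model/config.pbtxt (sketch)
input [
  { name: "category-list__lengths"  data_type: TYPE_INT32  dims: [ -1 ] }
]

# 0_transformworkflow/config.pbtxt (sketch)
input [
  { name: "category-list__lengths"  data_type: TYPE_INT64  dims: [ -1 ] }
]
```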

Steps/Code to reproduce bug

Please run the following steps to reproduce:

1. First run the NVT workflow to fit and save the workflow in this gist: https://gist.github.com/rnyak/ff6a9a4033053ef2a46d46938df2f70b

2. Then execute the following code in a notebook cell:

```python
import os
import torch
from nvtabular.workflow import Workflow
from merlin.systems.dag import Ensemble  # noqa
from merlin.systems.dag.ops.workflow import TransformWorkflow

workflow = Workflow.load('/transformers4rec/examples/getting-started-session-based/workflow_etl/')

serving_ops = workflow.input_schema.column_names >> TransformWorkflow(workflow)

ensemble = Ensemble(serving_ops, workflow.input_schema)
ens_config, node_configs = ensemble.export('./models')
```

3. Launch TIS with the command below in a terminal:

`tritonserver --model-repository=<ensemble_export_path>`

Triton then fails to load the ensemble with the following log:

```
I1201 16:53:19.183453 5962 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f956e000000' with size 268435456
I1201 16:53:19.183926 5962 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1201 16:53:19.186749 5962 model_lifecycle.cc:459] loading: 0_transformworkflow:1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1201 16:53:23.301905 5962 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: 0_transformworkflow (GPU device 0)
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.12) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1201 16:53:26.073950 5962 model_lifecycle.cc:693] successfully loaded '0_transformworkflow' version 1
E1201 16:53:26.074102 5962 model_repository_manager.cc:481] Invalid argument: in ensemble ensemble_model, ensemble tensor category-list__lengths: inconsistent data type: TYPE_INT32 is inferred from model ensemble_model while TYPE_INT64 is inferred from model 0_transformworkflow
I1201 16:53:26.074150 5962 server.cc:561] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1201 16:53:26.074188 5962 server.cc:588] 
+---------+-------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path                                                  | Config                                                                                                                                        |
+---------+-------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| python  | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-b |
|         |                                                       | atch-size":"4"}}                                                                                                                              |
+---------+-------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+

I1201 16:53:26.074217 5962 server.cc:631] 
+---------------------+---------+--------+
| Model               | Version | Status |
+---------------------+---------+--------+
| 0_transformworkflow | 1       | READY  |
+---------------------+---------+--------+

I1201 16:53:26.108891 5962 metrics.cc:650] Collecting metrics for GPU 0: Quadro GV100
I1201 16:53:26.109204 5962 tritonserver.cc:2214] 
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                          |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                         |
| server_version                   | 2.25.0                                                                                                                                                                         |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data st |
|                                  | atistics trace                                                                                                                                                                 |
| model_repository_path[0]         | /transformers4rec/examples/getting-started-session-based/models/                                                                                                               |
| model_control_mode               | MODE_NONE                                                                                                                                                                      |
| strict_model_config              | 0                                                                                                                                                                              |
| rate_limit                       | OFF                                                                                                                                                                            |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                      |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                       |
| response_cache_byte_size         | 0                                                                                                                                                                              |
| min_supported_compute_capability | 6.0                                                                                                                                                                            |
| strict_readiness                 | 1                                                                                                                                                                              |
| exit_timeout                     | 30                                                                                                                                                                             |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1201 16:53:26.109239 5962 server.cc:262] Waiting for in-flight requests to complete.
I1201 16:53:26.109247 5962 server.cc:278] Timeout 30: Found 0 model versions that have in-flight inferences
I1201 16:53:26.109278 5962 server.cc:293] All models are stopped, unloading models
I1201 16:53:26.109285 5962 server.cc:300] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I1201 16:53:27.109427 5962 server.cc:300] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
I1201 16:53:27.697469 5962 model_lifecycle.cc:578] successfully unloaded '0_transformworkflow' version 1
I1201 16:53:28.109614 5962 server.cc:300] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
```

### Expected behavior
There shouldn't be a dtype mismatch between the config files of the exported ensemble models.

### Environment details

`merlin-pytorch:22.11` image with all the latest branches pulled.

Note that in the config files the dtypes of the `__lengths` columns do not match: in one file it is `TYPE_INT64`, in the other it is `TYPE_INT32`.
rnyak commented 1 year ago

@bschifferer did you encounter this issue as well?

karlhigley commented 1 year ago

I'm a bit confused, because the workflow fitting script doesn't seem to export to the path specified as the input in the ensemble exporting script 😅

rnyak commented 1 year ago

> I'm a bit confused, because the workflow fitting script doesn't seem to export to the path specified as the input in the ensemble exporting script 😅

@karlhigley thanks for your PR, appreciate that. Sorry if I confused the paths in the repro example :) But I think you got me :D