[Open] vs385 opened this issue 1 year ago
@vs385 do you use the notebooks from inside the container or from GitHub?
Hi @bschifferer, I'm using the notebook from inside the container
Hi @bschifferer, just following up here for any help with this issue. Thanks again for looking into it.
@vs385 can you please test our latest image, merlin-tensorflow:22.12? Thanks.
Pandas version is 1.5.3, and dask_cudf is the one installed in the base image for merlin-tensorflow:22.12.
For now I manually updated the file at /usr/local/lib/python3.8/dist-packages/cudf/core/dtypes.py and changed the line `from pandas.core.arrays._arrow_utils import ArrowIntervalType` to `from pandas.core.arrays.arrow.extension_types import ArrowIntervalType`.
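For reference, the incompatibility that patch works around is just a moved import; a minimal illustration (my own sketch, not an official fix) of the two locations:

```python
# Illustration only: ArrowIntervalType moved inside pandas, which is why the
# stock cudf in the image fails to import it on pandas 1.5.x.
try:
    # older location, which cudf's dtypes.py imports
    from pandas.core.arrays._arrow_utils import ArrowIntervalType
except ModuleNotFoundError:
    # location in newer pandas releases (e.g. 1.5.x)
    from pandas.core.arrays.arrow.extension_types import ArrowIntervalType

print(ArrowIntervalType)
```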
But I'm still getting lots of errors when running NVTabular preprocessing, such as:
```
You have supplied a custom function and Dask is unable to
determine the type of output that that function returns.
To resolve this please provide a meta= keyword.
The docstring of the Dask function you ran should have more information.
Original error is below:
------------------------
AttributeError("'DataFrame' object has no attribute '_meta_nonempty'")
```
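As an aside, the `meta=` hint in that Dask message just means Dask could not infer the output schema of a custom function; a minimal sketch of supplying it (with made-up column names) is below, though in this thread the error looks more like a symptom of the mismatched pandas/cudf environment than of a missing `meta`.

```python
import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"a": [1, 2, 3]}), npartitions=2)

# Passing meta= tells Dask the output schema of the custom function up front,
# which avoids the "unable to determine the type of output" error above.
out = ddf.map_partitions(
    lambda part: part.assign(b=part["a"] * 2),
    meta={"a": "int64", "b": "int64"},
)
print(out.compute())
```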
------------------------------------------
```
ModuleNotFoundError Traceback (most recent call last)
Cell In[5], line 15
11 # import seedir as sd
12
13 # External Dependencies
14 import cupy as cp
---> 15 from dask_cudf import read_csv
16 from dask_cuda import LocalCUDACluster
17 from dask.distributed import Client
File /usr/local/lib/python3.8/dist-packages/dask_cudf/__init__.py:5
1 # Copyright (c) 2018-2022, NVIDIA CORPORATION.
3 from dask.dataframe import from_delayed
----> 5 import cudf
6 from cudf._version import get_versions
8 from . import backends
File /usr/local/lib/python3.8/dist-packages/cudf/__init__.py:12
8 from numba import config as numba_config, cuda
10 import rmm
---> 12 from cudf.api.types import dtype
13 from cudf import api, core, datasets, testing
14 from cudf._version import get_versions
File /usr/local/lib/python3.8/dist-packages/cudf/api/__init__.py:3
1 # Copyright (c) 2021, NVIDIA CORPORATION.
----> 3 from cudf.api import extensions, types
5 __all__ = ["extensions", "types"]
File /usr/local/lib/python3.8/dist-packages/cudf/api/types.py:18
15 from pandas.api import types as pd_types
17 import cudf
---> 18 from cudf.core.dtypes import ( # noqa: F401
19 _BaseDtype,
20 dtype,
21 is_categorical_dtype,
22 is_decimal32_dtype,
23 is_decimal64_dtype,
24 is_decimal128_dtype,
25 is_decimal_dtype,
26 is_interval_dtype,
27 is_list_dtype,
28 is_struct_dtype,
29 )
32 def is_numeric_dtype(obj):
33 """Check whether the provided array or dtype is of a numeric dtype.
34
35 Parameters
(...)
43 Whether or not the array or dtype is of a numeric dtype.
44 """
File /usr/local/lib/python3.8/dist-packages/cudf/core/dtypes.py:13
11 from pandas.api import types as pd_types
12 from pandas.api.extensions import ExtensionDtype
---> 13 from pandas.core.arrays._arrow_utils import ArrowIntervalType
14 from pandas.core.dtypes.dtypes import (
15 CategoricalDtype as pd_CategoricalDtype,
16 CategoricalDtypeType as pd_CategoricalDtypeType,
17 )
19 import cudf
ModuleNotFoundError: No module named 'pandas.core.arrays._arrow_utils'
```
@vs385 you would not need to manually update any files if you are using the Merlin docker images, but can you please tell us where you run this? What's your HW on the EC2 g5 instance? Can you print the cuda-toolkit version (you can do `nvcc --version`)? What's the driver version (you can share the `nvidia-smi` output)? Thanks.
I'm running this container from inside an EC2 g5 instance (g5.12xlarge).

nvidia-smi:
```
NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8
```

CUDA toolkit version (nvcc --version):
```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
```

lscpu:
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7R32
Stepping: 0
CPU MHz: 2799.884
BogoMIPS: 5599.76
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr wbnoinvd arat npt nrip_save rdpid
```
@vs385 thanks. May I ask if you could run this example nb and this one instead (don't forget to run the ETL and training notebooks), and see if you can load the models on Triton? Please do not fix any files, just use the merlin-tensorflow:22.12 image as it is (you might want to create a clean instance) and write us the error messages you are getting.
Hi @rnyak, to run the former, I have to run this one first to train the DLRM model. I'm currently running it the same as above, with just the merlin-tensorflow:22.12 image:
```
InvalidArgumentError Traceback (most recent call last)
Cell In[1], line 9
6 from merlin.models.utils.example_utils import workflow_fit_transform
7 from merlin.schema.tags import Tags
----> 9 import merlin.models.tf as mm
10 from merlin.io.dataset import Dataset
11 import tensorflow as tf
File /usr/local/lib/python3.8/dist-packages/merlin/models/tf/__init__.py:104
102 from merlin.models.tf.models.base import BaseModel, Model, RetrievalModel, RetrievalModelV2
103 from merlin.models.tf.models.ranking import DCNModel, DeepFMModel, DLRMModel, WideAndDeepModel
--> 104 from merlin.models.tf.models.retrieval import (
105 MatrixFactorizationModel,
106 MatrixFactorizationModelV2,
107 TwoTowerModel,
108 TwoTowerModelV2,
109 YoutubeDNNRetrievalModel,
110 YoutubeDNNRetrievalModelV2,
111 )
112 from merlin.models.tf.outputs.base import ModelOutput
113 from merlin.models.tf.outputs.classification import BinaryOutput, CategoricalOutput
File /usr/local/lib/python3.8/dist-packages/merlin/models/tf/models/retrieval.py:22
20 from merlin.models.tf.prediction_tasks.base import ParallelPredictionBlock, PredictionTask
21 from merlin.models.tf.prediction_tasks.next_item import NextItemPredictionTask
---> 22 from merlin.models.tf.prediction_tasks.retrieval import ItemRetrievalTask
23 from merlin.models.utils.schema_utils import categorical_cardinalities
24 from merlin.schema import Schema, Tags
File /usr/local/lib/python3.8/dist-packages/merlin/models/tf/prediction_tasks/retrieval.py:33
28 from merlin.models.utils import schema_utils
29 from merlin.schema import Schema, Tags
32 @tf.keras.utils.register_keras_serializable(package="merlin_models")
---> 33 class ItemRetrievalTask(MultiClassClassificationTask):
34 """Prediction-task for item-retrieval.
35
36 Parameters
(...)
61 The item retrieval prediction task
62 """
64 DEFAULT_LOSS = "categorical_crossentropy"
File /usr/local/lib/python3.8/dist-packages/merlin/models/tf/prediction_tasks/retrieval.py:65, in ItemRetrievalTask()
34 """Prediction-task for item-retrieval.
35
36 Parameters
(...)
61 The item retrieval prediction task
62 """
64 DEFAULT_LOSS = "categorical_crossentropy"
---> 65 DEFAULT_METRICS = TopKMetricsAggregator.default_metrics(top_ks=[10])
67 def __init__(
68 self,
69 schema: Schema,
(...)
78 **kwargs,
79 ):
80 self.samplers = samplers
File /usr/local/lib/python3.8/dist-packages/merlin/models/tf/metrics/topk.py:483, in TopKMetricsAggregator.default_metrics(cls, top_ks, **kwargs)
481 metrics: List[TopkMetric] = []
482 for k in top_ks:
--> 483 metrics.extend([RecallAt(k), MRRAt(k), NDCGAt(k), AvgPrecisionAt(k), PrecisionAt(k)])
484 # Using Top-k metrics aggregator provides better performance than having top-k
485 # metrics computed separately, as prediction scores are sorted only once for all metrics
486 aggregator = cls(*metrics)
File /usr/local/lib/python3.8/dist-packages/merlin/models/tf/metrics/topk.py:360, in RecallAt.__init__(self, k, pre_sorted, name, **kwargs)
359 def __init__(self, k=10, pre_sorted=False, name="recall_at", **kwargs):
--> 360 super().__init__(recall_at, k=k, pre_sorted=pre_sorted, name=name, **kwargs)
File /usr/local/lib/python3.8/dist-packages/merlin/models/tf/metrics/topk.py:233, in TopkMetric.__init__(self, fn, k, pre_sorted, name, log_base, seed, **kwargs)
231 if name is not None:
232 name = f"{name}_{k}"
--> 233 super().__init__(name=name, **kwargs)
234 self._fn = fn
235 self.k = k
File /usr/local/lib/python3.8/dist-packages/keras/dtensor/utils.py:144, in inject_mesh.<locals>._wrap_function(instance, *args, **kwargs)
142 if mesh is not None:
143 instance._mesh = mesh
--> 144 init_method(instance, *args, **kwargs)
File /usr/local/lib/python3.8/dist-packages/keras/metrics/base_metric.py:622, in Mean.__init__(self, name, dtype)
620 @dtensor_utils.inject_mesh
621 def __init__(self, name="mean", dtype=None):
--> 622 super().__init__(
623 reduction=metrics_utils.Reduction.WEIGHTED_MEAN,
624 name=name,
625 dtype=dtype,
626 )
File /usr/local/lib/python3.8/dist-packages/keras/metrics/base_metric.py:439, in Reduce.__init__(self, reduction, name, dtype)
437 super().__init__(name=name, dtype=dtype)
438 self.reduction = reduction
--> 439 self.total = self.add_weight("total", initializer="zeros")
440 if reduction in [
441 metrics_utils.Reduction.SUM_OVER_BATCH_SIZE,
442 metrics_utils.Reduction.WEIGHTED_MEAN,
443 ]:
444 self.count = self.add_weight("count", initializer="zeros")
File /usr/local/lib/python3.8/dist-packages/keras/metrics/base_metric.py:375, in Metric.add_weight(self, name, shape, aggregation, synchronization, initializer, dtype)
372 additional_kwargs = {}
374 with tf_utils.maybe_init_scope(layer=self):
--> 375 return super().add_weight(
376 name=name,
377 shape=shape,
378 dtype=self._dtype if dtype is None else dtype,
379 trainable=False,
380 initializer=initializer,
381 collections=[],
382 synchronization=synchronization,
383 aggregation=aggregation,
384 **additional_kwargs,
385 )
File /usr/local/lib/python3.8/dist-packages/keras/engine/base_layer.py:705, in Layer.add_weight(self, name, shape, dtype, initializer, regularizer, trainable, constraint, use_resource, synchronization, aggregation, **kwargs)
702 if layout:
703 getter = functools.partial(getter, layout=layout)
--> 705 variable = self._add_variable_with_custom_getter(
706 name=name,
707 shape=shape,
708 # TODO(allenl): a `make_variable` equivalent should be added as a
709 # `Trackable` method.
710 getter=getter,
711 # Manage errors in Layer rather than Trackable.
712 overwrite=True,
713 initializer=initializer,
714 dtype=dtype,
715 constraint=constraint,
716 trainable=trainable,
717 use_resource=use_resource,
718 collections=collections_arg,
719 synchronization=synchronization,
720 aggregation=aggregation,
721 caching_device=caching_device,
722 )
723 if regularizer is not None:
724 # TODO(fchollet): in the future, this should be handled at the
725 # level of variable creation, and weight regularization losses
726 # should be variable attributes.
727 name_in_scope = variable.name[: variable.name.find(":")]
File /usr/local/lib/python3.8/dist-packages/tensorflow/python/trackable/base.py:489, in Trackable._add_variable_with_custom_getter(self, name, shape, dtype, initializer, getter, overwrite, **kwargs_for_getter)
479 if (checkpoint_initializer is not None and
480 not (isinstance(initializer, CheckpointInitialValueCallable) and
481 (initializer.restore_uid > checkpoint_initializer.restore_uid))):
(...)
486 # then we'll catch that when we call _track_trackable. So this is
487 # "best effort" to set the initializer with the highest restore UID.
488 initializer = checkpoint_initializer
--> 489 new_variable = getter(
490 name=name,
491 shape=shape,
492 dtype=dtype,
493 initializer=initializer,
494 **kwargs_for_getter)
496 # If we set an initializer and the variable processed it, tracking will not
497 # assign again. It will add this variable to our dependencies, and if there
498 # is a non-trivial restoration queued, it will handle that. This also
499 # handles slot variables.
500 if not overwrite or isinstance(new_variable, Trackable):
File /usr/local/lib/python3.8/dist-packages/keras/engine/base_layer_utils.py:134, in make_variable(name, shape, dtype, initializer, trainable, caching_device, validate_shape, constraint, use_resource, collections, synchronization, aggregation, partitioner, layout)
127 use_resource = True
129 if layout is None:
130 # In theory, in `use_resource` is True and `collections` is empty
131 # (that is to say, in TF2), we can use tf.Variable.
132 # However, this breaks legacy (Estimator) checkpoints because
133 # it changes variable names. Remove this when V1 is fully deprecated.
--> 134 return tf1.Variable(
135 initial_value=init_val,
136 name=name,
137 trainable=trainable,
138 caching_device=caching_device,
139 dtype=variable_dtype,
140 validate_shape=validate_shape,
141 constraint=constraint,
142 use_resource=use_resource,
143 collections=collections,
144 synchronization=synchronization,
145 aggregation=aggregation,
146 shape=variable_shape if variable_shape else None,
147 )
148 else:
149 return dtensor.DVariable(
150 initial_value=init_val,
151 name=name,
(...)
160 shape=variable_shape if variable_shape else None,
161 )
File /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py:153, in filter_traceback.<locals>.error_handler(*args, **kwargs)
151 except Exception as e:
152 filtered_tb = _process_traceback_frames(e.__traceback__)
--> 153 raise e.with_traceback(filtered_tb) from None
154 finally:
155 del filtered_tb
File /usr/local/lib/python3.8/dist-packages/keras/initializers/initializers_v2.py:171, in Zeros.__call__(self, shape, dtype, **kwargs)
167 if layout:
168 return utils.call_with_layout(
169 tf.zeros, layout, shape=shape, dtype=dtype
170 )
--> 171 return tf.zeros(shape, dtype)
InvalidArgumentError: Device ordinals must be set for all virtual devices or none. But the device_ordinal is specified for 1 while previous devices didn't have any set.
```
When running the latter (Merlin/tree/main/examples/getting-started-movielens/04-Triton-Inference-with-TF.ipynb), I get a similar error.
When running the notebook for the TF model (03-Training-with-TF.ipynb), I get an error when running the cell that loads the batch (cell [10]):
```
InvalidArgumentError Traceback (most recent call last)
Cell In[10], line 1
----> 1 batch = train_dataset_tf.peek()
2 batch[0]
File /usr/local/lib/python3.8/dist-packages/merlin/dataloader/loader_base.py:286, in LoaderBase.peek(self)
284 def peek(self):
285 """Get the next batch without advancing the iterator."""
--> 286 return self._peek_next_batch()
File /usr/local/lib/python3.8/dist-packages/merlin/dataloader/loader_base.py:308, in LoaderBase._peek_next_batch(self)
306 # get the first chunks
307 if self._batch_itr is None:
--> 308 self._fetch_chunk()
310 # try to iterate through existing batches
311 try:
File /usr/local/lib/python3.8/dist-packages/merlin/dataloader/loader_base.py:298, in LoaderBase._fetch_chunk(self)
296 if isinstance(chunks, Exception):
297 self.stop()
--> 298 raise chunks
299 self._batch_itr = iter(chunks)
File /usr/local/lib/python3.8/dist-packages/merlin/dataloader/loader_base.py:764, in ChunkQueue.load_chunks(self, dev)
762 itr = iter(self.itr)
763 if self.dataloader.device != "cpu":
--> 764 with self.dataloader._get_device_ctx(dev):
765 self.chunk_logic(itr)
766 else:
File /usr/lib/python3.8/contextlib.py:113, in _GeneratorContextManager.__enter__(self)
111 del self.args, self.kwds, self.func
112 try:
--> 113 return next(self.gen)
114 except StopIteration:
115 raise RuntimeError("generator didn't yield") from None
File /usr/local/lib/python3.8/dist-packages/merlin/dataloader/tensorflow.py:181, in Loader._get_device_ctx(self, dev)
170 @contextlib.contextmanager
171 def _get_device_ctx(self, dev):
172 # with tf.device("/device:GPU:{}".format(dev)) as tf_device:
(...)
178 # RuntimeErrors when exiting if two dataloaders
179 # are running at once (e.g. train and validation)
180 if dev != "cpu":
--> 181 yield tf.device("/GPU:" + str(dev))
182 else:
183 # https://www.tensorflow.org/guide/gpu#manual_device_placement
184 yield tf.device("/device:CPU:0")
File /usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py:5555, in device_v2(device_name)
5553 if callable(device_name):
5554 raise RuntimeError("tf.device does not support functions.")
-> 5555 return device(device_name)
File /usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py:5504, in device(device_name_or_function)
5500 if callable(device_name_or_function):
5501 raise RuntimeError(
5502 "tf.device does not support functions when eager execution "
5503 "is enabled.")
-> 5504 return context.device(device_name_or_function)
5505 elif executing_eagerly_outside_functions():
5506 @tf_contextlib.contextmanager
5507 def combined(device_name_or_function):
File /usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/context.py:2364, in device(name)
2344 def device(name):
2345 """Context-manager to force placement of operations and Tensors on a device.
2346
2347 Example:
(...)
2362 Context manager for setting the device.
2363 """
-> 2364 ensure_initialized()
2365 return context().device(name)
File /usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/context.py:2159, in ensure_initialized()
2157 def ensure_initialized():
2158 """Initialize the context."""
-> 2159 context().ensure_initialized()
File /usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/context.py:622, in Context.ensure_initialized(self)
618 pywrap_tfe.TFE_ContextOptionsSetRunEagerOpAsFunction(
619 opts, self._run_eager_op_as_function)
620 pywrap_tfe.TFE_ContextOptionsSetJitCompileRewrite(
621 opts, self._jit_compile_rewrite)
--> 622 context_handle = pywrap_tfe.TFE_NewContext(opts)
623 finally:
624 pywrap_tfe.TFE_DeleteContextOptions(opts)
InvalidArgumentError: Device ordinals must be set for all virtual devices or none. But the device_ordinal is specified for 1 while previous devices didn't have any set.
```
Can you add these lines at the top of your notebook (in the first cell), restart the notebook, and run the cells again, please? Let's see if this solves the issue.
```python
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```
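(A quick sanity check of my own, not from the notebook: after setting those variables, TensorFlow should report exactly one visible GPU.)

```python
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # must be set before TensorFlow is imported

import tensorflow as tf

# On a multi-GPU g5.12xlarge this should now list a single PhysicalDevice.
print(tf.config.list_physical_devices("GPU"))
```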
Hi @rnyak, yes! This solved the issue when running the base merlin-tensorflow:22.12 image. The thing is, we're building off of this image and adding some requirements that we pip install from a requirements.txt:
```
awscli==1.27.1
bokeh==2.1.1
seedir==0.4.0
feast==0.19.4
faiss-gpu==1.7.2
requests==2.28.1
optuna==3.0.4
plotly==5.11.0
jsonschema==4.17.3
```
Then in our Dockerfile we run the following:
```dockerfile
COPY requirements.txt requirements.txt
RUN pip install --upgrade pip
RUN cat requirements.txt | xargs -n 1 -L 1 pip install
```
When I build this image and spin up a container exactly as I'd do with the merlin-tensorflow:22.12 base image, and then try running an example that has `import cudf` or `import dask_cudf`, I still keep getting the same error outlined here.
Would you know if any of the libraries above, pip installed from the requirements, would conflict for some reason with the cuda-toolkit built into the base 22.12 image? This is very weird.
@vs385 I believe you are now good to use the merlin-tensorflow:22.12 docker image, but you are getting issues when you install the libs in the requirements.txt above. Maybe you can do the installations one by one and see which one is breaking cudf?
Then I'd recommend you escalate this issue in the RAPIDS cudf repo: https://github.com/rapidsai/cudf
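For what it's worth, a small version check like the sketch below (my own suggestion, run inside the container after each install) can help spot which package silently downgrades dask or pandas and breaks the cudf import:

```python
# Print the versions cudf/dask_cudf are sensitive to, then try the imports themselves.
import pandas, dask, distributed
print("pandas", pandas.__version__, "dask", dask.__version__, "distributed", distributed.__version__)

import cudf
import dask_cudf
print("cudf", cudf.__version__, "dask_cudf", dask_cudf.__version__)
```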
--> Installing feast<0.20 creates a conflict with dask (it uninstalls dask==2022.7.1 and installs dask==2022.1.1), so I have to reinstall dask==2022.7.1.
But to come back to the original problem this thread was opened for: I'm still getting an error when trying to run the Triton server:
```
I0201 18:54:02.733370 2064 python_be.cc:1856] TRITONBACKEND_ModelInstanceInitialize: 0_queryfeast (GPU device 1)
2023-02-01 18:54:05.258837: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-01 18:54:06.989881: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:06.990660: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:06.991378: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:06.992097: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:06.993065: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:06.993753: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:06.994454: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:06.995114: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:06.995809: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:06.996470: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:06.997145: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:06.997807: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I0201 18:54:07.203147 2064 python_be.cc:1856] TRITONBACKEND_ModelInstanceInitialize: 2_queryfaiss (GPU device 0)
2023-02-01 18:54:09.729914: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-01 18:54:11.472684: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:11.473436: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:11.474147: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:11.474860: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:11.475856: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:11.476551: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:11.477236: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:11.477892: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:11.478553: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:11.479223: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:11.479911: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-01 18:54:11.480589: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
```
This happens when running
```
tritonserver --model-repository=/Merlin/examples/Building-and-deploying-multi-stage-RecSys/poc_ensemble/ --backend-config=tensorflow,version=2
```
and it gets stuck basically at the `2_queryfaiss` step.
@vs385 are you running the notebooks as they are, or with your own custom datasets?
Please run this and test again:
```
pip install dask==2022.7.1 distributed==2022.7.1
```
Hi @rnyak, I tried adding those while running the notebook example, and I still get the same issue where it gets stuck at the `TRITONBACKEND_ModelInstanceInitialize: 2_queryfaiss (GPU device 0)` step:
```
I0202 16:25:04.567045 3446 python_be.cc:1856] TRITONBACKEND_ModelInstanceInitialize: 2_queryfaiss (GPU device 0)
2023-02-02 16:25:07.089032: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-02 16:25:08.815282: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-02 16:25:08.816043: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-02 16:25:08.816780: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-02 16:25:08.817518: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-02 16:25:08.818501: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-02 16:25:08.819179: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-02 16:25:08.819862: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-02 16:25:08.820560: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-02 16:25:08.821256: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-02 16:25:08.821951: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-02 16:25:08.822636: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-02-02 16:25:08.823297: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
```
@vs385 just to confirm: you ran `pip install dask==2022.7.1 distributed==2022.7.1` first and restarted the kernel right afterwards?
@rnyak - so I ran the notebook as follows, using the base merlin-tensorflow:22.12 image:
> This notebook is developed and tested using the latest merlin-tensorflow container from the NVIDIA NGC catalog. To find the tag for the most recently-released container, refer to the Merlin TensorFlow page.
-> I added a cell as follows (as you advised above):
```python
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```
Then I ran the cell
```python
# for running this example on GPU, install the following libraries
%pip install "feast<0.20" faiss-gpu
# for running this example on CPU, uncomment the following lines
# %pip install tensorflow-cpu "feast<0.20" faiss-cpu
# %pip uninstall cudf
```
with the line `%pip install "feast<0.20" faiss-gpu` uncommented.
Then I added a new cell and ran it:
```python
%pip install dask==2022.7.1 distributed==2022.7.1
```
I then restarted the kernel and ran the notebook in its entirety, but this time I did not run the two cells doing the pip installs.
Then I run the ensemble notebook (#2), create the poc_ensemble/ directory, then launch a terminal and run:
```
tritonserver --model-repository=/Merlin/examples/Building-and-deploying-multi-stage-RecSys/poc_ensemble/ --backend-config=tensorflow,version=2
```
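(As an aside, one way to check from another terminal whether the server ever reaches a ready state is Triton's HTTP health endpoint; a minimal sketch of my own, assuming the default HTTP port 8000:)

```python
import requests

# Returns HTTP 200 only once all models are loaded and the server is ready.
resp = requests.get("http://localhost:8000/v2/health/ready", timeout=5)
print("Triton ready:", resp.status_code == 200)
```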
Now I'm getting the below error:
```
I0204 19:55:12.556858 2592 pb_stub.cc:245] Failed to initialize Python stub for auto-complete: CUDARuntimeError: cudaErrorInitializationError: initialization error
At:
/usr/local/lib/python3.8/dist-packages/rmm/_cuda/gpu.py(101): getDeviceCount
/usr/local/lib/python3.8/dist-packages/cudf/utils/gpu_utils.py(57): validate_setup
/usr/local/lib/python3.8/dist-packages/cudf/__init__.py(5): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(848): exec_module
<frozen importlib._bootstrap>(686): _load_unlocked
<frozen importlib._bootstrap>(975): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
/usr/local/lib/python3.8/dist-packages/merlin/core/dispatch.py(52): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(848): exec_module
<frozen importlib._bootstrap>(686): _load_unlocked
<frozen importlib._bootstrap>(975): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/dictarray.py(21): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(848): exec_module
<frozen importlib._bootstrap>(686): _load_unlocked
<frozen importlib._bootstrap>(975): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/__init__.py(19): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(848): exec_module
<frozen importlib._bootstrap>(686): _load_unlocked
<frozen importlib._bootstrap>(975): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
/Merlin/examples/Building-and-deploying-multi-stage-RecSys/poc_ensemble/2_queryfaiss/1/model.py(34): <module>
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap_external>(848): exec_module
<frozen importlib._bootstrap>(686): _load_unlocked
<frozen importlib._bootstrap>(975): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
```
@vs385 sorry for the inconvenience. I have a couple of questions:
- Is it possible for you to launch an EC2 instance with only a single GPU and test again?
- Besides, did you test the two nb examples below again after solving your `Device ordinals must be set for all virtual devices or none...` error? Let's see if you are able to do inference without the faiss and feast models. Could you please run these two notebooks in order and see if inference works for you or not?

Thanks.
@rnyak When you run the second notebook, are you shutting off the first one? Please be sure you have free GPU memory when you run the second.
- What's the GPU type and memory? An A10G? (Note that since these example nbs run on a single GPU, you don't need multiple GPUs.)
- Do you see any model loaded on Triton successfully? Do you see a READY status on the terminal for any model?
> Is it possible for you to launch an EC2 instance with only a single GPU and test again?
> Besides, did you test the two nb examples below again after solving your `Device ordinals must be set for all virtual devices or none...` error?

With
```python
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```
I was able to run these two nb examples successfully and launch the server :) So there might be an issue with running the ensemble with the QueryFaiss and Feast functionalities?
Thanks so much for helping with this.
@karlhigley @jperez999 any idea on the issue reported above where QueryFaiss cannot be loaded to Triton Server? Thanks.
Tried running this notebook example. When I reach the point to start the server:
```
tritonserver --model-repository=/ensemble_export_path/ --backend-config=tensorflow,version=2
```
from the terminal (which I open adjacently in JupyterHub while the notebook is running), the terminal gets stuck and stops loading anything after the lines below:
```
2022-12-07 17:58:42.661940: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-07 17:58:42.662552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 19610 MB memory: -> device: 1, name: NVIDIA A10G, pci bus id: 0000:00:1c.0, compute capability: 8.6
2022-12-07 17:58:42.662612: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-07 17:58:42.663241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 19610 MB memory: -> device: 2, name: NVIDIA A10G, pci bus id: 0000:00:1d.0, compute capability: 8.6
2022-12-07 17:58:42.663299: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:991] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-12-07 17:58:42.663917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 19610 MB memory: -> device: 3, name: NVIDIA A10G, pci bus id: 0000:00:1e.0, compute capability: 8.6
2022-12-07 17:58:42.672242: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-12-07 17:58:42.728537: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /Merlin/examples/Building-and-deploying-multi-stage-RecSys/poc_ensemble/1_predicttensorflow/1/model.savedmodel
2022-12-07 17:58:42.752115: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 106507 microseconds.
I1207 22:58:42.752287 1485 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: 0_queryfeast (GPU device 1)
I1207 22:58:45.055801 1485 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: 2_queryfaiss (GPU device 0)
```
I'm running on the following:
- Merlin version: nvcr.io/nvidia/merlin/merlin-tensorflow:22.10
- Running on an EC2 g5 instance
- Python version: 3.8.10
- TensorFlow version (GPU): tensorflow 2.9.1+nv22.8
- faiss-gpu installed: faiss 1.7.2, faiss-gpu 1.7.2