NVIDIA-Merlin / Merlin

NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
Apache License 2.0
758 stars 113 forks source link

[BUG] The multi-stage example is broken due to recent changes in systems #1042

Closed rnyak closed 1 year ago

rnyak commented 1 year ago

Bug description

We are getting the error below due to some changes in faiss op in Systems from this unit test:


E               testbook.exceptions.TestbookRuntimeError: An error occurred while executing the following cell:
E               ------------------
E               from merlin.systems.dag.ops.faiss import QueryFaiss, setup_faiss 
E               
E               item_embeddings = np.ascontiguousarray(
E                   pd.read_parquet(os.path.join(BASE_DIR, "item_embeddings.parquet")).to_numpy()
E               )
E               setup_faiss(item_embeddings, faiss_index_path)
E               ------------------
E               
E               
E               ---------------------------------------------------------------------------
E               IndexError                                Traceback (most recent call last)
E               Cell In[10], line 6
E                     1 from merlin.systems.dag.ops.faiss import QueryFaiss, setup_faiss 
E                     3 item_embeddings = np.ascontiguousarray(
E                     4     pd.read_parquet(os.path.join(BASE_DIR, "item_embeddings.parquet")).to_numpy()
E                     5 )
E               ----> 6 setup_faiss(item_embeddings, faiss_index_path)
E               
E               File /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/ops/faiss.py:209, in setup_faiss(item_vector, output_path, metric, item_id_column, embedding_column)
E                   192 def setup_faiss(
E                   193     item_vector: DataFrameLike,
E                   194     output_path: str,
E                  (...)
E                   197     embedding_column="embedding",
E                   198 ):
E                   199     """
E                   200     Utiltiy function that will create a Faiss index from a set of embedding vectors
E                   201 
E                  (...)
E                   207         target output path
E                   208     """
E               --> 209     ids = item_vector[item_id_column].to_numpy().astype(np.int64)
E                   210     item_vectors = np.ascontiguousarray(
E                   211         np.stack(item_vector[embedding_column].to_numpy()).astype(np.float32)
E                   212     )
E                   214     index = faiss.index_factory(item_vectors.shape[1], "IVF32,Flat", metric)
E               
E               IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

/usr/local/lib/python3.8/dist-packages/testbook/client.py:135: TestbookRuntimeError

Steps/Code to reproduce bug

You can run this unit test to repro the error.

Expected behavior

Environment details

Additional context

There is an open PR in Merlin to update the building_deploying_multi_stage_RecSys example, hopefully that might address this issue.

karlhigley commented 1 year ago

@rnyak The change that led to this failure was motivated by the updates to the multi-stage notebook, and #1022 contains corresponding updates to make use of the changes to setup_faiss