AnacletoLAB / grape

🍇 GRAPE is a Rust/Python Graph Representation Learning library for Predictions and Evaluations
MIT License

TransE error: "ValueError: One of the provided node embedding computed with the TransE method contains NaN values." #10

Closed realmarcin closed 2 years ago

realmarcin commented 2 years ago

When generating embeddings for KG-Microbe (KGX edge file from KG-Hub) using TransE, the following error was observed:

ValueError Traceback (most recent call last)

in
----> 1 embedding = model.fit_transform(kg)

~/Library/Python/3.7/lib/python/site-packages/cache_decorator/cache.py in wrapped(*args, **kwargs)
    595 if not cache_enabled:
    596 self.logger.info("The cache is disabled")
--> 597 result = function(*args, **kwargs)
    598 self._check_return_type_compatability(result, self.cache_path)
    599 return result

~/Library/Python/3.7/lib/python/site-packages/embiggen/utils/abstract_models/abstract_embedding_model.py in fit_transform(self, graph, return_dataframe, verbose)
    164 graph=graph,
    165 return_dataframe=return_dataframe,
--> 166 verbose=verbose
    167 )
    168

~/Library/Python/3.7/lib/python/site-packages/embiggen/embedders/ensmallen_embedders/transe.py in _fit_transform(self, graph, return_dataframe, verbose)
    112 embedding_method_name=self.model_name(),
    113 node_embeddings= node_embedding,
--> 114 edge_type_embeddings= edge_type_embedding,
    115 )
    116

~/Library/Python/3.7/lib/python/site-packages/embiggen/utils/abstract_models/embedding_result.py in __init__(self, embedding_method_name, node_embeddings, edge_embeddings, node_type_embeddings, edge_type_embeddings)
     76 if np.isnan(numpy_embedding).any():
     77 raise ValueError(
---> 78 f"One of the provided {embedding_list_name} "
     79 f"computed with the {embedding_method_name} method "
     80 "contains NaN values."

ValueError: One of the provided node embedding computed with the TransE method contains NaN values.

I am attaching a Jupyter notebook to reproduce the problem: [load_graph_and.ipynb.zip](https://github.com/AnacletoLAB/grape/files/8902221/load_graph_and.ipynb.zip)

The input edge file is here: https://kg-hub.berkeleybop.io/kg-microbe/current/kg-microbe.tar.gz

LucaCappelletti94 commented 2 years ago

Hello Marcin, in the provided Jupyter you are loading the edge list using:

kg = Graph.from_csv(
    edge_path="./merged-kg_edges.tsv",
    sources_column_number=0,
    edge_list_edge_types_column_number=1,
    destinations_column_number=2,
    directed=False,
    name="kg-microbe"
)

but this will load the id column as source nodes, since the file is not a triples file like the other one.
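
To avoid hard-coding the wrong positions, the column indices can be looked up from the file's header line. The following is only a minimal sketch: the column names `subject`, `predicate` and `object` are assumed KGX-style names, the example header is hypothetical, and `find_triple_columns` is an illustrative helper, not part of grape.

```python
import csv
import io

# Assumed KGX-style column names; check them against your own file's header.
TRIPLE_COLUMNS = ("subject", "predicate", "object")


def find_triple_columns(header_line, delimiter="\t"):
    """Return the 0-based positions of the subject, predicate and object columns."""
    header = next(csv.reader(io.StringIO(header_line), delimiter=delimiter))
    missing = [name for name in TRIPLE_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"Header lacks expected columns: {missing}")
    return {name: header.index(name) for name in TRIPLE_COLUMNS}


# Hypothetical header with a leading id column, as in the situation described above.
example_header = "id\tsubject\tpredicate\tobject\trelation"
print(find_triple_columns(example_header))
# {'subject': 1, 'predicate': 2, 'object': 3}
```

With a header like this one, the three indices would be the values to pass as `sources_column_number`, `edge_list_edge_types_column_number` and `destinations_column_number`, instead of 0, 1 and 2.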

[Screenshot, 2022-06-14 at 20:10:40]

If you load the graph from the automatic retrieval (which points to the same edge list) you should not encounter any issue:

from grape.datasets.kghub import KGMicrobe
kg = KGMicrobe()

Nonetheless, it is interesting that this causes such a peculiar issue; I will look into it.

sanyabt commented 2 years ago

Hi @LucaCappelletti94, I ran into the same issue when computing embeddings on my graph: the TransE model was run after loading an N-Triples file. Here is a screenshot of the graph loading and the error.

Screen Shot 2022-06-14 at 7 28 00 PM

ValueError Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 embedding = model.fit_transform(npkg)

File ~/.conda/envs/faers-embed/lib/python3.8/site-packages/cache_decorator/cache.py:597, in Cache._decorate_function.<locals>.wrapped(*args, **kwargs)
    595 if not cache_enabled:
    596 self.logger.info("The cache is disabled")
--> 597 result = function(*args, **kwargs)
    598 self._check_return_type_compatability(result, self.cache_path)
    599 return result

File ~/.conda/envs/faers-embed/lib/python3.8/site-packages/embiggen/utils/abstract_models/abstract_embedding_model.py:163, in AbstractEmbeddingModel.fit_transform(self, graph, return_dataframe, verbose)
    149 if graph.has_disconnected_nodes():
    150 warnings.warn(
    151 (
    152 f"Please be advised that the {graph.get_name()} graph "
    (...)
    160 )
    161 )
--> 163 result = self._fit_transform(
    164 graph=graph,
    165 return_dataframe=return_dataframe,
    166 verbose=verbose
    167 )
    169 if not isinstance(result, EmbeddingResult):
    170 raise NotImplementedError(
    171 f"The embedding result produced by the {self.model_name()} method "
    172 f"from the library {self.library_name()} implemented in the class "
    173 f"called {self.__class__.__name__} does not return an Embeddingresult "
    174 f"but returns an object of type {type(result)}."
    175 )

File ~/.conda/envs/faers-embed/lib/python3.8/site-packages/embiggen/embedders/ensmallen_embedders/transe.py:111, in TransEEnsmallen._fit_transform(self, graph, return_dataframe, verbose)
    102 node_embedding = pd.DataFrame(
    103 node_embedding,
    104 index=graph.get_node_names()
    105 )
    106 edge_type_embedding = pd.DataFrame(
    107 edge_type_embedding,
    108 index=graph.get_unique_edge_type_names()
    109 )
--> 111 return EmbeddingResult(
    112 embedding_method_name=self.model_name(),
    113 node_embeddings= node_embedding,
    114 edge_type_embeddings= edge_type_embedding,
    115 )

File ~/.conda/envs/faers-embed/lib/python3.8/site-packages/embiggen/utils/abstract_models/embedding_result.py:77, in EmbeddingResult.__init__(self, embedding_method_name, node_embeddings, edge_embeddings, node_type_embeddings, edge_type_embeddings)
     74 numpy_embedding = embedding
     76 if np.isnan(numpy_embedding).any():
---> 77 raise ValueError(
     78 f"One of the provided {embedding_list_name} "
     79 f"computed with the {embedding_method_name} method "
     80 "contains NaN values."
     81 )
     83 self._embedding_method_name = embedding_method_name
     84 self._node_embeddings = node_embeddings

ValueError: One of the provided node embedding computed with the TransE method contains NaN values.

LucaCappelletti94 commented 2 years ago

Hello @sanyabt! Fortunately, your error is most likely caused only by the fact that the graph is loaded as directed and may contain trap nodes. Could you try running kg.get_trap_nodes_number()? If there are any, that is the cause, and I fixed it yesterday (I had forgotten about this corner case).
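
For readers unfamiliar with the term, here is a small illustration of what a trap node is, assuming the usual definition: a node with no outgoing edges in a directed graph. It is a plain-Python sketch on a toy edge list, with a hypothetical helper `find_trap_nodes`; it is not how grape computes trap nodes, and it says nothing about why they led to NaN values.

```python
def find_trap_nodes(edges):
    """Return the nodes that appear in the edge list but never as a source."""
    nodes = set()
    sources = set()
    for source, destination in edges:
        nodes.update((source, destination))
        sources.add(source)
    return nodes - sources


# Toy directed graph: "c" only receives edges, so a walk reaching it cannot leave.
directed_edges = [("a", "b"), ("b", "c"), ("a", "c")]
print(sorted(find_trap_nodes(directed_edges)))  # ['c']

# Adding the reverse of every edge (an undirected view) removes the trap node.
undirected_edges = directed_edges + [(d, s) for s, d in directed_edges]
print(sorted(find_trap_nodes(undirected_edges)))  # []
```

This is also why the directed flag matters here: the same edge list can have trap nodes when loaded as directed and none when loaded as undirected.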

LucaCappelletti94 commented 2 years ago

I have also resolved the corner case raised by the other, peculiar undirected graph topology.

sanyabt commented 2 years ago

Thank you! Do we need to update or reinstall grape for the fix?

LucaCappelletti94 commented 2 years ago

Yes, an update will be necessary, but @zommiommy is currently working on @pnrobinson's Printer issue. As soon as that is fixed, we will run the build procedure and deploy the updated version on PyPI. I will notify you here when we do so.

We have added links to our Telegram, Discord and Twitter accounts to the READMEs, so that you can reach us easily.

LucaCappelletti94 commented 2 years ago

Deployed the updated version on PyPI: GraPE version 0.1.3.
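
To check whether an environment already has the release mentioned above, something like the following could be used. It is a rough sketch under two assumptions: that the PyPI distribution is named `grape`, and that any version from 0.1.3 onwards contains the fix. The helpers `parse_version`, `has_fix` and `installed_grape_version` are illustrative names, not part of the library.

```python
import re


def parse_version(version):
    """Turn a version string such as '0.1.3' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def has_fix(installed, minimum="0.1.3"):
    """Return True if the installed version is at least the assumed fixed release."""
    return parse_version(installed) >= parse_version(minimum)


def installed_grape_version():
    """Return the installed version of the 'grape' distribution, or None."""
    try:
        # importlib.metadata is only available from Python 3.8 onwards.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        return None
    try:
        return version("grape")
    except PackageNotFoundError:
        return None


current = installed_grape_version()
if current is None:
    print("Could not determine the installed grape version.")
elif has_fix(current):
    print(f"grape {current} should include the fix.")
else:
    print(f"grape {current} predates 0.1.3; consider upgrading.")
```

If the check reports an older version, upgrading the package with pip and restarting the notebook kernel should pick up the new release.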