google-deepmind / graph_nets

Build Graph Nets in Tensorflow
https://arxiv.org/abs/1806.01261
Apache License 2.0
5.34k stars · 783 forks

Kernel Restart - Incompatibility between nx.draw and utils_tf.data_dicts_to_graphs_tuple #124

Closed mshearer0 closed 4 years ago

mshearer0 commented 4 years ago

Hi.

I'm trying to use nx.draw and utils_tf.data_dicts_to_graphs_tuple in the same TF2 notebook.

Whichever is executed second seems to cause a kernel restart in the notebook, which I can't explain. Importing networkx is fine as long as nx.draw is not run.

@Mistobaan - I get this behaviour on your very helpful TF2 version of graph_nets_basic tutorial.

Michael.

alvarosg commented 4 years ago

I have not observed this; not sure if @Mistobaan has.

Are you running on your own kernel, or on Google Colaboratory?

Mistobaan commented 4 years ago

In my experience that is usually an out-of-memory issue. Check the system logs if you are running on Colab.

mshearer0 commented 4 years ago

Hi, thanks. I'm running on GCP Notebook with 15GB RAM. GCP logs show:

Aug 12 21:12:02 ... bash[1278]: OMP: Error #15: Initializing libiomp5.so, but found libomp.so already initialized.
Aug 12 21:12:02 ... bash[1278]: OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Aug 12 21:12:03 ... bash[1278]: [I 21:12:03.530 LabApp] KernelRestarter: restarting kernel (1/5), keep random ports
Aug 12 21:12:03 ... bash[1278]: kernel ... restarted
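For what it's worth, one way to confirm the duplicate runtime from inside the kernel is to scan the process's memory maps for OpenMP libraries (a Linux-only sketch; the function name is my own):

```python
def loaded_openmp_runtimes(maps_path="/proc/self/maps"):
    """Return the OpenMP runtime libraries mapped into this process (Linux only)."""
    found = set()
    with open(maps_path) as f:
        for line in f:
            # Intel's runtime, LLVM's runtime, and GCC's runtime, respectively.
            for lib in ("libiomp5", "libomp", "libgomp"):
                if lib in line:
                    found.add(lib)
    return sorted(found)

# If this lists more than one entry after importing networkx/matplotlib and
# TensorFlow, two OpenMP runtimes are loaded, matching OMP Error #15 above.
print(loaded_openmp_runtimes())
```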

Mistobaan commented 4 years ago

I think the answer is printed in your logs:

> set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results

mshearer0 commented 4 years ago

@Mistobaan - yes, I’ve used that as a workaround but wondered if there was a better option?
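For reference, the workaround only takes effect if it is applied before the first import that loads an OpenMP runtime, e.g. at the very top of the notebook:

```python
import os

# Unsafe per Intel's own warning (may crash or silently produce incorrect
# results), but lets the two OpenMP runtimes coexist. Must run before
# importing tensorflow / networkx / matplotlib, because the duplicate
# runtime is loaded at import time.
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
```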

Mistobaan commented 4 years ago

Get a bigger machine with more memory? Can you replicate the problem in a Colab and post the link to it? Make sure you set the share permissions.

mshearer0 commented 4 years ago

Upgrading the GCP Notebook to TensorFlow 2.3 (from 2.2.0) resolved the issue.