google-deepmind / graph_nets

Build Graph Nets in Tensorflow
https://arxiv.org/abs/1806.01261
Apache License 2.0

Issue installing dm-tree #108

Closed TianrenWang closed 4 years ago

TianrenWang commented 4 years ago

I am trying to install graph_nets on Compute Canada's HPC, but I keep hitting an out-of-memory error when the Java component of the dm-tree installation (the Bazel server) starts. The error message is as follows:

Collecting dm-tree
  Using cached https://files.pythonhosted.org/packages/5d/26/cd2b72779f7f448894837c1c11ad3bb73dfdeffa3b77d80235c1ba39fa9f/dm-tree-0.1.2.tar.gz
Requirement already satisfied: six>=1.12.0 in ./tensorflow/lib/python3.6/site-packages (from dm-tree)
Building wheels for collected packages: dm-tree
  Running setup.py bdist_wheel for dm-tree ... error
  Complete output from command /home/kagutaba/tensorflow/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-49nrjxem/dm-tree/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpavtum5ozpip-wheel- --python-tag cp36:
  /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.6.3/lib/python3.6/distutils/dist.py:261: UserWarning: Unknown distribution option: 'long_description_content_type'
    warnings.warn(msg)
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.6
  creating build/lib.linux-x86_64-3.6/tree
  copying tree/tree_test.py -> build/lib.linux-x86_64-3.6/tree
  copying tree/tree_benchmark.py -> build/lib.linux-x86_64-3.6/tree
  copying tree/__init__.py -> build/lib.linux-x86_64-3.6/tree
  running build_ext
  bazel build //tree:_tree --symlink_prefix=build/temp.linux-x86_64-3.6/bazel- --compilation_mode=opt
  WARNING: Output base '/home/kagutaba/.cache/bazel/_bazel_kagutaba/e504611fda55f1567590116511763dbe' is on NFS. This may lead to surprising failures and undetermined behavior.
  Starting local Bazel server and connecting to it...
  Server crashed during startup. Now printing /home/kagutaba/.cache/bazel/_bazel_kagutaba/e504611fda55f1567590116511763dbe/server/jvm.out
  java.lang.OutOfMemoryError: Metaspace
  Dumping heap to /home/kagutaba/.cache/bazel/_bazel_kagutaba/e504611fda55f1567590116511763dbe/java_pid4419.hprof ...
  Heap dump file created [31633163 bytes in 0.608 secs]

  Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
  error: command 'bazel' failed with exit status 37

  ----------------------------------------
  Failed building wheel for dm-tree

I have no problem installing this on my personal computer, so it could just be an issue with Compute Canada. Has anything similar happened when installing dm-tree on other HPC systems, and if so, how was it fixed? Increasing the heap size doesn't seem to solve the problem either: I still get the same error after raising the memory limit.

java -Xms256m -Xmx8g |java -XX:+PrintFlagsFinal -version |grep HeapSize
    uintx ErgoHeapSizeLimit                         = 0                                   {product}
    uintx HeapSizePerGCThread                       = 87241520                            {product}
    uintx InitialHeapSize                          := 2107637760                          {product}
    uintx LargePageHeapSizeThreshold                = 134217728                           {product}
    uintx MaxHeapSize                              := 4294967296                          {product}
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
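
A note on reading this output: the error in the log is `OutOfMemoryError: Metaspace`, and on Java 8 Metaspace is sized separately from the heap, so `-Xmx` does not raise it. The pipeline above also passes `-Xmx8g` to a different `java` process from the one that prints the flags, which would explain why `MaxHeapSize` still reads 4294967296 bytes (4 GiB). Below is a minimal sketch for checking the limits that a single JVM actually sees. The helper names `bytes_to_gib` and `show_jvm_limits` are made up for illustration, and running this does not change what Bazel's own server JVM uses.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte count to whole GiB.
bytes_to_gib() {
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Hypothetical helper: print heap and Metaspace limits for ONE JVM,
# with any extra flags (e.g. -Xmx8g) applied to that same process.
show_jvm_limits() {
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found on PATH"
        return 0
    fi
    # -version writes to stderr; the flag table goes to stdout.
    java "$@" -XX:+PrintFlagsFinal -version 2>/dev/null \
        | grep -E 'MaxHeapSize|MaxMetaspaceSize' || true
}

echo "MaxHeapSize from the output above: $(bytes_to_gib 4294967296) GiB"
show_jvm_limits -Xmx8g
```

If the flag is applied to the same process, `MaxHeapSize` should change with it, while `MaxMetaspaceSize` is reported as a separate setting.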

zafarali commented 4 years ago

If I recall correctly, Compute Canada limits the resources available on its login nodes and recommends requesting a job to do installs. Can you confirm you're not running this on a login node?
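
One quick way to check, sketched under the assumption that the cluster is scheduled with Slurm and that Slurm exports `SLURM_JOB_ID` inside batch and interactive jobs. The function name `in_slurm_job` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the shell is running inside a Slurm job.
# Assumption: Slurm sets SLURM_JOB_ID for processes started within a job.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

if in_slurm_job; then
    echo "inside Slurm job ${SLURM_JOB_ID} on $(uname -n)"
else
    echo "no Slurm job detected on $(uname -n); this is probably a login node"
fi
```

An empty `SLURM_JOB_ID` is only a hint, not proof, but it is usually enough to tell a login shell from a job allocation.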

TianrenWang commented 4 years ago

Thank you so much! Installing from the login node was precisely the problem. I submitted an install job instead, and it successfully installed everything.
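
For anyone hitting the same error, an install job along these lines might look like the sketch below. It assumes a Slurm scheduler, that the compute node can download packages, and that the virtualenv lives at `$HOME/tensorflow` as the build log suggests. The resource values, the file name `install_graph_nets.sh` and the function `write_install_job` are illustrative, not the settings actually used in this thread.

```shell
#!/bin/sh
# Sketch: write a Slurm batch script that runs the install on a compute node.
# All resource values and paths below are assumptions; adjust for your cluster.
write_install_job() {
    cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --job-name=install-graph-nets
#SBATCH --time=00:30:00
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G

# Activate the virtualenv seen in the build log (path assumed).
source "$HOME/tensorflow/bin/activate"

# dm-tree 0.1.2 is built from source with Bazel, which starts a JVM.
pip install dm-tree graph_nets
EOF
}

write_install_job install_graph_nets.sh
echo "Submit with: sbatch install_graph_nets.sh"
```

Some clusters also require an `#SBATCH --account=...` line; the right value depends on your allocation.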