automl / jahs_bench_201

The first collection of surrogate benchmarks for Joint Architecture and Hyperparameter Search.
https://automl.github.io/jahs_bench_201/
MIT License

Termination in colorectal_histology due to memory overflow #12

Open nabenabe0928 opened 1 year ago

nabenabe0928 commented 1 year ago

As mentioned in the title, the process is killed when loading colorectal_histology, while both cifar10 and fashion_mnist work. It seems that only colorectal_histology requires 16+ GB of RAM to load the surrogate benchmark.

Environment:

My code:

import os

import jahs_bench

DATA_DIR = f"{os.environ['HOME']}/tabular_benchmarks/jahs_bench_data/"

tasks = ["colorectal_histology", "cifar10", "fashion_mnist"]
benchmark = jahs_bench.Benchmark(task=tasks[0], download=False, save_dir=DATA_DIR)

config = benchmark.sample_config(random_state=42)
results = benchmark(config, nepochs=200)

print(config)
print(results)

Output (termination happens only with colorectal_histology):

[00:22:56] WARNING: ../src/gbm/gbtree.cc:386: Loading from a raw memory buffer on CPU only machine.  Changing tree_method to hist.
[00:22:56] WARNING: ../src/learner.cc:223: No visible GPU is found, setting `gpu_id` to -1
[00:23:08] WARNING: ../src/gbm/gbtree.cc:386: Loading from a raw memory buffer on CPU only machine.  Changing tree_method to hist.
[00:23:08] WARNING: ../src/learner.cc:223: No visible GPU is found, setting `gpu_id` to -1
[00:23:12] WARNING: ../src/gbm/gbtree.cc:386: Loading from a raw memory buffer on CPU only machine.  Changing tree_method to hist.
[00:23:12] WARNING: ../src/learner.cc:223: No visible GPU is found, setting `gpu_id` to -1
[00:23:16] WARNING: ../src/gbm/gbtree.cc:386: Loading from a raw memory buffer on CPU only machine.  Changing tree_method to hist.
[00:23:16] WARNING: ../src/learner.cc:223: No visible GPU is found, setting `gpu_id` to -1
[00:23:25] WARNING: ../src/gbm/gbtree.cc:386: Loading from a raw memory buffer on CPU only machine.  Changing tree_method to hist.
[00:23:25] WARNING: ../src/learner.cc:223: No visible GPU is found, setting `gpu_id` to -1
Killed

nabenabe0928 commented 1 year ago

In my environment, loading the colorectal_histology surrogate benchmark required about 20 GB of RAM at peak. (After loading, the memory usage dropped by 8 GB.)
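For anyone who wants to reproduce this kind of measurement, here is a minimal sketch using only the Python standard library on a Unix-like system. The helper names `peak_rss_mib` and `measure_peak` are hypothetical and not part of jahs_bench. Note that the value is the process-wide high-water mark, so it captures the peak during loading rather than the lower usage afterwards.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure_peak(load):
    """Call load() and return (result, peak RSS in MiB after the call)."""
    result = load()
    return result, peak_rss_mib()


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with the jahs_bench.Benchmark(...)
    # call from the snippet above to measure the surrogate loading step.
    _, peak = measure_peak(lambda: bytearray(50 * 1024 * 1024))
    print(f"peak RSS: {peak:.0f} MiB")
```

Running this once per task in a fresh process keeps the numbers comparable, since the high-water mark never decreases within a process.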

nabenabe0928 commented 1 year ago

I just noticed that the docstring says the metrics argument reduces the memory requirements significantly. Could you please add this information to README.md, as it would be useful for future users?

NOTE: The memory bottleneck seems to come from valid-acc in colorectal_histology, so restricting metrics does not solve my issue.

NeoChaos12 commented 1 year ago

Thank you for pointing this out! We are aware of the memory issues and pushed a workaround in commit b626cf54ec481a6a5208a1b6033741c3bfb8a188. Have you tried using the lazy flag? In my own tests, it brought the peak memory requirement of the benchmark below 4 GB, at the expense of query-time performance and additional disk reads, even when using multiple values for metrics.
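A rough sketch of how the two options discussed in this thread might be combined is given below. It assumes that `jahs_bench.Benchmark` accepts `lazy` and `metrics` as keyword arguments, which is inferred from the comments here and should be checked against the docstring of the installed version. The helpers `benchmark_kwargs` and `load_benchmark` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Iterable, Optional


def benchmark_kwargs(
    task: str,
    save_dir: str,
    metrics: Optional[Iterable[str]] = None,
    lazy: bool = True,
) -> Dict[str, Any]:
    """Build keyword arguments for a low-memory benchmark setup.

    Assumption: `lazy` and `metrics` are the keyword names used by
    jahs_bench.Benchmark, as suggested by the discussion in this thread.
    """
    kwargs: Dict[str, Any] = {
        "task": task,
        "download": False,
        "save_dir": save_dir,
        "lazy": lazy,
    }
    # Only restrict the metrics when the caller asks for it.
    if metrics is not None:
        kwargs["metrics"] = list(metrics)
    return kwargs


def load_benchmark(**kwargs: Any):
    """Create the benchmark; jahs_bench is imported only when this is called."""
    import jahs_bench

    return jahs_bench.Benchmark(**kwargs)
```

Usage would then look like `benchmark = load_benchmark(**benchmark_kwargs("colorectal_histology", DATA_DIR, metrics=["valid-acc"]))`, followed by the same `sample_config` and query calls as in the original snippet.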