autonomio / talos

Hyperparameter Experiments with TensorFlow and Keras
https://autonom.io
MIT License

Installing problems #327

Closed: tuananhvip closed this issue 5 years ago

tuananhvip commented 5 years ago

I have problems installing it, and they all come from the versions of required packages such as matplotlib. I have already updated them all, but I still get errors. Please give me the exact versions of all requirements needed to install Talos (I will install it in Anaconda environments). Thank you!

After running: !python3.6 -m pip install -U talos

Everything is up to date:

Requirement already up-to-date: talos in /home/u/anaconda3/envs/TA/lib/python3.6/site-packages (0.4.9)
Requirement already satisfied, skipping upgrade: numpy in /home/u/anaconda3/envs/TA/lib/python3.6/site-packages (from talos) (1.16.2)
Requirement already satisfied, skipping upgrade: keras in /home/u/anaconda3/envs/TA/lib/python3.6/site-packages (from talos) (2.2.4)
........

Then I ran:

import talos as ta

I got errors something like this:

.....................
NO INTERNET CONNECTION: Reporting plots will not work.  <<<==== Why do you need internet?
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-27-3bf072900deb> in <module>
----> 1 import talos as ta
...........
----> 5 import matplotlib.pyplot as plt    <<<=== Maybe the error comes from here; why do I get this error?
      6 from IPython.display import clear_output
      7 
..............
---> 45 from matplotlib.axes._base import _AxesBase, _process_plot_format
     46 
     47 _log = logging.getLogger(__name__)

~/anaconda3/envs/TA/lib/python3.6/site-packages/matplotlib/axes/_base.py in <module>
     41 rcParams = matplotlib.rcParams
     42 
---> 43 is_string_like = cbook.is_string_like
     44 is_sequence_of_strings = cbook.is_sequence_of_strings
     45 

AttributeError: module 'matplotlib.cbook' has no attribute 'is_string_like'

Then I downgraded matplotlib from 2.2.3 to 2.2.2, so the matplotlib requirement should be <= 2.2.2. Solved!
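As a hedged sketch (not part of Talos), the `<= 2.2.2` pin the reporter settled on could be checked before importing Talos. The helper names `parse_version` and `satisfies_max` are hypothetical, and the parser is deliberately simple: it keeps only the leading dotted integers, so a tag like `rc1` is ignored.

```python
import re


def parse_version(version):
    """Turn a version string like '2.2.3' into a comparable tuple (2, 2, 3)."""
    # Keep only the leading dotted-integer part, e.g. '3.0.0rc1' -> (3, 0, 0).
    # This is an approximation: pre-release tags are treated like the final release.
    match = re.match(r"\d+(\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognised version: " + version)
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies_max(installed, maximum):
    """Return True if the installed version is <= the maximum allowed version."""
    return parse_version(installed) <= parse_version(maximum)


if __name__ == "__main__":
    # Assumed pin, taken from the report above: matplotlib <= 2.2.2.
    MAX_MATPLOTLIB = "2.2.2"
    try:
        import matplotlib
    except ImportError:
        print("matplotlib is not installed")
    else:
        ok = satisfies_max(matplotlib.__version__, MAX_MATPLOTLIB)
        status = "OK" if ok else "newer than " + MAX_MATPLOTLIB
        print("matplotlib", matplotlib.__version__, status)
```

Importing `matplotlib` itself (rather than `matplotlib.pyplot`) is enough to read `__version__`, which keeps the check away from the `matplotlib.axes` import that failed in the traceback above.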

import talos succeeded, but when I ran the simplest training example, I got this error:

t = ta.Scan(x, y, p, diabetes)

.................
0%|          | 0/18 [00:00<?, ?it/s]

InternalError                             Traceback (most recent call last)
<ipython-input-10-5b06afa64c3f> in <module>
----> 1 t = ta.Scan(x, y, p, diabetes)

~/anaconda3/envs/TA/lib/python3.6/site-packages/talos/scan/Scan.py in __init__(self, x, y, params, model, dataset_name, experiment_no, x_val, y_val, val_split, shuffle, round_limit, grid_downsample, random_method, seed, search_method, reduction_method, reduction_interval, reduction_window, reduction_threshold, reduction_metric, reduce_loss, last_epoch_value, clear_tf_session, disable_progress_bar, print_params, debug)
    168         # input parameters section ends
    169 
--> 170         self._null = self.runtime()

.........

~/anaconda3/envs/TA/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    517             None, None,
    518             compat.as_text(c_api.TF_Message(self.status.status)),
--> 519             c_api.TF_GetCode(self.status.status))
    520     # Delete the underlying status object from memory otherwise it stays alive
    521     # as there is a reference to status from this from the traceback due to

InternalError: Blas GEMM launch failed : a.shape=(10, 8), b.shape=(8, 8), m=10, n=8, k=8
     [[Node: dense_1/MatMul = MatMul[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/dense_1/MatMul_grad/MatMul_1"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_dense_1_input_0_0/_47, dense_1/kernel/read)]]
     [[Node: metrics/acc/Mean_1/_69 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_485_metrics/acc/Mean_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Solved: I closed all other programs, exited all running Python processes, restarted Jupyter, and ran everything from the beginning!

Running: Hyperparameter Optimization with Keras for the Iris Prediction. Now it uses only a small fraction of 1 GPU (on a 4-GPU server), even after I changed the batch size from [2,3,4] to [200,300,400].

|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 32%   58C    P2    57W / 250W |  10665MiB / 11178MiB |      9%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 24%   44C    P2    55W / 250W |  10631MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:82:00.0 Off |                  N/A |
| 23%   38C    P2    57W / 250W |  10631MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:83:00.0 Off |                  N/A |
| 23%   41C    P2    57W / 250W |  10631MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

How can I get 80-100% utilization on all 4 GPUs?

After it finished 36/36, it produced an error:

100%|██████████| 36/36 [01:10<00:00, 1.89s/it]

...
--> 170         self._null = self.runtime()
...
---> 26     self = scan_finish(self)
...
~/anaconda3/envs/TA/lib/python3.6/site-packages/talos/scan/scan_finish.py in scan_finish(self)
     70 
     71     # convert to numeric
---> 72     self.data = string_cols_to_numeric(self.data)
...
~/anaconda3/envs/TA/lib/python3.6/site-packages/talos/utils/string_cols_to_numeric.py in isnumber(value)
      6 
      7     try:
----> 8         float(value)
      9         return True
     10     except ValueError:

~/anaconda3/envs/TA/lib/python3.6/site-packages/pandas/core/series.py in wrapper(self)
     91             return converter(self.iloc[0])
     92         raise TypeError("cannot convert the series to "
---> 93                         "{0}".format(str(converter)))
     94 
     95     wrapper.__name__ = "__{name}__".format(name=converter.__name__)

TypeError: ("cannot convert the series to <class 'float'>", 'occurred at index loss')

Solved! I edited ~/anaconda3/envs/TA/lib/python3.6/site-packages/talos/utils/string_cols_to_numeric.py from:

    def isnumber(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

to:

    def isnumber(value):
        try:
            float(value)
            return True
        except:
            return False

I don't know how it runs for everyone else, but this worked for me!
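A narrower variant of the same patch, offered as a sketch rather than the fix that shipped in Talos: the traceback shows pandas raising `TypeError` (not `ValueError`) when `float()` receives a multi-element Series, so catching both exceptions explicitly covers that case without a bare `except:` also swallowing unrelated exceptions such as `KeyboardInterrupt`.

```python
def isnumber(value):
    """Return True if float(value) succeeds, False otherwise."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        # ValueError: non-numeric strings such as 'relu'.
        # TypeError: None, lists, or a multi-element pandas Series.
        return False
```

Whether returning False for a whole column is the desired outcome depends on how `string_cols_to_numeric` uses the result, which is not shown in this thread.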

mikkokotila commented 5 years ago

I think it's better to move to v0.6 with:

pip install git+https://github.com/autonomio/talos@daily-dev

...even though it is still pre-release.

When it comes to GPU utilization, that comes directly from TensorFlow, and Talos does not change it in any way. Iris is a very small problem, so very low GPU utilization is expected. Generally you will see 80%+ utilization with complex, deep networks and big datasets.
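Not from the thread, but related to device placement being left to TensorFlow: TensorFlow 1.x by default maps nearly all memory on every GPU visible to the process, which is consistent with the roughly 10.6 GB in use on all four cards in the nvidia-smi output above while only GPU 0 is computing. A common workaround, sketched here with a hypothetical helper name `use_only_gpus`, is to expose a single GPU to each Python process through the standard CUDA environment variable `CUDA_VISIBLE_DEVICES`, for example to run separate experiments side by side. The variable must be set before TensorFlow initialises CUDA, so in practice before `import talos` or `import tensorflow`.

```python
import os


def use_only_gpus(gpu_ids):
    """Expose only the given GPU indices to CUDA in this process.

    Must run before TensorFlow (or anything importing it, such as Keras or
    Talos) is imported, because CUDA reads the variable when it initialises.
    """
    value = ",".join(str(i) for i in gpu_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


if __name__ == "__main__":
    # This process sees only GPU 0; a second process could call use_only_gpus([1]).
    use_only_gpus([0])
    # import talos as ta  # import TensorFlow-based libraries only after this point
```

CUDA may number the cards differently from nvidia-smi unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is also set.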

Finally, regarding the specific dependencies for <=v0.5, issue #317 is already open for that.

I'm going to close this, as there is nothing new to resolve. Feel free to open a new issue if anything else comes up.

shsh88 commented 5 years ago

I tried this, but when importing I get:

NameError                                 Traceback (most recent call last)

<ipython-input-...> in <module>
     25 get_ipython().system('pip3 install git+https://github.com/autonomio/talos@daily-dev')
     26 import keras_metrics as km
---> 27 import talos
     28
     29 import itertools

/usr/local/lib/python3.6/dist-packages/talos/__init__.py in <module>
     32 delattr(sub, key)
     33
---> 34 del commands, scan, model, metrics, key
     35 del sub, keep_from_templates, template_sub, warnings
     36

NameError: name 'commands' is not defined

mikkokotila commented 5 years ago

@shsh88 I think you are in a notebook, and you have to restart the notebook kernel and then this error will go away.

prateeksaurabh commented 4 years ago

> @shsh88 I think you are in a notebook, and you have to restart the notebook kernel and then this error will go away.

Thanks a lot, this helped!