gmontamat / gentun

Hyperparameter tuning for machine learning models using a distributed genetic algorithm
Apache License 2.0

Which version of RabbitMQ is compatible with this code? #17

Closed ahmadmobeen closed 4 weeks ago

ahmadmobeen commented 4 years ago

Hi, thanks for the code. I am trying to run the distributed training; however, I got the error below:

Using TensorFlow backend.
Initializing a random population. Size: 20
Starting genetic algorithm...

Evaluating generation #1...
Traceback (most recent call last):
  File "/media/vip/Program/mobeen/gentun/tests/mnist_server.py", line 23, in <module>
    ga.run(50)
  File "/media/vip/Program/mobeen/gentun/gentun/algorithms.py", line 29, in run
    self.evolve_population()
  File "/media/vip/Program/mobeen/gentun/gentun/algorithms.py", line 72, in evolve_population
    fittest = self.population.get_fittest()
  File "/media/vip/Program/mobeen/gentun/gentun/server.py", line 105, in get_fittest
    self.evaluate_in_parallel()
  File "/media/vip/Program/mobeen/gentun/gentun/server.py", line 113, in evaluate_in_parallel
    RpcClient(None, None, **self.credentials).purge()
  File "/media/vip/Program/mobeen/gentun/gentun/server.py", line 31, in __init__
    result = self.channel.queue_declare(exclusive=True)
TypeError: queue_declare() missing 1 required positional argument: 'queue'

Process finished with exit code 1

I solved this error by passing an empty queue name, as suggested here. After that, I got another error:

Using TensorFlow backend.
Initializing a random population. Size: 20
Starting genetic algorithm...

Evaluating generation #1...
Traceback (most recent call last):
  File "/media/vip/Program/mobeen/gentun/tests/mnist_server.py", line 23, in <module>
    ga.run(50)
  File "/media/vip/Program/mobeen/gentun/gentun/algorithms.py", line 29, in run
    self.evolve_population()
  File "/media/vip/Program/mobeen/gentun/gentun/algorithms.py", line 72, in evolve_population
    fittest = self.population.get_fittest()
  File "/media/vip/Program/mobeen/gentun/gentun/server.py", line 105, in get_fittest
    self.evaluate_in_parallel()
  File "/media/vip/Program/mobeen/gentun/gentun/server.py", line 113, in evaluate_in_parallel
    RpcClient(None, None, **self.credentials).purge()
  File "/media/vip/Program/mobeen/gentun/gentun/server.py", line 33, in __init__
    self.channel.basic_consume(self.on_response, no_ack=True, queue=self.callback_queue)
TypeError: basic_consume() got an unexpected keyword argument 'no_ack'

Process finished with exit code 1

Judging by these errors, I think I am using a different version of RabbitMQ than the code expects. I installed RabbitMQ using: sudo apt-get install rabbitmq-server

Is it a version mismatch or am I missing something?

gmontamat commented 4 years ago

Hi! Thanks for using this code. I'll investigate this further. It could be that pika==1.0.1 is not compatible with the latest rabbitmq-server version. In any case, I'm also working on a dockerized implementation of the server code so that users don't have to worry about installing RabbitMQ.

gmontamat commented 4 years ago

@ahmadmobeen I was able to replicate your problem, thanks for reporting it. It's due to pika updating its API. Please use pika==1.1.0 (which should be compatible with your rabbitmq version) and the dev branch code of client.py and server.py. I will merge with master sometime later.

edit: I'm also working on the dockerfile for the gentun server to simplify running the distributed version of the genetic algorithm.
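
For reference, both TypeErrors in the tracebacks above match signature changes in pika 1.x: `queue_declare()` requires an explicit `queue` argument, and `basic_consume()` takes the callback as `on_message_callback` and uses `auto_ack` in place of `no_ack`. The following is a minimal sketch of the adapted calls, not the actual dev branch code; the helper names `declare_callback_queue` and `start_consuming_replies` are hypothetical, and `channel` is assumed to be a pika `BlockingChannel`.

```python
def declare_callback_queue(channel):
    """Declare an exclusive, broker-named reply queue and return its name."""
    # pika >= 1.0 makes `queue` a required argument; an empty string
    # asks the broker to generate a unique queue name.
    result = channel.queue_declare(queue='', exclusive=True)
    return result.method.queue


def start_consuming_replies(channel, queue_name, on_response):
    """Register `on_response` as the consumer for `queue_name`."""
    # pika >= 1.0 renamed `no_ack` to `auto_ack` and expects the
    # callback to be passed as `on_message_callback`.
    return channel.basic_consume(
        queue=queue_name,
        on_message_callback=on_response,
        auto_ack=True,
    )
```

With a real broker, `channel` would typically come from `pika.BlockingConnection(parameters).channel()`. The helpers only touch the two calls that failed, so the rest of the RPC client can stay unchanged.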

ahmadmobeen commented 4 years ago

@gmontamat Thank you for your reply. I was able to run the code by making some small changes in server.py and client.py, as shown here

ahmadmobeen commented 4 years ago

I tried the method @gmontamat suggested (pika==1.1.0 with the dev branch code of client.py and server.py) and it worked for generation #1.

I also tried my own small changes to server.py and client.py, and that worked for generation #1 too.

However, I got the following error in both cases:


Using TensorFlow backend.
Initializing a random population. Size: 20
Starting genetic algorithm...

Evaluating generation #1...
 [*] Got fitness for individual 0
 [*] Got fitness for individual 2
 [*] Got fitness for individual 1
 [*] Got fitness for individual 4
 [*] Got fitness for individual 3
 [*] Got fitness for individual 6
 [*] Got fitness for individual 8
 [*] Got fitness for individual 10
 [*] Got fitness for individual 5
 [*] Got fitness for individual 9
 [*] Got fitness for individual 12
 [*] Got fitness for individual 11
 [*] Got fitness for individual 7
 [*] Got fitness for individual 13
 [*] Got fitness for individual 16
 [*] Got fitness for individual 15
 [*] Got fitness for individual 17
 [*] Got fitness for individual 14
 [*] Got fitness for individual 19
 [*] Got fitness for individual 18
Fittest individual is:
{'S_1': '011', 'S_2': '1000010100'}
Fitness value is: 0.9978

Evaluating generation #2...
Traceback (most recent call last):
  File "/media/vip/Program/mobeen/gentun/tests/mnist_server.py", line 23, in <module>
    ga.run(50)
  File "/media/vip/Program/mobeen/gentun/gentun/algorithms.py", line 29, in run
    self.evolve_population()
  File "/media/vip/Program/mobeen/gentun/gentun/algorithms.py", line 72, in evolve_population
    fittest = self.population.get_fittest()
  File "/media/vip/Program/mobeen/gentun/gentun/server.py", line 105, in get_fittest
    self.evaluate_in_parallel()
  File "/media/vip/Program/mobeen/gentun/gentun/server.py", line 113, in evaluate_in_parallel
    RpcClient(None, None, **self.credentials).purge()
  File "/media/vip/Program/mobeen/gentun/gentun/server.py", line 28, in __init__
    self.connection = pika.BlockingConnection(self.parameters)
  File "/home/vip/anaconda3/envs/keras/lib/python3.7/site-packages/pika-1.1.0-py3.7.egg/pika/adapters/blocking_connection.py", line 359, in __init__
    self._impl = self._create_connection(parameters, _impl_class)
  File "/home/vip/anaconda3/envs/keras/lib/python3.7/site-packages/pika-1.1.0-py3.7.egg/pika/adapters/blocking_connection.py", line 450, in _create_connection
    raise self._reap_last_connection_workflow_error(error)
pika.exceptions.ProbableAuthenticationError: ConnectionClosedByBroker: (403) 'ACCESS_REFUSED - Login was refused using authentication mechanism PLAIN. For details see the broker logfile.'

Process finished with exit code 1

gmontamat commented 4 weeks ago

#18 is fixed now. Additionally, we now use redis instead of RabbitMQ because it's easier to set up in docker, and RabbitMQ is overkill when it's used only as a simple message queue.
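
To illustrate what using redis as a simple message queue can look like, here is a minimal sketch built on a Redis list. It is not gentun's actual implementation: the function names, the queue name, and the JSON job format are all assumptions, and `client` is assumed to be a redis-py style client exposing `rpush` and `blpop`.

```python
import json


def enqueue_job(client, queue_name, job):
    """Append a JSON-encoded job to the tail of a Redis list."""
    # RPUSH returns the length of the list after the push.
    return client.rpush(queue_name, json.dumps(job))


def dequeue_job(client, queue_name, timeout=5):
    """Wait up to `timeout` seconds for the next job; return None if none arrives."""
    # BLPOP returns a (key, value) pair, or None when the timeout expires.
    item = client.blpop(queue_name, timeout=timeout)
    if item is None:
        return None
    _key, payload = item
    return json.loads(payload)
```

A server process could call `enqueue_job` once per individual and workers could loop on `dequeue_job`. Pushing to the tail and popping from the head gives first-in, first-out ordering without any exchange or binding setup.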