EpistasisLab / Aliro

Aliro: AI-Driven Data Science
https://epistasislab.github.io/Aliro
GNU General Public License v3.0

Multi-machine not working in production #644

Open jay-m-dev opened 8 months ago

jay-m-dev commented 8 months ago

Description

Multi-machine works with the configuration in docker-compose-multi-machine.yml (development) but not with the configuration in release/docker-compose-hub-image.yml (production). In production the machines are started, but they are not connected to the lab. Possible causes of this issue:

  1. The configured port in hub-image.yml (MACHINE_PORT=5081) needs to match the exposed docker ports.
  2. The hub-image.yml file is missing the MACHINE_HOST variable (which is set in the multi-machine.yml file).
  3. A combination of 1 and 2

These possibilities need to be investigated, and hub-image.yml needs to be updated with the correct parameters.
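As a rough illustration of what causes 1 and 2 would imply together, an additional machine entry might look something like the fragment below. This is only a sketch, not the actual contents of either compose file: the service name machine2, the image name, and the port 5082 are assumptions made up for the example. Only the variable names MACHINE_PORT and MACHINE_HOST come from this issue.

```yaml
# Hypothetical sketch of one additional machine service.
# Names and values are placeholders, not copied from hub-image.yml.
services:
  machine2:                          # assumed service name
    image: example/aliro-machine     # placeholder image name
    environment:
      - MACHINE_PORT=5082            # port the machine listens on (assumed value)
      - MACHINE_HOST=machine2        # host name the lab would use to reach this machine (assumed)
    ports:
      - "5082:5082"                  # published port kept equal to MACHINE_PORT (cause 1)
```

Whether matching the ports, setting MACHINE_HOST, or both actually fixes the connection to the lab still has to be verified against the production file.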

Steps to reproduce

  1. Download the latest Aliro-*.zip (v0.21.1) from the GitHub releases page.
  2. Uncomment the lines for "additional machine containers".
  3. Run docker compose up.
  4. The multiple machines are started, but Aliro only submits experiments to the first machine.