JuliaCloud / JuliaBox

JuliaBox continues to run, but this codebase is no longer current.
http://www.juliabox.org/

Homepage doesn't load for JuliaBox instance installed on AWS #343

Closed: IanButterworth closed this issue 8 years ago

IanButterworth commented 8 years ago

I've followed the install instructions for JuliaBox on a basic AWS ubuntu instance, but the site doesn't load successfully. The FQDN is accessible (and the JuliaBox favicon loads), but the HTML only contains the following:

<html>
    <head>
        <script>
            parent.JuliaBox.inform_logged_out();
        </script>
    </head>
</html>
IanButterworth commented 8 years ago

I should add that I set up Google OAuth for authentication.

tanmaykm commented 8 years ago

If you don't even see the index.html, it means that the engine (tornado) is not reachable from the webserver (nginx). The FQDN must be accessible from everywhere, including from within the docker containers.
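
A quick way to sanity-check both requirements from the host is sketched below; the FQDN is a placeholder and port 80 is assumed for nginx, so adjust both to your deployment.

import socket

# Sketch: verify that the FQDN resolves and that the webserver answers.
# Assumptions: "myjuliabox.example.com" is a placeholder and nginx
# listens on port 80; neither value comes from the JuliaBox docs.
FQDN = "myjuliabox.example.com"

# The FQDN must resolve everywhere, including inside the containers.
print(socket.gethostbyname(FQDN))

# If nginx answers but tornado behind it is down, nginx can only
# serve the logged-out stub shown in the original report.
conn = socket.create_connection((FQDN, 80), timeout=5)
conn.close()
print("webserver reachable")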

IanButterworth commented 8 years ago

engineinteractive_err.log and the other two logs are giving me:

Traceback (most recent call last):
  File "/jboxengine/src/jbapi.py", line 17, in <module>
    JBoxCfg.read(conf_file, user_conf_file)
  File "/jboxengine/src/juliabox/jbox_util.py", line 186, in read
    with open(arg) as f:
IOError: [Errno 2] No such file or directory: '/jboxengine/conf/tornado.conf'
tanmaykm commented 8 years ago

jbox_configure.sh generates tornado.conf. Did you possibly forget to run it?

IanButterworth commented 8 years ago

I did run the script [edit: including as sudo] in the install sequence, and I just tried again; tornado.conf still hasn't been created.

I wondered if that comma at the end of the example Google auth settings might be rogue, so I tried without it, but the file still isn't created.

{
    "numdisksmax" : 30, # max disks (more than sessions to allow for transitions)
    "admin_users" : ['admin@gmail.com'],  # administrator email id
    "websocket_protocol" : "ws",
    "interactive": {
        "numlocalmax": 20  # max concurrent users to support
    },
    "plugins": [
        "juliabox.plugins.compute_singlenode",
        "juliabox.plugins.vol_loopback",
        "juliabox.plugins.vol_defpkg",
        "juliabox.plugins.auth_google",
        "juliabox.plugins.db_sqlite3"
    ],
    "google_oauth": {
        "key": "replace with google oauth key",
        "secret": "replace with google oauth secret"
    },
}
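
Incidentally, the file above is not strict JSON: it has # comments, single-quoted strings, and that trailing comma. Assuming (from the syntax alone, not from the source) that the engine evaluates it as a Python literal, all three are tolerated and the comma is unlikely to be the culprit. A minimal sketch of that reading:

import ast

# Sketch: read a config like the one above as a Python literal.
# Assumption: JuliaBox parses it this way; this is inferred from the
# '#' comments and single quotes, which strict JSON would reject.
with open("jbox.user") as f:
    cfg = ast.literal_eval(f.read())

print(cfg["admin_users"])                 # ['admin@gmail.com']
print(cfg["interactive"]["numlocalmax"])  # 20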
IanButterworth commented 8 years ago

Curiously, I tried copying the template tornado.conf file from ~/JuliaBox/engine/conf/tornado.conf to /jboxengine/conf/tornado.conf, and I still get the following error in engineinteractive_err.log:

Traceback (most recent call last):
  File "/jboxengine/src/jbox.py", line 13, in <module>
    JBoxCfg.read(conf_file, user_conf_file)
  File "/jboxengine/src/juliabox/jbox_util.py", line 186, in read
    with open(arg) as f:
IOError: [Errno 2] No such file or directory: '/jboxengine/conf/tornado.conf'
IanButterworth commented 8 years ago

Probing inside jbox_configure.sh, the variable $ENGINE_CONF_DIR (which dictates the save location for tornado.conf) resolves to ~/JuliaBox/engine/conf.

There seems to be a location mismatch, but I don't know why the copy above didn't solve it.

IanButterworth commented 8 years ago

I just tried a fresh install, but modified jbox_configure.sh to save tornado.conf to /jboxengine/conf.

I now get a different error, with the same HTML result as in the original post:

Traceback (most recent call last):
  File "/jboxengine/src/jbox.py", line 16, in <module>
    JBox().run()
  File "/jboxengine/src/juliabox/srvr_jbox.py", line 31, in __init__
    VolMgr.configure()
  File "/jboxengine/src/juliabox/vol/volmgr.py", line 18, in configure
    JBoxVol.configure()
  File "/jboxengine/src/juliabox/vol/jbox_volume.py", line 128, in configure
    plugin.configure()
  File "/jboxengine/src/juliabox/plugins/vol_loopback/loopback.py", line 26, in configure
    JBoxLoopbackVol.refresh_disk_use_status()
  File "/jboxengine/src/juliabox/plugins/vol_loopback/loopback.py", line 58, in refresh_disk_use_status
    container_id_list = [cdesc['Id'] for cdesc in SessContainer.session_containers(allcontainers=True)]
  File "/jboxengine/src/juliabox/jbox_container.py", line 86, in session_containers
    name = c["Names"][0] if (("Names" in c) and (c["Names"] is not None)) else c["Id"][0:12]
IndexError: list index out of range
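
For what it's worth, the guard in that last frame checks that "Names" is present and not None, but docker can also report an empty Names list, which passes the guard and then fails on [0]. A hypothetical minimal reproduction (not the JuliaBox source):

# An empty Names list slips past the (present and not-None) guard.
c = {"Names": [], "Id": "0123456789abcdef0123"}  # hypothetical entry
try:
    name = c["Names"][0] if (("Names" in c) and (c["Names"] is not None)) else c["Id"][0:12]
except IndexError:
    print("empty Names list raises, exactly as in the traceback")

# A safer fallback treats an empty list like a missing name:
name = c["Names"][0] if c.get("Names") else c["Id"][0:12]
print(name)  # 0123456789ab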
tanmaykm commented 8 years ago

@ianshmean All configuration files are packaged into the docker containers and referenced from within. Only jbox.user is (optionally) loaded from the host.

I think there was no need to modify $ENGINE_CONF_DIR. Instead, repackaging the docker images with img_create.sh jbox would have helped.
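
If it helps, one way to confirm the repackaging took is sketched below; the image name is a guess, so substitute whatever img_create.sh actually builds on your host.

import subprocess

# Sketch: confirm tornado.conf was baked into the engine image.
# Assumption: "juliabox/juliabox" is the image that img_create.sh
# builds; substitute the actual name from `docker images`.
# check_output raises CalledProcessError if the file is missing.
out = subprocess.check_output(
    ["docker", "run", "--rm", "juliabox/juliabox",
     "ls", "-l", "/jboxengine/conf/tornado.conf"])
print(out)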

IanButterworth commented 8 years ago

I'm definitely not particularly knowledgeable about how things fit together here, @tanmaykm; apologies for the wild attempts.

When do you recommend running img_create.sh jbox? I currently do that during step 4 of the installation process.

tanmaykm commented 8 years ago

Yes, that's the step. You run it again to have the webserver and engine configurations packaged/repackaged into the docker images.

tanmaykm commented 8 years ago

@ianshmean I'm assuming this is working now; we can continue the discussion if not.

samuelpowell commented 8 years ago

I can recreate this problem on AWS and locally.

Upon browsing to a newly installed server, the browser received the data reported by the OP. webserver/logs/error.log shows:

2015/12/24 14:40:49 [error] 5#0: *1 connect() failed (111: Connection refused), client: 128.16.114.4, server: , request: "GET / HTTP/1.1", host: "mooncalf.medphys.ucl.ac.uk"
2015/12/24 14:40:50 [error] 5#0: *1 connect() failed (111: Connection refused), client: 128.16.114.4, server: , request: "GET / HTTP/1.1", host: "mooncalf.medphys.ucl.ac.uk"
2015/12/24 14:40:51 [warn] 5#0: *1 [lua] router.lua:223: check_forward_addr(): replacing inaccessible forward address http://127.0.0.1:8888 with http://127.0.0.1:8888, client: 128.16.114.4, server: , request: "GET / HTTP/1.1", host: "mooncalf.medphys.ucl.ac.uk"
2015/12/24 14:40:51 [error] 5#0: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 128.16.114.4, server: , request: "GET / HTTP/1.1", upstream: "http://127.0.0.1:8888/", host: "mooncalf.medphys.ucl.ac.uk"

The connection cannot be made because, as indicated by host/run/supervisord.log, the engineinteractive container is failing to start:

2015-12-24 14:40:35,099 CRIT Supervisor running as root (no user in config file)
2015-12-24 14:40:35,142 INFO RPC interface 'supervisor' initialized
2015-12-24 14:40:35,142 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2015-12-24 14:40:35,143 INFO daemonizing the supervisord process
2015-12-24 14:40:35,143 INFO set current directory: '/home/samuelpowell/JuliaBox/host'
2015-12-24 14:40:35,144 INFO supervisord started with pid 10029
2015-12-24 14:40:35,286 INFO spawned: 'webserver' with pid 10032
2015-12-24 14:40:35,289 INFO spawned: 'engineapi' with pid 10033
2015-12-24 14:40:35,292 INFO spawned: 'enginedaemon' with pid 10034
2015-12-24 14:40:35,295 INFO spawned: 'engineinteractive' with pid 10036
2015-12-24 14:40:35,440 INFO exited: engineinteractive (exit status 1; not expected)
2015-12-24 14:40:37,176 INFO success: webserver entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-12-24 14:40:37,176 INFO success: engineapi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-12-24 14:40:37,176 INFO success: enginedaemon entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-12-24 14:40:37,178 INFO spawned: 'engineinteractive' with pid 10093
2015-12-24 14:40:37,263 INFO exited: engineinteractive (exit status 1; not expected)
2015-12-24 14:40:39,268 INFO spawned: 'engineinteractive' with pid 10097
2015-12-24 14:40:39,355 INFO exited: engineinteractive (exit status 1; not expected)
2015-12-24 14:40:42,362 INFO spawned: 'engineinteractive' with pid 10101
2015-12-24 14:40:42,450 INFO exited: engineinteractive (exit status 1; not expected)
2015-12-24 14:40:43,452 INFO gave up: engineinteractive entered FATAL state, too many start retries too quickly

The root cause is indicated by /engine/logs/engineinteractive.log:

Error response from daemon: Could not find container for entity id 817d14ae30e00b0b1550ec69e1c6f25f28160260d7454532d6031fc8e7713245
Error response from daemon: Could not find container for entity id 817d14ae30e00b0b1550ec69e1c6f25f28160260d7454532d6031fc8e7713245
Error response from daemon: Could not find container for entity id 817d14ae30e00b0b1550ec69e1c6f25f28160260d7454532d6031fc8e7713245
Error response from daemon: Could not find container for entity id 817d14ae30e00b0b1550ec69e1c6f25f28160260d7454532d6031fc8e7713245

I do not know enough about Docker to understand what's going on here, but as noted previously, it can/will result from following our installation instructions.

Any ideas?

IanButterworth commented 8 years ago

Thanks for trying this out and writing it up, @samuelpowell.

samuelpowell commented 8 years ago

@ianshmean @tanmaykm, following some unrelated notes on various web pages, I have managed to overcome the problem locally by running the following:

JuliaBox/scripts/run/stop.sh
sudo service docker stop
sudo mv /var/lib/docker/linkgraph.db linkgraph.old
sudo service docker start
JuliaBox/scripts/run/start.sh

I will check this on AWS shortly. I do not know why this is necessary, nor how (un-)safe it is.

tanmaykm commented 8 years ago

Thanks @samuelpowell. Yes, that did look like corruption of the docker images.

I did face that once and fixed it by doing a clean build of all images. That is, I deleted /var/lib/docker after shutting down docker and rebuilt the containers from scratch.

But maybe linkgraph.db was all that needed replacing. I'm not sure why that happened, though, and I did not face it after the rebuild.
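
A quick way to spot that kind of stale reference (a sketch; the id is copied from the engineinteractive log above) is to ask docker directly:

import subprocess

# Sketch: ask docker whether it can still resolve the entity id from
# the "Could not find container" errors (truncated to 12 chars).
entity = "817d14ae30e0"
try:
    subprocess.check_output(["docker", "inspect", entity])
    print("docker can resolve this id")
except subprocess.CalledProcessError:
    print("stale reference: docker has no such container or image")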

samuelpowell commented 8 years ago

As perhaps expected, this also fixes the installation on AWS.

@tanmaykm, would you like me to add an FAQ entry to the installation document noting this?

tanmaykm commented 8 years ago

Sure. I think that would be useful for others. Thanks.