frandallfarmer / neohabitat

Repository for the NeoClassical Habitat Server Project
http://neohabitat.org
MIT License
229 stars 41 forks source link

Docker deployment broken #401

Closed StuBlad closed 1 year ago

StuBlad commented 1 year ago

The Docker deployment for NeoHabitat is currently broken. We've been trying to troubleshoot it for a month or two now with varying degrees of success.

Relevant info

Relevant issues to our recent work are: #390 #392

Several PR's were made during this time to update the deployment and get things working again. The first of these commits starts with commit e7c220d.

During this time, we were able to get things operational again and the fork over at https://github.com/AmandaJonesAway/neohabitat successfully works, but somehow with the PR's that got merged, the master repository is still in a broken state. It should be noted that AmandaJonesAway's fork is 15 commits behind master if comparing against it.

Setup

Make sure you have the NeoHabitat installer to test with as it contains a copy of VICE, and the Habitat disk images. You can grab a copy here: https://github.com/StuBlad/neohabitat-installer/releases. If you don't want to use the installer, the disk images are available in the same repository.

Here are the commands I'm using so you can recreate the environment and replicate the broken build.

Ubuntu 22.04 (LTS) x64

Install Docker using the apt repository as per instructions here (https://docs.docker.com/engine/install/ubuntu/)
git clone https://github.com/frandallfarmer/neohabitat.git
cd neohabitat
sudo docker compose up

It's not building the schema and filling the object database in this current state, so you'll need to modify line 46 in docker-compose.yml to true (NEOHABITAT_SHOULD_UPDATE_SCHEMA) and line 49 in run to true (SHOULD_UPDATE_SCHEMA="${NEOHABITAT_SHOULD_UPDATE_SCHEMA:=true}") before you run sudo docker compose up.

You'll only want these set to true for the first time launching this NeoHabitat docker setup to populate the object database. Things will load much faster if you set those configuration options back to false after.

It'll take a few minutes for all the services to come online after bringing the containers up, but when you see the following log output then you are ready to connect to the server:

neohabitat-neohabitat-1         | May 08, 2023 7:03:36 PM com.mongodb.diagnostics.logging.JULLogger log
neohabitat-neohabitat-1         | INFO: Opened connection [connectionId{localValue:2, serverValue:10}] to neohabitatmongo:27017
neohabitat-neohabitat-1         | - 2023/05/08 19:03:36.334 EVN cont : loading classDesc 'classes'
neohabitat-neohabitat-1         | - 2023/05/08 19:03:36.473 EVN cont : loading static object list 'statics'
neohabitat-neohabitat-1         | - 2023/05/08 19:03:36.515 EVN cont : loading static object 'teleports' as 'teleports'

Go into VICE and Settings > RS232 Settings > RS232 Settings. Set RS232 Device 1 to the IP of the NeoHabitat server and port 1986 (127.0.0.1:1986) for example.

Load Habitat-Boot.d64. Press return on the title screen. Type in the name you want for your avatar and press return. When on the main Habitat title screen, press Alt-N and any key and watch the server output.

The actual problem

The title screen should freeze and you'll see this output in the logs:

neohabitat-neohabitat-1         | debug: Habitat connection from 172.18.0.4:1337
neohabitat-qlink-1              | 2023-05-08 19:06:19,191 [Thread-5] DEBUG org.jbrain.qlink.connection.QConnection  - Sending Habitat Packet: 5A A1 4A B1 49 13 10 20 55 E0 00 03 32 30 30 00
neohabitat-qlink-1              | 2023-05-08 19:06:19,194 [Thread-6] INFO  org.jbrain.qlink.connection.HabitatConnection  - Initiating Habitat Reader thread
neohabitat-neohabitat-1         | events.js:165
neohabitat-neohabitat-1         |       throw er; // Unhandled 'error' event
neohabitat-neohabitat-1         |       ^
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Error: connect ECONNREFUSED 172.18.0.4:2018
neohabitat-neohabitat-1         |     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1174:14)
neohabitat-neohabitat-1         | Emitted 'error' event at:
neohabitat-neohabitat-1         |     at emitErrorNT (internal/streams/destroy.js:64:8)
neohabitat-neohabitat-1         |     at process._tickCallback (internal/process/next_tick.js:178:19)
neohabitat-neohabitat-1         | Program node /neohabitat/bridge/Habitat2ElkoBridge.js -l 0.0.0.0:1337 -e neohabitat:2018 -m neohabitatmongo:27017/elko -c context-Downtown_5f -t debug exited with code 1
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Starting child process with 'node /neohabitat/bridge/Habitat2ElkoBridge.js -l 0.0.0.0:1337 -e neohabitat:2018 -m neohabitatmongo:27017/elko -c context-Downtown_5f -t debug'
neohabitat-neohabitat-1         | Missing/invalid defaults.elko configuration file. Proceeding with factory defaults.
neohabitat-neohabitat-1         | info: Habitat to Elko Bridge listening on 0.0.0.0:1337

According to the file docker-compose.yml, port 2018 is the NEOHABITAT_BRIDGE_ELKO_HOST.

For some reason, the connection is being refused. When this setup previously worked, we used ip addresses instead of giving a container a name and referring to that instead. I'm wondering if this is what has caused the issue here. We need this port to be accessible and responsive in order to have this Docker setup working again.

kevzettler commented 1 year ago

@StuBlad

The title screen should freeze and you'll see this output in the logs:

what logs? browser logs? or some service logs? looks like neohabitat-neohabitat-1 service logs

Error: connect ECONNREFUSED 172.18.0.4:2018 port 2018 is the NEOHABITAT_BRIDGE_ELKO_HOST. For some reason, the connection is being refused.

Can you connect to this port manually with telnet?

I would manually log into the neohabitat-neohabitat-1 container and then manually test that it can make the connection to 172.18.0.4:2018 within the container.

StuBlad commented 1 year ago

@StuBlad

The title screen should freeze and you'll see this output in the logs:

what logs? browser logs? or some service logs? looks like neohabitat-neohabitat-1 service logs.

That's correct, it's the neohabitat-neohabitat-1 service logs.

Error: connect ECONNREFUSED 172.18.0.4:2018 port 2018 is the NEOHABITAT_BRIDGE_ELKO_HOST. For some reason, the connection is being refused.

Can you connect to this port manually with telnet?

I would manually log into the neohabitat-neohabitat-1 container and then manually test that it can make the connection to 172.18.0.4:2018 within the container.

We tried that previously

[root@9c5a27c4fa26 neohabitat]# curl -v telnet://neohabitat:2018
* About to connect() to neohabitat port 2018 (#0)
*   Trying 172.18.0.4...
* Connection refused
* Failed connect to neohabitat:2018; Connection refused
* Closing connection 0
curl: (7) Failed connect to neohabitat:2018; Connection refused

If you attempt to connect to port 1337 which is what the bridge listens on manually with telnet, you get this:

[root@9c5a27c4fa26 neohabitat]# curl -v telnet://neohabitat:1337
* About to connect() to neohabitat port 1337 (#0)
*   Trying 172.18.0.4...
* Connected to neohabitat (172.18.0.4) port 1337 (#0)
* Closing connection 0

and in the neohabitat-neohabitat-1 container log you get this:

neohabitat-neohabitat-1         | debug: Habitat connection from 172.18.0.4:1337
neohabitat-neohabitat-1         | events.js:165
neohabitat-neohabitat-1         |       throw er; // Unhandled 'error' event
neohabitat-neohabitat-1         |       ^
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Error: connect ECONNREFUSED 172.18.0.4:2018
neohabitat-neohabitat-1         |     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1174:14)
neohabitat-neohabitat-1         | Emitted 'error' event at:
neohabitat-neohabitat-1         |     at emitErrorNT (internal/streams/destroy.js:64:8)
neohabitat-neohabitat-1         |     at process._tickCallback (internal/process/next_tick.js:178:19)
neohabitat-neohabitat-1         | Program node /neohabitat/bridge/Habitat2ElkoBridge.js -l 0.0.0.0:1337 -e neohabitat:2018 -m neohabitatmongo:27017/elko -c context-Downtown_5f -t debug exited with code 1
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Starting child process with 'node /neohabitat/bridge/Habitat2ElkoBridge.js -l 0.0.0.0:1337 -e neohabitat:2018 -m neohabitatmongo:27017/elko -c context-Downtown_5f -t debug'
neohabitat-neohabitat-1         | Missing/invalid defaults.elko configuration file. Proceeding with factory defaults.
kevzettler commented 1 year ago

We tried that previously [root@9c5a27c4fa26 neohabitat]# curl -v telnet://neohabitat:2018

  • About to connect() to neohabitat port 2018 (#0)
  • Trying 172.18.0.4...
  • Connection refused
  • Failed connect to neohabitat:2018; Connection refused
  • Closing connection 0 curl: (7) Failed connect to neohabitat:2018; Connection refused

Ok so obviously that confirms the container can't make that connection. What container or service is that 2018 connection supposed spawn from and be listening from? is it active and listening locally? I would confirm that.

Also why are you telneting the domain neohabitat instead of 172.18.0.4 here?

I would also execute docker ps -a and see if the PORTS column has a proper mapping for 2018

If you attempt to connect to port 1337 which is what the bridge listens on manually with telnet, you get this:

Why bring the 1337 port into discussion why is that relevant?

StuBlad commented 1 year ago

Ok so obviously that confirms the container can't make that connection. What container or service is that 2018 connection supposed spawn from and be listening from? is it active and listening locally? I would confirm that.

So the issue here is regarding the bridge. The bridge functions as a NeoHabitat-to-Habitat protocol bridge (converts NeoHabitat messages into binary packets compatible with the QuantumLink protocol). The connection to the bridge is being refused when attempting to login which is where we are stuck.

In the neohabitat-neohabitat-1 container log we see the following output:

neohabitat-neohabitat-1         | info: Habitat to Elko Bridge listening on 0.0.0.0:1337

I'm not entirely sure what's going on, but I suspect that the Bridge listens on 1337 externally and perhaps 2018 is used internally based on the -e option given when starting the Bridge as you'll see below.

As for being active and listening locally, here is the output from ps -aux

root          21  0.0  1.4 974748 29880 ?        Sl   19:18   0:00 node /usr/bin/supervisor -w bridge -- /neohabitat/bridge/Habitat2ElkoBridge.js -l 0.0.0.0:1337 -e neohabitat:2018 -m neohabitatmongo:27017/elko -c context-Downtown_5f -t debug
root          22  0.0  0.0   4380   652 ?        S    19:18   0:00 tee bridge.log
root          47  0.1  2.0 916248 41840 ?        Sl   19:18   0:00 node /neohabitat/bridge/Habitat2ElkoBridge.js -l 0.0.0.0:1337 -e neohabitat:2018 -m neohabitatmongo:27017/elko -c context-Downtown_5f -t debug

Here's the output from netstat -ltnp

[root@8a791b052011 neohabitat]# netstat -ltnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 172.18.0.4:9000         0.0.0.0:*               LISTEN      63/java
tcp        0      0 0.0.0.0:1898            0.0.0.0:*               LISTEN      63/java
tcp        0      0 127.0.0.11:33645        0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:1337            0.0.0.0:*               LISTEN      47/node

Also why are you telneting the domain neohabitat instead of 172.18.0.4 here?

One of the recent changes made was to get rid of the use of localhost/127.0.0.1 and give each docker container a name instead which was used as a drop in replacement for the IP addresses. neohabitat = 172.18.0.4 in this instance. For completeness sake though, here's the output using the ip.

[root@8a791b052011 neohabitat]# curl -v telnet://172.18.0.4:2018
* About to connect() to 172.18.0.4 port 2018 (#0)
*   Trying 172.18.0.4...
* Connection refused
* Failed connect to 172.18.0.4:2018; Connection refused
* Closing connection 0
curl: (7) Failed connect to 172.18.0.4:2018; Connection refused

I would also execute docker ps -a and see if the PORTS column has a proper mapping for 2018

docker ps -a output:

caribe@caribe:~/neohabitat$ docker ps -a
CONTAINER ID   IMAGE                    COMMAND                  CREATED        STATUS          PORTS                                                             NAMES
f6756d968c9d   philcollins/qlink        "/bin/sh -c /usr/loc…"   22 hours ago   Up 14 minutes   0.0.0.0:1986->1986/tcp, :::1986->1986/tcp, 5190/tcp               neohabitat-qlink-1
8a791b052011   philcollins/neohabitat   "/bin/sh -c /neohabi…"   22 hours ago   Up 14 minutes   1337/tcp, 1701/tcp, 1986-1987/tcp, 2018/tcp, 3000/tcp, 9000/tcp   neohabitat-neohabitat-1
7bf2e76ae07e   mongo:4.0                "docker-entrypoint.s…"   25 hours ago   Up 14 minutes   27017/tcp                                                         neohabitat-neohabitatmongo-1
bb5e52d9f7d8   mariadb                  "docker-entrypoint.s…"   25 hours ago   Up 14 minutes   3306/tcp                                                          neohabitat-neohabitatmariadb-1

If you attempt to connect to port 1337 which is what the bridge listens on manually with telnet, you get this:

Why bring the 1337 port into discussion why is that relevant?

See above.

frandallfarmer commented 1 year ago

[New comment replacing old comment.]

The Bridge listens on port 1337 (aka LEET) - it decides, based on content analysis when it gets a message after the connection what type of connection it is. If it starts with 'Z' it is a QLINK MICROCOSM packet, decoded - which the bridge then (generally) translates to JSON/Elko format to send to the server listening on port 9000. NOTE: Port numbers are the defaults in the code, and may not be how a specific instance is configured at runtime.

If the message starts with a '{' it is from a bot or the test framework, which is sending pre-formatted JSON messages for relay to Elko/Neohabitat. There may be special 'bridge-only' message handling.

frandallfarmer commented 1 year ago

Note that defaults.elko in /bridge says:

{"context":"context-Downtown_5f", "user":"user-chip", "listen":"neohabitat:1337", "elko":"neohabitat:2018", "delay":2000, "rate":1200, "logtime":true, "trace":"debug" }

That puts the elko/neohabitat main server at 2018 !

kevzettler commented 1 year ago

Ok I'm fairly confident that this line should not be commented out: https://github.com/frandallfarmer/neohabitat/blob/master/docker-compose.yml#L59

The following output is copied from Stu's post above:

caribe@caribe:~/neohabitat$ docker ps -a
CONTAINER ID   IMAGE                    COMMAND                  CREATED        STATUS          PORTS                                                             NAMES
f6756d968c9d   philcollins/qlink        "/bin/sh -c /usr/loc…"   22 hours ago   Up 14 minutes   0.0.0.0:1986->1986/tcp, :::1986->1986/tcp, 5190/tcp               neohabitat-qlink-1
8a791b052011   philcollins/neohabitat   "/bin/sh -c /neohabi…"   22 hours ago   Up 14 minutes   1337/tcp, 1701/tcp, 1986-1987/tcp, 2018/tcp, 3000/tcp, 9000/tcp   neohabitat-neohabitat-1
7bf2e76ae07e   mongo:4.0                "docker-entrypoint.s…"   25 hours ago   Up 14 minutes   27017/tcp                                                         neohabitat-neohabitatmongo-1
bb5e52d9f7d8   mariadb                  "docker-entrypoint.s…"   25 hours ago   Up 14 minutes   3306/tcp                                                          neohabitat-neohabitatmariadb-1

The ports on the philcollins/neohabitat container are listed but not forwarded. If we uncomment the ports in docker-compose.yml they show up as forwarded

MAGE                    COMMAND                  CREATED         STATUS         PORTS                                                                                                                                                                                                                                                                                                                             NAMES
0029ce017e8e   philcollins/qlink        "/bin/sh -c /usr/loc…"   4 minutes ago   Up 5 seconds   0.0.0.0:1986->1986/tcp, :::1986->1986/tcp, 5190/tcp                                                                                                                                                                                                                                                                               neohabitat_qlink_1
e9c22c70462d   philcollins/neohabitat   "/bin/sh -c /neohabi…"   4 minutes ago   Up 6 seconds   0.0.0.0:1337->1337/tcp, :::1337->1337/tcp, 0.0.0.0:1701->1701/tcp, :::1701->1701/tcp, 0.0.0.0:1898->1898/tcp, :::1898->1898/tcp, 0.0.0.0:1987->1987/tcp, :::1987->1987/tcp, 1986/tcp, 0.0.0.0:2018->2018/tcp, :::2018->2018/tcp, 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp, 0.0.0.0:9229->9229/tcp, :::9229->9229/tcp, 3000/tcp   neohabitat_neohabitat_1
ae32c5679d2f   mariadb                  "docker-entrypoint.s…"   4 minutes ago   Up 7 seconds   3306/tcp                                                                                                                                                                                                                                                                                                                          neohabitat_neohabitatmariadb_1
ea1d8264bbec   mongo:4.0                "docker-entrypoint.s…"   4 minutes ago   Up 7 seconds   27017/tcp                                                                                                                                                                                                                                                                                                                         neohabitat_neohabitatmongo_1

compare the output and notice the -> indicated ports are now forwarded into the containers.

I haven't been able to test this yet i'm not setup with the VICE emulator maybe @StuBlad can try it quick.

StuBlad commented 1 year ago

I tested out Kev's fork to see the changes made in PR #402

I'm constantly getting this spamming up the neohabitat-neohabitat-1 service logs and the server won't load as a result.

neohabitat-neohabitat-1         | Program node /neohabitat/bridge/Habitat2ElkoBridge.js -l 0.0.0.0:1337 -e neohabitat:2018 -m neohabitatmongo:27017/elko -c context-Downtown_5f -t debug exited with code 1
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Starting child process with 'node /neohabitat/bridge/Habitat2ElkoBridge.js -l 0.0.0.0:1337 -e neohabitat:2018 -m neohabitatmongo:27017/elko -c context-Downtown_5f -t debug'
neohabitat-neohabitat-1         | internal/modules/cjs/loader.js:550
neohabitat-neohabitat-1         |     throw err;
neohabitat-neohabitat-1         |     ^
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Error: Cannot find module 'express'
neohabitat-neohabitat-1         |     at Function.Module._resolveFilename (internal/modules/cjs/loader.js:548:15)
neohabitat-neohabitat-1         |     at Function.Module._load (internal/modules/cjs/loader.js:475:25)
neohabitat-neohabitat-1         |     at Module.require (internal/modules/cjs/loader.js:598:17)
neohabitat-neohabitat-1         |     at require (internal/modules/cjs/helpers.js:11:18)
neohabitat-neohabitat-1         |     at Object.<anonymous> (/neohabitat/pushserver/app.js:1:77)
neohabitat-neohabitat-1         |     at Module._compile (internal/modules/cjs/loader.js:654:30)
neohabitat-neohabitat-1         |     at Object.Module._extensions..js (internal/modules/cjs/loader.js:665:10)
neohabitat-neohabitat-1         |     at Module.load (internal/modules/cjs/loader.js:566:32)
neohabitat-neohabitat-1         |     at tryModuleLoad (internal/modules/cjs/loader.js:506:12)
neohabitat-neohabitat-1         |     at Function.Module._load (internal/modules/cjs/loader.js:498:3)
neohabitat-neohabitat-1         | Program node --inspect bin/pushserver exited with code 1
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Starting child process with 'node --inspect bin/pushserver'
neohabitat-neohabitat-1         | Debugger listening on ws://127.0.0.1:9229/d3593973-1987-4f4a-9fe5-313aea213838
neohabitat-neohabitat-1         | For help see https://nodejs.org/en/docs/inspector
neohabitat-neohabitat-1         | internal/modules/cjs/loader.js:550
neohabitat-neohabitat-1         |     throw err;
neohabitat-neohabitat-1         |     ^
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Error: Cannot find module 'winston'
neohabitat-neohabitat-1         |     at Function.Module._resolveFilename (internal/modules/cjs/loader.js:548:15)
neohabitat-neohabitat-1         |     at Function.Module._load (internal/modules/cjs/loader.js:475:25)
neohabitat-neohabitat-1         |     at Module.require (internal/modules/cjs/loader.js:598:17)
neohabitat-neohabitat-1         |     at require (internal/modules/cjs/helpers.js:11:18)
neohabitat-neohabitat-1         |     at Object.<anonymous> (/neohabitat/bridge/Habitat2ElkoBridge.js:19:23)
neohabitat-neohabitat-1         |     at Module._compile (internal/modules/cjs/loader.js:654:30)
neohabitat-neohabitat-1         |     at Object.Module._extensions..js (internal/modules/cjs/loader.js:665:10)
neohabitat-neohabitat-1         |     at Module.load (internal/modules/cjs/loader.js:566:32)
neohabitat-neohabitat-1         |     at tryModuleLoad (internal/modules/cjs/loader.js:506:12)
neohabitat-neohabitat-1         |     at Function.Module._load (internal/modules/cjs/loader.js:498:3)

This sequence keeps repeating constantly to infinity.

I'm guessing this is to do with moving the npm install stuff out of run which I was hoping would help fix another strange issue we had where the package serve-favicon would not install correctly but all other npm packages would. serve-favicon was only used by the pushserver, which is why it was recently disabled in the first place.

StuBlad commented 1 year ago

I tested grabbing master and setting enable pushserver to true in the docker-compose.yml file. Things choked on finding npm modules again which happened previously.

Here's some sample output:

neohabitat-neohabitat-1         | Starting child process with 'node --inspect bin/pushserver'
neohabitat-neohabitat-1         | Debugger listening on ws://127.0.0.1:9229/8355e32c-28fd-4345-a3c5-8772877aee2d
neohabitat-neohabitat-1         | For help see https://nodejs.org/en/docs/inspector
neohabitat-neohabitat-1         | internal/modules/cjs/loader.js:550
neohabitat-neohabitat-1         |     throw err;
neohabitat-neohabitat-1         |     ^
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Error: Cannot find module 'morgan'
neohabitat-neohabitat-1         |     at Function.Module._resolveFilename (internal/modules/cjs/loader.js:548:15)
neohabitat-neohabitat-1         |     at Function.Module._load (internal/modules/cjs/loader.js:475:25)
neohabitat-neohabitat-1         |     at Module.require (internal/modules/cjs/loader.js:598:17)
neohabitat-neohabitat-1         |     at require (internal/modules/cjs/helpers.js:11:18)
neohabitat-neohabitat-1         |     at Object.<anonymous> (/neohabitat/pushserver/app.js:4:14)
neohabitat-neohabitat-1         |     at Module._compile (internal/modules/cjs/loader.js:654:30)
neohabitat-neohabitat-1         |     at Object.Module._extensions..js (internal/modules/cjs/loader.js:665:10)
neohabitat-neohabitat-1         |     at Module.load (internal/modules/cjs/loader.js:566:32)
neohabitat-neohabitat-1         |     at tryModuleLoad (internal/modules/cjs/loader.js:506:12)
neohabitat-neohabitat-1         |     at Function.Module._load (internal/modules/cjs/loader.js:498:3)
neohabitat-neohabitat-1         | Program node --inspect bin/pushserver exited with code 1

I opened a docker shell into neohabitat-neohabitat-1 went to the pushserver dir and ran npm install and the habiproxy and pushserver stood up and i was able to login to my local server.

This is a problem we had previously, so you've definitely diagnosed that the pushserver is vital and needs to be operational here, it's just now we're back to the node module issues we had before. it doesn't make sense why it can't find them on my setup, but it can on yours. If we can resolve this, I think that we'll be good to go.

Also, it's important to note that NEOHABITAT_SHOULD_UPDATE_SCHEMA should probably be always set to false by default, because if you're running this service on The MADE's server, anytime the service gets restarted, everyones data will get wiped out.

kevzettler commented 1 year ago

Very strange I was seeing those pushserver dependency issues and that's what led me to moving the npm install into the Dockerfile. I have confirmed that if I start a new container from the neohabitat image the dependencies are installed in /pushserver/node_modules however if I connect to the already running neohabibtat container the /pushserver/node_modules is missing looking further.

themade@neohabitat:~/neohabitat$ sudo docker run -it --rm --entrypoint ls philcollins/neohabitat /neohabitat/pushserver
README.md  app.js  bin  config.dev.yml  config.prod.yml  config.vagrant.yml  constants  habiproxy  node_modules  package-lock.json  package.json  public  routes  run  views
AmandaJonesAway commented 1 year ago

Ok I'm fairly confident that this line should not be commented out: https://github.com/frandallfarmer/neohabitat/blob/master/docker-compose.yml#L59

Nope. One of the changes I made was to set up a private network in the docker-compose.yml and attach each container to it. This exposes each container on that network to all the others, setting up hostnames as per the container name in the docker-compose.yml. All of a container's ports are automatically accessible to all other containers in that network. The only time you need to set up a port mapping is it you have any external applications/clients that need to connect in. Hence only 1986 being left enabled, as that's the port the c64 clients connect to. The other ports are still accessible from any code running within the containers, as long as you use the hostname and not localhost (as that counts as an external address.) Doing this isolates the services, stopping clashes with anything else running in the server (I already had a mariaDB running, for instance) and/or lets you run multiple instances of neohabitat - just change that external port number for each one.

kevzettler commented 1 year ago

Ok here is confirmation that the pushserver/node_modules dependencies are there at build time but missing at run time :

themade@neohabitat:~/neohabitat$ sudo docker run -it --rm --entrypoint ls philcollins/neohabitat /neohabitat/pushserver
README.md  app.js  bin  config.dev.yml  config.prod.yml  config.vagrant.yml  constants  habiproxy  node_modules  package-lock.json  package.json  public  routes  run  views
themade@neohabitat:~/neohabitat$ sudo docker exec 8cff73229076 ls /neohabitat/pushserver
README.md
app.js
bin
config.dev.yml
config.prod.yml
config.vagrant.yml
constants
habiproxy
package-lock.json
package.json
public
routes
run
views

So something is deleting pushserver/node_modules at runtime :\ ??

kevzettler commented 1 year ago

ok furthermore there seem to be a conflict of dependencies between the bridge and pushserver. They have conflicting version of mongodb and seems there is runtime conflicts on resolving which node_modules to use. Ideally these 2 services would be isolated to separate containers. I think alot of the entropy issue here are from cramming maybe 2-3 services (elko,bridge,pushserver) into the single neohabitat container.

frandallfarmer commented 1 year ago

Could that be because the current team upgraded (partially) to a newer version? This conflict is new.

kevzettler commented 1 year ago

I added some commits to #402 that correct the node dependency issues. There was a volume attached to the neohabitat container that would wipe out all the containers contents with local file system contents causing a disparity between what was installed locally and in the container:

https://github.com/frandallfarmer/neohabitat/blob/master/docker-compose.yml#LL37C1-L38C22

I imagine someone left this in for local development purposes but is unnecessary and detrimental to a deterministic build.

I also separated the package.json hierarchy for bridge and pushserver this seems to resolve the missing node dependencies Stu reported above.

kevzettler commented 1 year ago

Ok apparently there is are some node.js scripts in the DB directory https://github.com/frandallfarmer/neohabitat/blob/master/db/mongohelper.js

I overlooked they are dependent on the same package.json as the bridge app. These scripts are dependent on the root package.json so I have reverted in #404

StuBlad commented 1 year ago

I tested out master which brings in Kev's most recent work.

I'm still not able to login. Here are the container logs:

EDIT: Adding in some earlier logging from the same session incase it helps:

neohabitat-neohabitat-1         |  - Running Neohabitat Elko server with args: org.elkoserver.server.context.ContextServerBoot trace_cont=DEBUG trace_comm=EVENT tracelog_dir=/plz/log/to/stdout tracelog_tag=context conf.listen.host=neohabitat:9000 conf.listen.bind=1337:9000 conf.listen.protocol=tcp conf.comm.jsonstrictness=true conf.context.entrytimeout=999999 conf.context.odb=mongo conf.context.odb.mongo.hostport=neohabitatmongo:27017 conf.context.objstore=org.elkoserver.objdb.store.mongostore.MongoObjectStore conf.context.name=neohabitat conf.context.shutdownpassword=figleaf conf.msgdiagnostics=true
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Running node-supervisor with
neohabitat-neohabitat-1         |   program '/neohabitat/bridge/Habitat2ElkoBridge.js -l 0.0.0.0:1337 -e neohabitat:2018 -m neohabitatmongo:27017/elko -c context-Downtown_5f -t debug'
neohabitat-neohabitat-1         |   --watch 'bridge'
neohabitat-neohabitat-1         |   --extensions 'node,js,0:1337'
neohabitat-neohabitat-1         |   --exec 'node'
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Starting child process with 'node /neohabitat/bridge/Habitat2ElkoBridge.js -l 0.0.0.0:1337 -e neohabitat:2018 -m neohabitatmongo:27017/elko -c context-Downtown_5f -t debug'
neohabitat-neohabitat-1         | Watching directory '/neohabitat/bridge' for changes.
neohabitat-neohabitat-1         | Press rs for restarting the process.
neohabitat-neohabitat-1         | Listening for transport dt_socket at address: 1898
neohabitat-neohabitat-1         | - 2023/05/11 12:47:14.608 WRN trace : The log directory was set to '/plz/log/to/stdout', which is not currently a directory.
neohabitat-neohabitat-1         | - 2023/05/11 12:47:14.767 ERR trace (TraceLogDescriptor.startUsing:TraceLogDescriptor.java:277) Could not open new trace file '/plz/log/to/stdout/context.20230511124714'.
neohabitat-neohabitat-1         | - 2023/05/11 12:47:14.976 NTC server : Elko 2.0.1 Release build  8-Nov-2017 00:00:47 UTC
neohabitat-neohabitat-1         | - 2023/05/11 12:47:14.983 NTC server : Copyright 2016 ElkoServer.org; see LICENSE
neohabitat-neohabitat-1         | - 2023/05/11 12:47:14.991 NTC server : Starting neohabitat
neohabitat-neohabitat-1         | Missing/invalid defaults.elko configuration file. Proceeding with factory defaults.
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | > pushserver@0.1.0 debug /neohabitat/pushserver
neohabitat-neohabitat-1         | > ./node_modules/supervisor/lib/cli-wrapper.js --inspect -i public,views -w bin,constants,habiproxy,routes -- bin/pushserver
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | info: Habitat to Elko Bridge listening on 0.0.0.0:1337
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Running node-supervisor with
neohabitat-neohabitat-1         |   program '--inspect bin/pushserver'
neohabitat-neohabitat-1         |   --watch 'bin,constants,habiproxy,routes'
neohabitat-neohabitat-1         |   --ignore 'public,views'
neohabitat-neohabitat-1         |   --extensions 'node,js'
neohabitat-neohabitat-1         |   --exec 'node'
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Starting child process with 'node --inspect bin/pushserver'
neohabitat-neohabitat-1         | Ignoring directory '/neohabitat/pushserver/public'.
neohabitat-neohabitat-1         | Ignoring directory '/neohabitat/pushserver/views'.
neohabitat-neohabitat-1         | Watching directory '/neohabitat/pushserver/bin' for changes.
neohabitat-neohabitat-1         | Press rs for restarting the process.
neohabitat-neohabitat-1         | Watching directory '/neohabitat/pushserver/constants' for changes.
neohabitat-neohabitat-1         | Press rs for restarting the process.
neohabitat-neohabitat-1         | Watching directory '/neohabitat/pushserver/habiproxy' for changes.
neohabitat-neohabitat-1         | Press rs for restarting the process.
neohabitat-neohabitat-1         | Watching directory '/neohabitat/pushserver/routes' for changes.
neohabitat-neohabitat-1         | Press rs for restarting the process.
neohabitat-neohabitat-1         | Debugger listening on ws://127.0.0.1:9229/e7f8f0ba-700d-4c84-8a9c-6269909caaa5
neohabitat-neohabitat-1         | For help see https://nodejs.org/en/docs/inspector
neohabitat-neohabitat-1         | May 11, 2023 12:47:16 PM com.mongodb.diagnostics.logging.JULLogger log
neohabitat-neohabitat-1         | INFO: Cluster created with settings {hosts=[neohabitatmongo:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=50}
neohabitat-neohabitat-1         | WebSocket settings:
neohabitat-neohabitat-1         |     - proxying from 0.0.0.0:1987 to qlink:1986
neohabitat-neohabitat-1         |     - Running in unencrypted HTTP (ws://) mode
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:17.557+0000 I NETWORK  [listener] connection accepted from 172.19.0.4:43850 #9 (1 connection now open)
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:17.605+0000 I NETWORK  [conn9] received client metadata from 172.19.0.4:43850 conn9: { driver: { name: "mongo-java-driver", version: "unknown" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "5.15.0-67-generic" }, platform: "Java/Oracle Corporation/1.8.0_242-b08" }
neohabitat-neohabitat-1         | May 11, 2023 12:47:17 PM com.mongodb.diagnostics.logging.JULLogger log
neohabitat-neohabitat-1         | INFO: No server chosen by ReadPreferenceServerSelector{readPreference=primary} from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, serverDescriptions=[ServerDescription{address=neohabitatmongo:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
neohabitat-neohabitat-1         | May 11, 2023 12:47:17 PM com.mongodb.diagnostics.logging.JULLogger log
neohabitat-neohabitat-1         | INFO: Opened connection [connectionId{localValue:1, serverValue:9}] to neohabitatmongo:27017
neohabitat-neohabitat-1         | - 2023/05/11 12:47:17.776 ERR cont (Server.startOneListener:Server.java:788) unable to open tcp listener conf.listen on requested host: java.net.BindException: Cannot assign requested address
neohabitat-neohabitat-1         | May 11, 2023 12:47:17 PM com.mongodb.diagnostics.logging.JULLogger log
neohabitat-neohabitat-1         | INFO: Monitor thread successfully connected to server with description ServerDescription{address=neohabitatmongo:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 0, 28]}, minWireVersion=0, maxWireVersion=7, maxDocumentSize=16777216, roundTripTimeNanos=11882938}
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:17.809+0000 I NETWORK  [listener] connection accepted from 172.19.0.4:43858 #10 (2 connections now open)
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:17.820+0000 I NETWORK  [conn10] received client metadata from 172.19.0.4:43858 conn10: { driver: { name: "mongo-java-driver", version: "unknown" }, os: { type: "Linux", name: "Linux", architecture: "amd64", version: "5.15.0-67-generic" }, platform: "Java/Oracle Corporation/1.8.0_242-b08" }
neohabitat-neohabitat-1         | May 11, 2023 12:47:17 PM com.mongodb.diagnostics.logging.JULLogger log
neohabitat-neohabitat-1         | INFO: Opened connection [connectionId{localValue:2, serverValue:10}] to neohabitatmongo:27017
neohabitat-neohabitat-1         | - 2023/05/11 12:47:18.072 EVN cont : loading classDesc 'classes'
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:18.197+0000 I NETWORK  [listener] connection accepted from 172.19.0.4:43870 #11 (3 connections now open)
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:18.211+0000 I NETWORK  [conn11] received client metadata from 172.19.0.4:43870 conn11: { driver: { name: "nodejs", version: "3.0.4" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "5.15.0-67-generic" }, platform: "Node.js v9.11.2, LE, mongodb-core: 3.0.4" }
neohabitat-neohabitat-1         | info: Connected to MongoDB mongodb://neohabitatmongo/elko
neohabitat-neohabitat-1         | info: Habiproxy listening on 0.0.0.0:2018
neohabitat-neohabitat-1         | debug: HTTP server listening on port 1701
neohabitat-neohabitat-1         | - 2023/05/11 12:47:18.353 EVN cont : loading static object list 'statics'
neohabitat-neohabitat-1         | - 2023/05/11 12:47:18.430 EVN cont : loading static object 'teleports' as 'teleports'

Original output:

neohabitat-neohabitat-1         | debug: Habitat connection from 172.19.0.4:1337
neohabitat-qlink-1              | 2023-05-11 12:47:51,211 [Thread-5] DEBUG org.jbrain.qlink.connection.QConnection  - Sending Habitat Packet: 5A A1 4A B1 49 13 10 20 55 E0 00 03 32 30 30 00
neohabitat-qlink-1              | 2023-05-11 12:47:51,214 [Thread-6] INFO  org.jbrain.qlink.connection.HabitatConnection  - Initiating Habitat Reader thread
neohabitat-neohabitat-1         | debug: Habiproxy client connected at: 172.19.0.4:52356
neohabitat-neohabitat-1         | events.js:165
neohabitat-neohabitat-1         |       throw er; // Unhandled 'error' event
neohabitat-neohabitat-1         |       ^
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Error: connect ECONNREFUSED 172.19.0.4:9000
neohabitat-neohabitat-1         |     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1174:14)
neohabitat-neohabitat-1         | Emitted 'error' event at:
neohabitat-neohabitat-1         |     at emitErrorNT (internal/streams/destroy.js:64:8)
neohabitat-neohabitat-1         |     at process._tickCallback (internal/process/next_tick.js:178:19)
neohabitat-neohabitat-1         | debug: Connecting: 172.19.0.4:1337 <-> 172.19.0.4:52356
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:51.229+0000 I NETWORK  [conn11] end connection 172.19.0.4:43870 (2 connections now open)
neohabitat-neohabitat-1         | Program node --inspect bin/pushserver exited with code 1
neohabitat-neohabitat-1         |
neohabitat-neohabitat-1         | Starting child process with 'node --inspect bin/pushserver'
neohabitat-neohabitat-1         | debug: 1 (Habitat Client) connected.
neohabitat-neohabitat-1         | Debugger listening on ws://127.0.0.1:9229/d4c59bda-71a7-45d8-9f0e-b3f6411ab986
neohabitat-neohabitat-1         | debug: client (1) ->  [noid:0 request:3 seq:0 ... [50,48,48,0]]
neohabitat-neohabitat-1         | For help see https://nodejs.org/en/docs/inspector
neohabitat-neohabitat-1         | debug: [85,250,0,3,1,48,66,65,68,32,68,73,83,75] -> client (1)
neohabitat-neohabitat-1         | debug: Elko port disconnected...
neohabitat-neohabitat-1         | debug: {Bridge being shutdown...}
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:51.280+0000 I NETWORK  [listener] connection accepted from 172.19.0.4:47562 #12 (3 connections now open)
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:51.297+0000 I NETWORK  [conn12] received client metadata from 172.19.0.4:47562 conn12: { driver: { name: "nodejs", version: "2.2.36" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "5.15.0-67-generic" }, platform: "Node.js v9.11.2, LE, mongodb-core: 2.1.20" }
neohabitat-neohabitat-1         | debug: Habitat client disconnected.
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:51.347+0000 I NETWORK  [listener] connection accepted from 172.19.0.4:47564 #13 (4 connections now open)
neohabitat-neohabitat-1         | warn: Error [ERR_STREAM_WRITE_AFTER_END]: write after end
neohabitat-neohabitat-1         | debug: User user-stu already has a turf Region assigned: context-OutamyWy_529_interior
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:51.369+0000 I NETWORK  [conn12] end connection 172.19.0.4:47562 (3 connections now open)
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:51.372+0000 I NETWORK  [conn13] end connection 172.19.0.4:47564 (2 connections now open)
neohabitat-neohabitat-1         | warn: Error [ERR_STREAM_WRITE_AFTER_END]: write after end
neohabitat-neohabitat-1         | WebSocket settings:
neohabitat-neohabitat-1         |     - proxying from 0.0.0.0:1987 to qlink:1986
neohabitat-neohabitat-1         |     - Running in unencrypted HTTP (ws://) mode
neohabitat-neohabitat-1         | warn: Error [ERR_STREAM_WRITE_AFTER_END]: write after end
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:51.839+0000 I NETWORK  [listener] connection accepted from 172.19.0.4:47572 #14 (3 connections now open)
neohabitat-neohabitatmongo-1    | 2023-05-11T12:47:51.849+0000 I NETWORK  [conn14] received client metadata from 172.19.0.4:47572 conn14: { driver: { name: "nodejs", version: "3.0.4" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "5.15.0-67-generic" }, platform: "Node.js v9.11.2, LE, mongodb-core: 3.0.4" }
neohabitat-neohabitat-1         | info: Connected to MongoDB mongodb://neohabitatmongo/elko
neohabitat-neohabitat-1         | info: Habiproxy listening on 0.0.0.0:2018
neohabitat-neohabitat-1         | debug: HTTP server listening on port 1701