NREL / OpenStudio-server

The OpenStudio Server is a Docker- or Helm-deployable instance that allows for large-scale parametric analyses of building energy models using the OpenStudio or URBANopt CLIs.
http://www.openstudio.net/

PAT on remote server - "Analysis Error" #500

Closed pkovacs19 closed 4 years ago

pkovacs19 commented 5 years ago

General Issue

I have installed the server on a remote system using the docker-compose method. I am able to connect to the server and run a PAT analysis from one machine, but not another. The machine that errors out connects just fine, but when I start the analysis, the green status bar returns "Analysis Error". The project does not appear on the server, so I'm not sure where to investigate the logs for such a run. Any information about what might cause an "Analysis Error" such that the project doesn't get created, or some direction on where to start investigating logs, would be a great help to me.

Error on Remote Server Details

AMI version being used / Version of OpenStudio Server: 2.7.2

Server management log file: As mentioned in the description, the project/analysis does not get created, so I'm not sure where to check for logs.

Docker Provisioning Error Details

Deployment machine operating system version: CentOS 7.6.1810

Docker version

Client:
 Version:           18.09.3
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        774a1f4
 Built:             Thu Feb 28 06:33:21 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 06:02:24 2019
  OS/Arch:          linux/amd64
  Experimental:     false

OpenStudio Server version / dockerhub tag used: nrel/openstudio-server:latest

docker-compose.yml

Any help is greatly appreciated! Let me know if there is other info needed.

brianlball commented 5 years ago

When you say the docker-compose method, are you using Docker Swarm as detailed here? https://github.com/NREL/OpenStudio-server/wiki/Deployment-Using-Docker-Swarm

pkovacs19 commented 5 years ago

I was following the instructions on the README, pasted below:

Install Docker Compose (Version 1.17.0 or greater is required)

Docker Compose will be installed on Mac and Windows by default. Linux users: see instructions here.

Run Docker Compose:

docker-compose build

... be patient ... If the containers build successfully, start them by running:

docker volume create --name=osdata && docker volume create --name=dbdata && OS_SERVER_NUMBER_OF_WORKERS=4 docker-compose up

where 4 is equal to the number of worker nodes you wish to run. For single node servers this should not be greater than the total number of available cores minus 4.

Resetting the containers can be accomplished by running:

docker-compose rm -f
docker volume rm osdata dbdata
docker volume create --name=osdata
docker volume create --name=dbdata
OS_SERVER_NUMBER_OF_WORKERS=N docker-compose up
docker-compose service scale worker=N

Or one line

docker-compose rm -f && docker-compose build && docker volume rm osdata dbdata && docker volume create --name=osdata && docker volume create --name=dbdata && OS_SERVER_NUMBER_OF_WORKERS=N docker-compose up && docker-compose service scale worker=N

Congratulations! Visit http://localhost:8080 to see the OpenStudio Server Management Console.

Edit: I can see the management console just fine when I go to :8080

brianlball commented 5 years ago

Those instructions might be outdated.

I'd suggest trying the swarm method on the wiki page. Also try using the nuke.sh script https://github.com/NREL/OpenStudio-server/blob/develop/local_setup_scripts/nuke.sh with the argument 2.7.2 (if that's the version you want to use). That script will pull the docker containers and stack deploy them. You will probably have to edit it for your file paths, etc., but it will at least show you the commands used to deploy the existing containers. Once you achieve that, you can move on to building your own containers if you wish.
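
For reference, the core of what that script does is roughly the following (a sketch only; the version tag and compose file path here are assumptions, nuke.sh has the exact commands):

# start a local registry so the swarm can pull images from 127.0.0.1:5000
docker service create --name registry --publish published=5000,target=5000 registry:2.6
# pull a released image, re-tag it for the local registry, and push it (repeat for the rserve image)
docker pull nrel/openstudio-server:2.7.2
docker tag nrel/openstudio-server:2.7.2 127.0.0.1:5000/openstudio-server:latest
docker push 127.0.0.1:5000/openstudio-server:latest
# deploy the stack from the compose file
docker stack deploy osserver --compose-file=docker-compose.yml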

pkovacs19 commented 5 years ago

I will try that. If I can see the console and run an analysis from one machine but not from another, would that really mean my deployment was all wrong?

brianlball commented 5 years ago

regarding the machine that fails, have you tried submitting a job from a freshly redeployed server?

brianlball commented 5 years ago

also in your PAT project directory, anything in your logs folder?

pkovacs19 commented 5 years ago

So after going through the instructions for Docker Swarm, I seem to have taken a step back. I can no longer get to the OS Server dashboard. All of the services seem to start up just fine, but I can no longer go to myaddress.mydomain.com:8080 to see the management console.

Also, on the jobs which would not run, the PAT project is never created on the server, so I'm not sure where to find logs for it.

brianlball commented 5 years ago

can you post the results of

docker info
docker image ls
docker service ls

Are you using any of the files in the local_setup_scripts directory? Did you create a docker registry? Did you push the containers to the registry? Is your docker-compose.yml file pointing to the registry like the one in the /local_setup_scripts dir? How many CPUs do you have on your server, and what's the value of the environment variable OS_SERVER_NUMBER_OF_WORKERS?

pkovacs19 commented 5 years ago

[root@docker local_setup_scripts]# docker info

Containers: 3
 Running: 3
 Paused: 0
 Stopped: 0
Images: 5
Server Version: 18.09.3
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
 NodeID: sw6acv8pre7l7djgopd66ptzh
 Is Manager: true
 ClusterID: rjynbjqzrmfl2q8vvaz1lkg3y
 Managers: 1
 Nodes: 1
 Default Address Pool: 10.0.0.0/8
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 142.104.75.70
 Manager Addresses:
  142.104.75.70:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: e6b3f5632f50dbc4e9cb6288d911bf4f5e95b18e
runc version: 6635b4f0c6af3810594d2770f662f34ddc15b40d
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-957.10.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.701GiB
Name: docker.engr.uvic.ca
ID: 5IDI:KVYV:NIDK:4IOI:4OQN:T442:MI3W:LHAV:VWRY:YJ33:BRVP:4HEJ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

[root@docker local_setup_scripts]# docker image ls

REPOSITORY                         TAG                 IMAGE ID            CREATED             SIZE
127.0.0.1:5000/openstudio-rserve   latest              ea4262866088        15 hours ago        1.77GB
nrel/openstudio-rserve             latest              ea4262866088        15 hours ago        1.77GB
127.0.0.1:5000/openstudio-server   latest              dc6d9ae21608        16 hours ago        3.15GB
nrel/openstudio-server             latest              dc6d9ae21608        16 hours ago        3.15GB
127.0.0.1:5000/mongo               latest              0fb47b43df19        2 weeks ago         411MB
mongo                              latest              0fb47b43df19        2 weeks ago         411MB
registry                           2.6                 d5ef411ad932        3 months ago        28.5MB
127.0.0.1:5000/redis               latest              1e70071f4af4        18 months ago       107MB
redis                              4.0.6               1e70071f4af4        18 months ago       107MB

[root@docker local_setup_scripts]# docker service ls

ID                  NAME                      MODE                REPLICAS            IMAGE                           PORTS
6gkisyuduztw        osserver_db               replicated          0/1                 mongo:3.4.10                    *:27017->27017/tcp
n26beu5rnxo3        osserver_queue            replicated          0/1                 redis:4.0.6                     *:6379->6379/tcp
lqzklusf0ica        osserver_rserve           replicated          1/1                 nrel/openstudio-rserve:latest
sm4xjdj55074        osserver_web              replicated          0/1                 nrel/openstudio-server:latest   *:80->80/tcp, *:443->443/tcp, *:8080->80/tcp
ngn2q9con9e7        osserver_web-background   replicated          1/1                 nrel/openstudio-server:latest
mvu8a0xj6rkv        osserver_worker           replicated          0/1                 nrel/openstudio-server:latest
re68gydyiyea        registry                  replicated          1/1                 registry:2.6                    *:5000->5000/tcp

Yes, I ran through the nuke.sh script as you suggested, adjusting the file paths to the docker-compose.yml file which I created following the wiki here: https://github.com/NREL/OpenStudio-server/wiki/Deployment-Using-Docker-Swarm

I did not create a docker registry.

When I ran the nuke.sh script, it did a bunch of pushing and pulling; is that the registry you're talking about?

I am not sure where this pointing to the registry should happen in the yaml file. I simply followed the instructions on the wiki. What line should I be looking at?

There are 2 cores on the server.

OS_SERVER_NUMBER_OF_WORKERS is set at 1

brianlball commented 5 years ago

Yep, the nuke script will download and set up the registry for you and then pull and push the containers to it.

Hmmm, 2 CPUs and 3.7 GB of RAM is going to make it a challenge to run a server. Are there any 'reservations' in your docker-compose.yml file?

Your 'docker service ls' is showing that several of the services are not starting up. You can run 'docker service inspect xxxx' and 'docker service logs xxxx', where xxxx is the ID of the service, to get more info there.
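
A minimal sketch of those checks, using the service name from your 'docker service ls' output:

# show why a task is stuck (--no-trunc prints the full scheduler error)
docker service ps --no-trunc osserver_worker
# show the service definition, including any resource reservations
docker service inspect --pretty osserver_worker
# follow whatever the service has managed to log so far
docker service logs -f osserver_worker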

brianlball commented 5 years ago

The 'image:' part of the yml file has the image tag. The nuke script is looking for images at 127.0.0.1:5000/, which is the registry ip:port.

Check out the docker-compose.yml in the local_setup_scripts directory.
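
One way to sanity-check the image: lines (a sketch; this assumes the registry started by nuke.sh is listening on port 5000) is to list what the local registry actually holds:

# list the repositories and tags stored in the local registry
curl http://127.0.0.1:5000/v2/_catalog
curl http://127.0.0.1:5000/v2/openstudio-server/tags/list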

pkovacs19 commented 5 years ago

Yeah, there are CPU reservations in the docker-compose.yml; I have them set to 1.

The output of 'docker service ls' is very similar to what they show at https://github.com/NREL/OpenStudio-server/wiki/Deployment-Using-Docker-Swarm in step 7, which says "Confirm the stack is running". Is that not what the output should look like if the stack is running?

I changed all the image lines to be one of the following: mongo:3.4.10 redis:4.0.6 nrel/openstudio-server:latest nrel/openstudio-rserve:latest

brianlball commented 5 years ago

With 2 CPUs, you'll need to remove the reservations in your docker-compose.yml file. Otherwise, Docker won't have the resources to reserve and you'll get startup errors.

The output of 'docker service ls' should have them all at 1/1. 0/1 means they have not started up yet or failed to start.

If you are using the registry, you'll need to put 127.0.0.1:5000 in front of the image tag so Docker knows to pull from the registry. Look at the nuke.sh and docker-compose.yml file in the /local_setup_scripts/ directory; they are meant to work together.

pkovacs19 commented 5 years ago

Okay, I will try deploying the stack with the docker-compose.yml file in /local_setup_scripts/. One thing I forgot to mention: the last time I ran the nuke.sh script, at the end it created the services and increased the worker count, then got hung up, saying "overall progress: 0 out of 42 tasks". If it gets hung there, does that mean it's not working correctly?

pkovacs19 commented 5 years ago

The output from all the workers looks like

[root@docker local_setup_scripts]# docker service ps osserver_worker
ID                  NAME                 IMAGE                           NODE                DESIRED STATE       CURRENT STATE           ERROR                              PORTS
umira93spwv5        osserver_worker.1    nrel/openstudio-server:latest                       Running             Pending 5 minutes ago   "no suitable node (insufficien…"
zcru8fvey0w2        osserver_worker.2    nrel/openstudio-server:latest                       Running             Pending 5 minutes ago   "no suitable node (insufficien…"
brianlball commented 5 years ago

you'll want to edit the nuke script to just start 1 worker and not 42.
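
If the stack is already deployed, the replica count can also be dropped after the fact, for example:

# scale the worker service down to a single replica
docker service scale osserver_worker=1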

pkovacs19 commented 5 years ago

Thanks. I did that, and I pointed the deploy line to the docker-compose.yml in /local_setup_scripts/. That seemed to work okay, since I can now see the cloud management console when I hit the location in my browser on port 8080. Great!

However, I can send a task to the server, but it's hung up in the queue. It doesn't look like every service has started. The output of 'docker service ls' is:

ID                  NAME                      MODE                REPLICAS            IMAGE                                     PORTS
y7qxd1trm9r4        osserver_db               replicated          1/1                 127.0.0.1:5000/mongo:latest               *:27017->27017/tcp
cwwn0ckxj4q0        osserver_queue            replicated          1/1                 127.0.0.1:5000/redis:latest               *:6379->6379/tcp
fwplpkkw1q72        osserver_rserve           replicated          0/1                 127.0.0.1:5000/openstudio-rserve:latest
yul673c9s2p8        osserver_web              replicated          1/1                 127.0.0.1:5000/openstudio-server:latest   *:80->80/tcp, *:443->443/tcp, *:8080->80/tcp
sy7y8dhtlzcl        osserver_web-background   replicated          1/1                 127.0.0.1:5000/openstudio-server:latest
01k38z94ouy5        osserver_worker           replicated          0/1                 127.0.0.1:5000/openstudio-server:latest
n706c99fshwx        registry                  replicated          1/1                 registry:2.6                              *:5000->5000/tcp

But when I check the logs for the services that show 0/1, there is nothing there (no logs). Would either of those services not running result in the project hanging in the queue?

pkovacs19 commented 5 years ago

Okay, I've got all the services up and running now:

ID                  NAME                      MODE                REPLICAS            IMAGE                                     PORTS
ifkefwex3qn4        osserver_db               replicated          1/1                 127.0.0.1:5000/mongo:latest               *:27017->27017/tcp
oc9bi10cwkwe        osserver_queue            replicated          1/1                 127.0.0.1:5000/redis:latest               *:6379->6379/tcp
smzbzue1ox0e        osserver_rserve           replicated          1/1                 127.0.0.1:5000/openstudio-rserve:latest
y19yjpkp5f5k        osserver_web              replicated          1/1                 127.0.0.1:5000/openstudio-server:latest   *:80->80/tcp, *:443->443/tcp, *:8080->80/tcp
8olmbavkkwct        osserver_web-background   replicated          1/1                 127.0.0.1:5000/openstudio-server:latest
p3xc7fnjy8lf        osserver_worker           replicated          1/1                 127.0.0.1:5000/openstudio-server:latest
ipmtlc2gwqzr        registry                  replicated          1/1                 registry:2.6                              *:5000->5000/tcp

But when I submitted my project to the server, it still is stuck in the queue.

The logs for the project on the server only show a message in the RServe box:

Starting Rserve on port 6311 :
/usr/local/lib/R/bin/R CMD /usr/local/lib/R/library/Rserve/libs//Rserve --vanilla --RS-port 6311 

R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Any tips on where else to look for info?

brianlball commented 5 years ago

Well, you can 'docker logs -f xxx' any of the containers to see their logs.

On the admin page of the web GUI, you can also go to the Resque dashboard.

Is there any analysis showing up for your submitted job?
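
For example (a sketch; the name filters assume the osserver stack names shown above):

# list the running containers for the stack, then follow the worker and web-background logs
docker ps --filter name=osserver
docker logs -f $(docker ps -q --filter name=osserver_worker)
docker logs -f $(docker ps -q --filter name=osserver_web-background)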

pkovacs19 commented 5 years ago

I checked the docker container logs; there was nothing in the queue container. I checked them all, but the only ones with anything in them were the web container, which showed the HTTP calls, and the DB container, and neither of them seemed to have information about the queue.

The Resque dashboard shows that I have pending jobs; the overview tab shows 6 jobs under "analysis_wrapper" and 0 under "failed". It also says "0 of 2 Workers Working", so it seems there are workers available, but they are not picking up the jobs. I can see the jobs in the queue on the "Queues" tab. The 2 available workers have the names (from docker container ls) "osserver_web-background.1.xxxx" and "osserver_worker.1.xxxx". Does that sound right?

There doesn't appear to be any analysis for the submitted jobs. I can go to their analysis page and look at the variables and measures, but there is no data in any of the downloads and the simulations section is showing 0/0.
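
One thing I still need to confirm (a sketch, based on my understanding that a Resque worker only takes jobs from the queues listed in its QUEUES environment variable) is what each worker container was actually started with:

# print the QUEUES value inside the web-background and worker containers
docker exec $(docker ps -q --filter name=osserver_web-background) env | grep QUEUES
docker exec $(docker ps -q --filter name=osserver_worker) env | grep QUEUES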

brianlball commented 5 years ago

Hmmm, just for kicks, what does your drive space look like ('df -h')? I've seen smaller instances where that is almost full on startup.

It also sounds like it could be a networking issue between the containers, like they are not talking to each other. Can you try adding the network section to your docker-compose.yml like the one here: https://github.com/NREL/OpenStudio-server/blob/develop/local_setup_scripts/docker-compose.yml

What's your host OS?

pkovacs19 commented 5 years ago

drive space:

Filesystem                            Size  Used Avail Use% Mounted on
/dev/mapper/vg_root-lv_root           8.0G  125M  7.9G   2% /
devtmpfs                              1.9G     0  1.9G   0% /dev
tmpfs                                 1.9G  8.0K  1.9G   1% /dev/shm
tmpfs                                 1.9G  9.1M  1.9G   1% /run
tmpfs                                 1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/mapper/vg_root-lv_usr             16G  1.8G   15G  11% /usr
/dev/mapper/vg_root-lv_tmp            2.0G   33M  2.0G   2% /tmp
/dev/mapper/vg_root-lv_var             16G  7.8G  8.2G  49% /var
/dev/mapper/vg_root-lv_opt            8.0G   33M  8.0G   1% /opt
/dev/mapper/vg_root-lv_export         6.7G   33M  6.7G   1% /export
/dev/vda1                             509M   95M  415M  19% /boot
/dev/mapper/vg_root-lv_var_log        1.5G   43M  1.5G   3% /var/log
/dev/mapper/vg_root-lv_var_tmp        997M   33M  965M   4% /var/tmp
/dev/mapper/vg_root-lv_var_log_audit  497M   39M  459M   8% /var/log/audit
overlay                                16G  7.8G  8.2G  49% /var/lib/docker/overlay2/c78dc55b8f9989d781fd3f30c0a908b7e989e57f29d62258c73530c9a8211765/merged
overlay                                16G  7.8G  8.2G  49% /var/lib/docker/overlay2/334401191e6af69c0822329d68369ab4c9672df39a8156d2c3485febab66ad81/merged
overlay                                16G  7.8G  8.2G  49% /var/lib/docker/overlay2/a11a3a56de8c9d1a2ab73cc70b645e8e351c2aed3cc81569bb4f67734e28889e/merged
overlay                                16G  7.8G  8.2G  49% /var/lib/docker/overlay2/32b70aaa60c28978abd591f49c455a77bc50b07477ac4ace31ad7bfb1fb60210/merged
overlay                                16G  7.8G  8.2G  49% /var/lib/docker/overlay2/f45d09874f990cf9215e783eba509b766f689ff4a72e41b3791677bb84c0e9ed/merged
overlay                                16G  7.8G  8.2G  49% /var/lib/docker/overlay2/f88e3605d50de71a7fe7d1225056ed546cf24b1cee60d2163b38218b64ca70fa/merged
overlay                                16G  7.8G  8.2G  49% /var/lib/docker/overlay2/209f0ebb23e740036165afd563b8b55471aef9c743f68c82f48af291f63730f1/merged
shm                                    64M     0   64M   0% /var/lib/docker/containers/a03c1639449dff769489de49036c3bd7e69c5d52edeff191253d67d6d47a9132/mounts/shm
shm                                    64M     0   64M   0% /var/lib/docker/containers/27097678d4eb7d243f952a1e8cedb5f755c1d0cb5b4516ed6481143f584635aa/mounts/shm
shm                                    64M     0   64M   0% /var/lib/docker/containers/b3d91d90224d779db42420f47420c5b1a4934ed1977359256094de1fb3d789e6/mounts/shm
shm                                    64M     0   64M   0% /var/lib/docker/containers/4a06e9ffdd0c5de49030e0a68071d5fbe63e2f9dcb5308caa919872bc826124d/mounts/shm
shm                                    64M     0   64M   0% /var/lib/docker/containers/3000640b2cc8c5782f60ec5d0591ed8f59d9a8c14169bf47ab5f02f3fe1d2230/mounts/shm
shm                                    64M     0   64M   0% /var/lib/docker/containers/2daac8b7868402a056dbc207f5146d6f63f1c0264686fb1cc33fbc4b8ff34ac1/mounts/shm
shm                                    64M     0   64M   0% /var/lib/docker/containers/9ca4fd76c2a13748d4f2842c0fae84941587ddf8c5d923e97fe326fdf4e998db/mounts/shm

It does seem to be a networking thing to me as well. I've already got that networking section in my docker-compose.yml file. Here is my current docker-compose.yml file.

version: '3'
services:
  db:
    image: 127.0.0.1:5000/mongo
    ports:
      - "27017:27017"
    volumes:
      - dbdata:/data/db
    deploy:
      placement:
        constraints:
          - node.role == manager
      #resources:
      #  reservations:
      #    cpus: "${AWS_MONGO_CORES}"
  queue:
    image: 127.0.0.1:5000/redis
    ports:
      - "6379:6379"
    deploy:
      placement:
        constraints:
          - node.role == manager
      #resources:
      #  reservations:
      #    cpus: "1"
  web:
    image: 127.0.0.1:5000/openstudio-server
    ports:
      - "8080:80"
      - "80:80"
      - "443:443"
    environment:
      - OS_SERVER_NUMBER_OF_WORKERS=${AWS_OS_SERVER_NUMBER_OF_WORKERS}
      - MAX_REQUESTS=${AWS_MAX_REQUESTS}
      - MAX_POOL=${AWS_MAX_POOL}
    volumes:
      - osdata:/mnt/openstudio
    depends_on:
      - db
      - queue
    deploy:
      placement:
        constraints:
          - node.role == manager
      #resources:
      #  reservations:
      #    cpus: "${AWS_WEB_CORES}"
  web-background:
    image: 127.0.0.1:5000/openstudio-server
    environment:
      - OS_SERVER_NUMBER_OF_WORKERS=${AWS_OS_SERVER_NUMBER_OF_WORKERS}
      - QUEUES=background,analyses
    volumes:
      - osdata:/mnt/openstudio
    command: bundle exec rake environment resque:work
    depends_on:
      - db
      - web
      - queue
    deploy:
      placement:
        constraints:
          - node.role == manager
      #resources:
      #  reservations:
      #    cpus: "1"
  worker:
    image: 127.0.0.1:5000/openstudio-server
    environment:
      - QUEUES=simulations
      - COUNT=1
    command: bundle exec rake environment resque:work
    volumes:
      - /mnt/openstudio
    depends_on:
      - web
      - web-background
      - db
      - queue
      - rserve
    #deploy:
    #  resources:
    #    reservations:
    #      cpus: "1"
  rserve:
    image: 127.0.0.1:5000/openstudio-rserve
    volumes:
      - osdata:/mnt/openstudio
    depends_on:
      - web
      - web-background
      - db
    deploy:
      placement:
        constraints:
          - node.role == manager
      #resources:
      #  reservations:
      #    cpus: "1"
volumes:
  osdata:
    external: true
  dbdata:
    external: true
networks:
  default:
    driver: overlay
    ipam:
      driver: default
      config:
        - subnet: 172.28.0.0/16

Could it be that the host itself needs an update to its firewall rules?

The host OS is CentOS 7.6.1810.
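
If it is the firewall, my understanding is that the swarm overlay network needs its standard ports open; a sketch, assuming firewalld and the default zone (I would adjust zones/interfaces as needed):

# ports Docker Swarm uses for cluster management, node discovery and overlay (VXLAN) traffic
firewall-cmd --add-port=2377/tcp --permanent
firewall-cmd --add-port=7946/tcp --permanent
firewall-cmd --add-port=7946/udp --permanent
firewall-cmd --add-port=4789/udp --permanent
firewall-cmd --reload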

brianlball commented 5 years ago

Well, that's enough space to at least start an analysis.

This is really similar to what happened when I tried to use VMware on a Windows box to run the server. I couldn't get it to work and gave up after spending too much time on it. My hunch was that it was a networking issue between the containers, but I couldn't prove it.

I don't have any appreciable experience with CentOS.

Can you verify that you have the ENV VARs that begin with "AWS_" defined?

Can you also change your docker-compose.yml version to '3.4' and change the 3 "command:" lines to what's in:
https://github.com/NREL/OpenStudio-server/blob/develop/local_setup_scripts/docker-compose.yml#L50
https://github.com/NREL/OpenStudio-server/blob/develop/local_setup_scripts/docker-compose.yml#L69
https://github.com/NREL/OpenStudio-server/blob/develop/local_setup_scripts/docker-compose.yml#L87

pkovacs19 commented 5 years ago

I double checked that the "AWS_" variables were all set. They are based on the recommendations here: https://github.com/NREL/OpenStudio-server/wiki/Deployment-Using-Docker-Swarm

The web section in my docker-compose.yml file didn't have a command section, but I've added it, changed the others as you suggested, and updated the version to '3.4'. There doesn't appear to be any change in behaviour.

I've tried opening up all the necessary ports listed under each service, but that didn't seem to change anything either.

brianlball commented 5 years ago

can you try:

docker exec -it xxx ping osserver_web
docker exec -it xxx ping osserver_web-background
docker exec -it xxx ping osserver_worker

where xxx is the container ID of the Rserve container from 'docker ps'
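
It may also be worth pinging the plain compose service names from inside the container (a sketch; this assumes the application reaches the other containers via the service names in your docker-compose.yml, i.e. db, queue, web):

# xxx is the Rserve (or worker) container ID from 'docker ps'
docker exec -it xxx ping -c 3 db
docker exec -it xxx ping -c 3 queue
docker exec -it xxx ping -c 3 web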

pkovacs19 commented 5 years ago

It appears as though Rserve is communicating with the other containers fine. When I do the above, all transmitted packets are received.

brianlball commented 5 years ago

Also, what command are you using to start up the containers? What's the output of 'docker network ls' and 'docker network inspect osserver_default'?

pkovacs19 commented 5 years ago

To start the containers, I am running docker stack deploy osserver --compose-file=/path/to/compose/yml

output from 'docker network ls':

NETWORK ID          NAME                DRIVER              SCOPE
f057aae0889a        bridge              bridge              local
ffc85af19c71        docker_gwbridge     bridge              local
38b872438e17        host                host                local
9cn72k2sgnsy        ingress             overlay             swarm
07750cee187e        none                null                local
9f8wt677b8xt        osserver_default    overlay             swarm

and from 'docker network inspect osserver_default':

[
    {
        "Name": "osserver_default",
        "Id": "9f8wt677b8xtf7zuw68jo6vzy",
        "Created": "2019-06-17T13:40:40.877671415-07:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.28.0.0/16",
                    "Gateway": "172.28.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "188d35353e8c47bc9b1fed8be8dfffb878990d64a15f98a35c1495894c301deb": {
                "Name": "osserver_web.1.ymi1v0fryf7e1j4icw5p1cz12",
                "EndpointID": "41f41d98502a16ec81494481013f3320d51202af7ccec9e40afb24b16d74f0fb",
                "MacAddress": "02:42:ac:1c:00:07",
                "IPv4Address": "172.28.0.7/16",
                "IPv6Address": ""
            },
            "23eb721b2f30d51a74535ed782bf28a638279e1f092da6f4b170575be652f516": {
                "Name": "osserver_db.1.xyjqbhodxcatfcajkcf230czg",
                "EndpointID": "404dc71099e6b57e5f1d2a1b1d1f9ad6e616c5befd23a5366a0ceb2c583f3073",
                "MacAddress": "02:42:ac:1c:00:03",
                "IPv4Address": "172.28.0.3/16",
                "IPv6Address": ""
            },
            "5eb242647e98806494ae07e25e5bcd07ce874ddd08bdb2124657b2e127d24634": {
                "Name": "osserver_worker.1.j5zmxxrhjvjw9864yl4xxm9od",
                "EndpointID": "f1f844ad690f6a7beed3abb325c8164aa1c1d089c94fe42213c70098276a0e8c",
                "MacAddress": "02:42:ac:1c:00:0b",
                "IPv4Address": "172.28.0.11/16",
                "IPv6Address": ""
            },
            "8688c42f3412436d92e7de140bfcfe955415f79a126caea28e233c5206de3463": {
                "Name": "osserver_web-background.1.lwa98h4c7pfb5glweu6t7mz2l",
                "EndpointID": "3bcee6eede4674451cd9affa2839992b148761868dc1895e375323824a27f608",
                "MacAddress": "02:42:ac:1c:00:09",
                "IPv4Address": "172.28.0.9/16",
                "IPv6Address": ""
            },
            "94fe791e50dcb49bc539dfaadf39385fb3f5aa6062ecef73dcf4cc40888bfc75": {
                "Name": "osserver_rserve.1.5cfjzd117l9hkd8vm3pdncooh",
                "EndpointID": "708567462b20fd001dfa22492b895584acc5ce17422172494e7838e083bc9e44",
                "MacAddress": "02:42:ac:1c:00:0d",
                "IPv4Address": "172.28.0.13/16",
                "IPv6Address": ""
            },
            "be30ea186c5293f9eb2f6d53ee5b834d420c2a381e6547055301707407b547c7": {
                "Name": "osserver_queue.1.t423qxz2ihjwdhupia05piceo",
                "EndpointID": "b56fe17b4fe4414ba314730ebcc0effec137a761875576d876928376f410660f",
                "MacAddress": "02:42:ac:1c:00:05",
                "IPv4Address": "172.28.0.5/16",
                "IPv6Address": ""
            },
            "lb-osserver_default": {
                "Name": "osserver_default-endpoint",
                "EndpointID": "f77cbeb77d7c4bb7a52ae72c80f72956bce99681ba70eb3bd56925fd2938dc70",
                "MacAddress": "02:42:ac:1c:00:0e",
                "IPv4Address": "172.28.0.14/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {
            "com.docker.stack.namespace": "osserver"
        },
        "Peers": [
            {
                "Name": "5919cfda8cf1",
                "IP": "142.104.75.70"
            }
        ]
    }
]
brianlball commented 5 years ago

All that looks okay to me... let's try from the other end. Can you try submitting one of the PAT example projects from the PAT repo? https://github.com/NREL/OpenStudio-PAT/tree/develop/sample_projects/SEB_LHS_2013

pkovacs19 commented 5 years ago

That job also gets stuck in the queue, like the others.


brianlball commented 5 years ago

@anyaelena @nllong @tijcolem do you guys have any ideas on this?