MIT-TESSE / goseek-challenge

Instructions for competing in the GOSEEK challenge at ICRA 2020

Cannot receive data from the simulator. The connection is blocked or the simulator is not running. #15

Open joeljosephjin opened 4 years ago

joeljosephjin commented 4 years ago

xvfb-run python eval.py --agent-config baselines/config/random-agent.yaml --episode-config config/check-ground-truth.yaml

gives this

Set current directory to /home/joel/goseek-challenge
Found path: /home/joel/goseek-challenge/simulator/goseek-v0.1.4.x86_64
Mono path[0] = '/home/joel/goseek-challenge/simulator/goseek-v0.1.4_Data/Managed'
Mono config path = '/home/joel/goseek-challenge/simulator/goseek-v0.1.4_Data/MonoBleedingEdge/etc'
Preloaded 'ScreenSelector.so'
Display 0 'screen': 640x480 (primary device).
Logging to /home/joel/.config/unity3d/Editor/Player.log
Evaluation episode on episode 0, scene 3
Traceback (most recent call last):
  File "eval.py", line 85, in <module>
    results = main(episode_cfg, agent_args)
  File "eval.py", line 66, in main
    return benchmark.evaluate(agent)
  File "/home/joel/tesse-gym/src/tesse_gym/tasks/goseek/goseek_benchmark.py", line 97, in evaluate
    scene_id=self.scenes[episode], random_seed=self.random_seeds[episode]
  File "/home/joel/tesse-gym/src/tesse_gym/tasks/goseek/goseek.py", line 138, in reset
    super().reset(scene_id, random_seed)
  File "/home/joel/tesse-gym/src/tesse_gym/core/tesse_gym.py", line 247, in reset
    observation = self.get_synced_observation()
  File "/home/joel/tesse-gym/src/tesse_gym/core/tesse_gym.py", line 279, in get_synced_observation
    response = self.observe()
  File "/home/joel/tesse-gym/src/tesse_gym/tasks/goseek/goseek_full_perception.py", line 95, in observe
    return self._data_request(DataRequest(metadata=True, cameras=cameras))
  File "/home/joel/tesse-gym/src/tesse_gym/core/tesse_gym.py", line 383, in _data_request
    raise TesseConnectionError()
tesse_gym.core.utils.TesseConnectionError: Cannot receive data from the simulator. The connection is blocked or the simulator is not running. 

I am running this on a Google Cloud instance through a Chrome Remote Desktop connection. It has an NVIDIA GPU.

ZacRavichandran commented 4 years ago

It seems like the client can't connect to the simulator for some reason. This usually happens when the DISPLAY variable isn't set, but the output Display 0 'screen': 640x480 (primary device) indicates otherwise.

Could you check the network connection by running the commands below? They use the low-level interface, tesse-interface, to query the simulator. The following output is expected.

>>> from tesse.msgs import DataRequest; from tesse.env import Env
>>> print(Env().request(DataRequest()).metadata)
<TESSE_Agent_Metadata_v0.5>
  <position x='-6.649457' y='0.4999968' z='-5.790709'/>
  <quaternion x='0' y='0.9426415' z='0' w='-0.3338069'/>
  <velocity x_dot='0' y_dot='2.233302E-06' z_dot='0'/>
  <angular_velocity x_ang_dot='0' y_ang_dot='0' z_ang_dot='0'/>
  <acceleration x_ddot='0' y_ddot='0' z_ddot='0'/>
  <angular_acceleration x_ang_ddot='0' y_ang_ddot='0' z_ang_ddot='0'/>
  <time>295.8927</time>
  <collision status='false' name=''/>
  <collider status='true'/>
</TESSE_Agent_Metadata_v0.5>

Thanks!

joeljosephjin commented 4 years ago

Error:

>>> from tesse.msgs import DataRequest; from tesse.env import Env
>>> print(Env().request(DataRequest()).metadata)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'metadata'
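The AttributeError indicates that Env().request(DataRequest()) returned None, so no response came back from the simulator. A minimal sketch of a check that reports this case explicitly instead of failing on the attribute access; check_simulator and its request_fn parameter are hypothetical names introduced here, not part of tesse-interface:

```python
def check_simulator(request_fn):
    """Call request_fn once and report whether a response came back.

    request_fn is a hypothetical zero-argument callable; with tesse-interface
    installed it could be `lambda: Env().request(DataRequest())`.
    """
    response = request_fn()
    if response is None:
        # Same situation as the AttributeError above: no data was received.
        return False, "no response from the simulator"
    return True, response.metadata


if __name__ == "__main__":
    # Stand-in for a simulator that never answers.
    print(check_simulator(lambda: None))
```

With tesse-interface installed, the call from the comment above could be passed in as the callable.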

joeljosephjin commented 4 years ago

Here are the contents of /home/joel/.config/unity3d/Editor/Player.log:

Desktop is 640 x 480 @ 0 Hz
Unable to find a supported OpenGL core profile
Failed to create valid graphics context: please ensure you meet the minimum requirements
E.g. OpenGL core profile 3.2 or later for OpenGL Core renderer
[Vulkan init] extensions: count=15
[Vulkan init] extensions: name=VK_KHR_device_group_creation, enabled=0
[Vulkan init] extensions: name=VK_KHR_display, enabled=1
[Vulkan init] extensions: name=VK_KHR_external_fence_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_memory_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_semaphore_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_get_physical_device_properties2, enabled=0
[Vulkan init] extensions: name=VK_KHR_get_surface_capabilities2, enabled=0
[Vulkan init] extensions: name=VK_KHR_surface, enabled=1
[Vulkan init] extensions: name=VK_KHR_xcb_surface, enabled=0
[Vulkan init] extensions: name=VK_KHR_xlib_surface, enabled=1
[Vulkan init] extensions: name=VK_EXT_acquire_xlib_display, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_report, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_utils, enabled=0
[Vulkan init] extensions: name=VK_EXT_direct_mode_display, enabled=0
[Vulkan init] extensions: name=VK_EXT_display_surface_counter, enabled=0
Vulkan detection: 0
No supported renderers found, exiting

(Filename: ./PlatformDependent/LinuxStandalone/main.cpp Line: 639)

Here is the output of: glxinfo | grep "version"

server glx version string: 1.4
client glx version string: 1.4
GLX version: 1.4
    Max core profile version: 3.3
    Max compat profile version: 3.1
    Max GLES1 profile version: 1.1
    Max GLES[23] profile version: 3.0
OpenGL core profile version string: 3.3 (Core Profile) Mesa 19.2.8
OpenGL core profile shading language version string: 3.30
OpenGL version string: 3.1 Mesa 19.2.8
OpenGL shading language version string: 1.40
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 19.2.8
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00

And here is the output of: nvidia-smi

Fri May  1 07:22:36 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

It seems the Unity player requires an OpenGL core profile of 3.2 or later, and as you can see, my machine meets that requirement (core profile 3.3). I also don't understand how this could be due to outdated OpenGL, since this is the latest NVIDIA driver on a newly installed Google Cloud instance with a Tesla T4 GPU.
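One detail worth noting in the glxinfo output above: every version string names Mesa 19.2.8, which suggests the display that glxinfo queried was rendered by Mesa rather than by the NVIDIA driver. A rough sketch that pulls this out of saved glxinfo text; summarize_glxinfo is a hypothetical helper, and it assumes the "OpenGL core profile version string:" line format shown above:

```python
import re


def summarize_glxinfo(text):
    """Return (core_profile_version, uses_mesa) from `glxinfo | grep version` text."""
    match = re.search(
        r"^OpenGL core profile version string:\s*(\d+)\.(\d+)(.*)$",
        text,
        flags=re.MULTILINE,
    )
    if match is None:
        return None, False
    version = (int(match.group(1)), int(match.group(2)))
    # The rest of the line names the implementation, e.g. "Mesa 19.2.8".
    uses_mesa = "Mesa" in match.group(3)
    return version, uses_mesa


sample = "OpenGL core profile version string: 3.3 (Core Profile) Mesa 19.2.8\n"
version, uses_mesa = summarize_glxinfo(sample)
print(version >= (3, 2), uses_mesa)
```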

ZacRavichandran commented 4 years ago

Thanks for providing those diagnostics, that's really helpful.

You're right that OpenGL shouldn't be causing the issue.

I noticed that the output of nvidia-smi does not include a required X server process. If running, it would look something like this in the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1687      G   /usr/lib/xorg/Xorg                           618MiB |
+-----------------------------------------------------------------------------+

We have a writeup on how to set up a headless machine, which includes running an X server, here. Could you see if any of those instructions help?
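The check described above, whether an Xorg process appears in the nvidia-smi process table, could be scripted roughly as follows. has_xorg_process is a hypothetical helper that only inspects text already captured from nvidia-smi; it assumes the table layout shown in this thread, which may differ between driver versions:

```python
def has_xorg_process(nvidia_smi_output):
    """Return True if a process row mentioning Xorg appears in the table."""
    in_processes = False
    for line in nvidia_smi_output.splitlines():
        if "Processes:" in line:
            in_processes = True
            continue
        # Only rows after the "Processes:" header describe running processes.
        if in_processes and "Xorg" in line:
            return True
    return False


without_x = """\
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
"""
with_x = """\
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1687      G   /usr/lib/xorg/Xorg                           618MiB |
"""
print(has_xorg_process(without_x), has_xorg_process(with_x))
```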

joeljosephjin commented 4 years ago

The last command there sudo X :0 & gave this error:

X.Org X Server 1.19.6
Release Date: 2017-12-20
X Protocol Version 11, Revision 0
Build Operating System: Linux 4.4.0-168-generic x86_64 Ubuntu
Current Operating System: Linux ubuntu-bionic-2 5.3.0-1018-gcp #19~18.04.1-Ubuntu SMP Tue Apr 14 12:49:45 UTC 2020 x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.3.0-1018-gcp root=UUID=0e3040b1-b682-430d-8f18-b7db7004a9e3 ro scsi_mod.use_blk_mq=Y console=ttyS0
Build Date: 14 November 2019  06:20:00PM
xorg-server 2:1.19.6-1ubuntu4.4 (For technical support please see http://www.ubuntu.com/support) 
Current version of pixman: 0.34.0
    Before reporting problems, check http://wiki.x.org
    to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
    (++) from command line, (!!) notice, (II) informational,
    (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Fri May  1 21:24:53 2020
(==) Using config file: "/etc/X11/xorg.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
(EE) 
Fatal server error:
(EE) no screens found(EE) 
(EE) 
Please consult the The X.Org Foundation support 
     at http://wiki.x.org
 for help. 
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE) 
(EE) Server terminated with error (1). Closing log file.

I tried running the eval.py command again but the same error comes up. Thank you for helping btw :)

joeljosephjin commented 4 years ago

Contents of Xorg.0.log file:

[    69.328] 
X.Org X Server 1.19.6
Release Date: 2017-12-20
[    69.328] X Protocol Version 11, Revision 0
[    69.328] Build Operating System: Linux 4.4.0-168-generic x86_64 Ubuntu
[    69.328] Current Operating System: Linux ubuntu-bionic-2 5.3.0-1018-gcp #19~18.04.1-Ubuntu SMP Tue Apr 14 12:49:45 UTC 2020 x86_64
[    69.328] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.3.0-1018-gcp root=UUID=0e3040b1-b682-430d-8f18-b7db7004a9e3 ro scsi_mod.use_blk_mq=Y console=ttyS0
[    69.328] Build Date: 14 November 2019  06:20:00PM
[    69.329] xorg-server 2:1.19.6-1ubuntu4.4 (For technical support please see http://www.ubuntu.com/support) 
[    69.329] Current version of pixman: 0.34.0
[    69.329]    Before reporting problems, check http://wiki.x.org
    to make sure that you have the latest version.
[    69.329] Markers: (--) probed, (**) from config file, (==) default setting,
    (++) from command line, (!!) notice, (II) informational,
    (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[    69.329] (==) Log file: "/var/log/Xorg.0.log", Time: Fri May  1 21:24:53 2020
[    69.330] (==) Using config file: "/etc/X11/xorg.conf"
[    69.330] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[    69.380] (==) ServerLayout "Layout0"
[    69.380] (**) |-->Screen "Screen0" (0)
[    69.380] (**) |   |-->Monitor "Monitor0"
[    69.380] (**) |   |-->Device "Device0"
[    69.380] (**) |-->Input Device "Keyboard0"
[    69.380] (**) |-->Input Device "Mouse0"
[    69.380] (==) Automatically adding devices
[    69.380] (==) Automatically enabling devices
[    69.380] (==) Automatically adding GPU devices
[    69.380] (==) Automatically binding GPU devices
[    69.381] (==) Max clients allowed: 256, resource mask: 0x1fffff
[    69.381] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[    69.381]    Entry deleted from font path.
[    69.381] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[    69.381]    Entry deleted from font path.
[    69.381] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[    69.381]    Entry deleted from font path.
[    69.381] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[    69.381]    Entry deleted from font path.
[    69.381] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[    69.381]    Entry deleted from font path.
[    69.381] (==) FontPath set to:
    /usr/share/fonts/X11/misc,
    /usr/share/fonts/X11/Type1,
    built-ins
[    69.381] (==) ModulePath set to "/usr/lib/xorg/modules"
[    69.381] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[    69.381] (WW) Disabling Keyboard0
[    69.381] (WW) Disabling Mouse0
[    69.382] (II) Loader magic: 0x556e3830c020
[    69.382] (II) Module ABI versions:
[    69.382]    X.Org ANSI C Emulation: 0.4
[    69.382]    X.Org Video Driver: 23.0
[    69.382]    X.Org XInput driver : 24.1
[    69.382]    X.Org Server Extension : 10.0
[    69.383] (--) using VT number 2

[    69.383] (II) systemd-logind: logind integration requires -keeptty and -keeptty was not provided, disabling logind integration
[    69.383] (II) xfree86: Adding drm device (/dev/dri/card0)
[    69.383] (EE) /dev/dri/card0: failed to set DRM interface version 1.4: Permission denied
[    69.385] (--) PCI: (0:0:4:0) 10de:1eb8:10de:12a2 rev 161, Mem @ 0xc0000000/16777216, 0x10000000000/268435456, 0x10010000000/33554432
[    69.385] (II) no primary bus or device found
[    69.385] (II) LoadModule: "glx"
[    69.406] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[    69.443] (II) Module glx: vendor="X.Org Foundation"
[    69.443]    compiled for 1.19.6, module version = 1.0.0
[    69.443]    ABI class: X.Org Server Extension, version 10.0
[    69.443] (II) LoadModule: "nvidia"
[    69.443] (WW) Warning, couldn't open module nvidia
[    69.443] (II) UnloadModule: "nvidia"
[    69.443] (II) Unloading nvidia
[    69.443] (EE) Failed to load module "nvidia" (module does not exist, 0)
[    69.443] (==) Matched modesetting as autoconfigured driver 0
[    69.443] (==) Matched fbdev as autoconfigured driver 1
[    69.443] (==) Matched vesa as autoconfigured driver 2
[    69.443] (==) Assigned the driver to the xf86ConfigLayout
[    69.443] (II) LoadModule: "modesetting"
[    69.443] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[    69.462] (II) Module modesetting: vendor="X.Org Foundation"
[    69.462]    compiled for 1.19.6, module version = 1.19.6
[    69.462]    Module class: X.Org Video Driver
[    69.462]    ABI class: X.Org Video Driver, version 23.0
[    69.462] (II) LoadModule: "fbdev"
[    69.462] (II) Loading /usr/lib/xorg/modules/drivers/fbdev_drv.so
[    69.486] (II) Module fbdev: vendor="X.Org Foundation"
[    69.486]    compiled for 1.19.3, module version = 0.4.4
[    69.486]    Module class: X.Org Video Driver
[    69.486]    ABI class: X.Org Video Driver, version 23.0
[    69.486] (II) LoadModule: "vesa"
[    69.487] (II) Loading /usr/lib/xorg/modules/drivers/vesa_drv.so
[    69.514] (II) Module vesa: vendor="X.Org Foundation"
[    69.514]    compiled for 1.19.3, module version = 2.3.4
[    69.514]    Module class: X.Org Video Driver
[    69.514]    ABI class: X.Org Video Driver, version 23.0
[    69.514] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[    69.514] (II) FBDEV: driver for framebuffer: fbdev
[    69.514] (II) VESA: driver for VESA chipsets: vesa
[    69.514] (WW) Falling back to old probe method for modesetting
[    69.515] (WW) Falling back to old probe method for fbdev
[    69.515] (II) Loading sub module "fbdevhw"
[    69.515] (II) LoadModule: "fbdevhw"
[    69.515] (II) Loading /usr/lib/xorg/modules/libfbdevhw.so
[    69.516] (II) Module fbdevhw: vendor="X.Org Foundation"
[    69.516]    compiled for 1.19.6, module version = 0.0.2
[    69.516]    ABI class: X.Org Video Driver, version 23.0
[    69.516] (EE) open /dev/fb0: No such file or directory
[    69.516] (WW) Falling back to old probe method for vesa
[    69.516] (EE) Screen 0 deleted because of no matching config section.
[    69.516] (II) UnloadModule: "modesetting"
[    69.516] (EE) Device(s) detected, but none match those in the config file.
[    69.517] (EE) 
Fatal server error:
[    69.517] (EE) no screens found(EE) 
[    69.517] (EE) 
Please consult the The X.Org Foundation support 
     at http://wiki.x.org
 for help. 
[    69.517] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[    69.517] (EE) 
[    69.517] (EE) Server terminated with error (1). Closing log file.
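In a log this long, the entries marked (EE) are the ones that matter; here they show the nvidia module failing to load and no screen matching the config. A small sketch that filters them out of a saved log; xorg_errors is a hypothetical helper and assumes the "[timestamp] (EE) message" line format shown above:

```python
import re

# Matches an optional "[    69.443]" timestamp, then "(EE)" and the message.
_EE_LINE = re.compile(r"^(?:\[\s*\d+\.\d+\]\s*)?\(EE\)\s*(.*)$")


def xorg_errors(log_text):
    """Return the non-empty (EE) messages from Xorg log text, in order."""
    errors = []
    for line in log_text.splitlines():
        match = _EE_LINE.match(line)
        if match and match.group(1).strip():
            errors.append(match.group(1).strip())
    return errors


sample = """\
[    69.443] (WW) Warning, couldn't open module nvidia
[    69.443] (EE) Failed to load module "nvidia" (module does not exist, 0)
[    69.516] (EE) Screen 0 deleted because of no matching config section.
[    69.517] (EE)
[    69.517] (EE) no screens found(EE)
"""
for message in xorg_errors(sample):
    print(message)
```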

Contents of /etc/X11/xorg.conf file: https://pastebin.com/wFtGvvQv

Contents of /usr/share/X11/xorg.conf.d/10-nvidia.conf file:

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    ModulePath "/usr/lib/x86_64-linux-gnu/nvidia-440/xorg"
EndSection

lexavtanke commented 4 years ago

Hello,

I am struggling with pretty much the same issue. I am also trying to use goseek on Google servers. Here is my /var/log/Xorg.0.log; it is a little bit different.

[ 952.445] X.Org X Server 1.19.6
Release Date: 2017-12-20
[ 952.445] X Protocol Version 11, Revision 0
[ 952.445] Build Operating System: Linux 4.4.0-168-generic x86_64 Ubuntu
[ 952.445] Current Operating System: Linux cbf3aa56d9a9 4.19.104+ #1 SMP Wed Feb 19 05:26:34 PST 2020 x86_64
[ 952.445] Kernel command line: BOOT_IMAGE=/syslinux/vmlinuz.A init=/usr/lib/systemd/systemd boot=local rootwait ro noresume noswap loglevel=7 noinitrd console=ttyS0 security=apparmor virtio_net.napi_tx=1 systemd.unified_cgroup_hierarchy=false systemd.legacy_systemd_cgroup_controller=false csm.disabled=1 dm_verity.error_behavior=3 dm_verity.max_bios=-1 dm_verity.dev_wait=1 i915.modeset=1 cros_efi loadpin.enabled=0 module.sig_enforce=0 root=/dev/dm-0 "dm=1 vroot none ro 1,0 2539520 verity payload=PARTUUID=76B3E38C-A464-A94B-9AB0-DF60201C4CD1 hashtree=PARTUUID=76B3E38C-A464-A94B-9AB0-DF60201C4CD1 hashstart=2539520 alg=sha256 root_hexdigest=434f37a2ee1ed91037365abcdd2fbbdd8bd44393af171292826221bb605c406b salt=0bd8a061534d73349529369f10447da9f13414a20bdbf0686146f776d5867fa8" mitigations=off
[ 952.445] Build Date: 14 November 2019 06:20:00PM
[ 952.445] xorg-server 2:1.19.6-1ubuntu4.4 (For technical support please see http://www.ubuntu.com/support)
[ 952.445] Current version of pixman: 0.34.0
[ 952.445] Before reporting problems, check http://wiki.x.org
    to make sure that you have the latest version.
[ 952.445] Markers: (--) probed, (**) from config file, (==) default setting,
    (++) from command line, (!!) notice, (II) informational,
    (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 952.445] (==) Log file: "/var/log/Xorg.0.log", Time: Sat May 2 21:41:37 2020
[ 952.445] (==) Using config file: "/etc/X11/xorg.conf"
[ 952.445] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[ 952.446] (==) ServerLayout "Layout0"
[ 952.446] (**) |-->Screen "Screen0" (0)
[ 952.446] (**) |   |-->Monitor "Monitor0"
[ 952.446] (**) |   |-->Device "Device0"
[ 952.446] (**) |-->Input Device "Keyboard0"
[ 952.446] (**) |-->Input Device "Mouse0"
[ 952.446] (==) Automatically adding devices
[ 952.446] (==) Automatically enabling devices
[ 952.446] (==) Automatically adding GPU devices
[ 952.446] (==) Automatically binding GPU devices
[ 952.446] (==) Max clients allowed: 256, resource mask: 0x1fffff
[ 952.446] (WW) The directory "/usr/share/fonts/X11/misc" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (WW) The directory "/usr/share/fonts/X11/100dpi/" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (WW) The directory "/usr/share/fonts/X11/75dpi/" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (WW) The directory "/usr/share/fonts/X11/Type1" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (WW) The directory "/usr/share/fonts/X11/100dpi" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (WW) The directory "/usr/share/fonts/X11/75dpi" does not exist.
[ 952.446] Entry deleted from font path.
[ 952.446] (==) FontPath set to: built-ins
[ 952.446] (==) ModulePath set to "/usr/lib/xorg/modules"
[ 952.446] (WW) Hotplugging is on, devices using drivers 'kbd', 'mouse' or 'vmmouse' will be disabled.
[ 952.446] (WW) Disabling Keyboard0
[ 952.446] (WW) Disabling Mouse0
[ 952.446] (II) Loader magic: 0x556c8bb1e020
[ 952.446] (II) Module ABI versions:
[ 952.446] X.Org ANSI C Emulation: 0.4
[ 952.446] X.Org Video Driver: 23.0
[ 952.446] X.Org XInput driver : 24.1
[ 952.446] X.Org Server Extension : 10.0
[ 952.446] (EE) dbus-core: error connecting to system bus: org.freedesktop.DBus.Error.FileNotFound (Failed to connect to socket /var/run/dbus/system_bus_socket: No such file or directory)
[ 952.449] (--) PCI: (0:0:4:0) 10de:15f8:10de:118f rev 161, Mem @ 0xc0000000/16777216, 0x10000000000/17179869184, 0x10400000000/33554432, I/O @ 0x0000c000/128
[ 952.449] (II) no primary bus or device found
[ 952.449] (II) LoadModule: "glx"
[ 952.449] (II) Loading /usr/lib/xorg/modules/extensions/libglx.so
[ 952.450] (II) Module glx: vendor="X.Org Foundation"
[ 952.450] compiled for 1.19.6, module version = 1.0.0
[ 952.450] ABI class: X.Org Server Extension, version 10.0
[ 952.450] (II) LoadModule: "nvidia"
[ 952.450] (WW) Warning, couldn't open module nvidia
[ 952.450] (II) UnloadModule: "nvidia"
[ 952.450] (II) Unloading nvidia
[ 952.450] (EE) Failed to load module "nvidia" (module does not exist, 0)
[ 952.450] (==) Matched modesetting as autoconfigured driver 0
[ 952.450] (==) Matched fbdev as autoconfigured driver 1
[ 952.450] (==) Matched vesa as autoconfigured driver 2
[ 952.450] (==) Assigned the driver to the xf86ConfigLayout
[ 952.450] (II) LoadModule: "modesetting"
[ 952.450] (II) Loading /usr/lib/xorg/modules/drivers/modesetting_drv.so
[ 952.450] (II) Module modesetting: vendor="X.Org Foundation"
[ 952.450] compiled for 1.19.6, module version = 1.19.6
[ 952.450] Module class: X.Org Video Driver
[ 952.450] ABI class: X.Org Video Driver, version 23.0
[ 952.450] (II) LoadModule: "fbdev"
[ 952.450] (WW) Warning, couldn't open module fbdev
[ 952.450] (II) UnloadModule: "fbdev"
[ 952.450] (II) Unloading fbdev
[ 952.450] (EE) Failed to load module "fbdev" (module does not exist, 0)
[ 952.450] (II) LoadModule: "vesa"
[ 952.450] (WW) Warning, couldn't open module vesa
[ 952.450] (II) UnloadModule: "vesa"
[ 952.450] (II) Unloading vesa
[ 952.450] (EE) Failed to load module "vesa" (module does not exist, 0)
[ 952.450] (II) modesetting: Driver for Modesetting Kernel Drivers: kms
[ 952.450] (EE) Fatal server error:
[ 952.450] (EE) parse_vt_settings: Cannot open /dev/tty0 (No such file or directory)
[ 952.450] (EE)
[ 952.451] (EE) Please consult the The X.Org Foundation support at http://wiki.x.org for help.
[ 952.451] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 952.451] (EE)
[ 952.451] (WW) xf86CloseConsole: KDSETMODE failed: Bad file descriptor
[ 952.451] (WW) xf86CloseConsole: VT_GETMODE failed: Bad file descriptor
[ 952.451] (EE) Server terminated with error (1). Closing log file.

ZacRavichandran commented 4 years ago

@joeljosephjin np, glad to help 😃 . Hopefully we're close to getting this working! I have two questions:

  1. Could you paste the contents of your /etc/X11/xorg.conf here, or perhaps in a gist? I can't reach the link you sent due to VPN issues.

  2. Were you able to complete up to step 3 on the linked instructions? (here for reference).

@lexavtanke, were you able to get through the linked instructions for setting up a headless server? If so, could you also provide the contents of your /etc/X11/xorg.conf? Thanks!

lexavtanke commented 4 years ago

@ZacRavichandran Thank you for your fast reply. Yes, I tried to get through your instructions, but they didn't work for me, and now I think I know why.

Here is a solution, though it is not straightforward; they use OpenGL: https://github.com/demotomohiro/remocolab

The most interesting part is this:

  # Without "-seat seat-1" option, Xorg try to open /dev/tty0 but it doesn't exists.
  # You can create /dev/tty0 with "mknod /dev/tty0 c 4 0" but you will get permision denied error.
  subprocess.Popen(["Xorg", "-seat", "seat-1", "-allowMouseOpenFail", "-novtswitch", "-nolisten", "tcp"])

This option isn't set in your instructions. Tomorrow I will try your instructions again with this option; I hope it will work too.

Here is the working /etc/X11/xorg.conf:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 418.67
Section "DRI"
    Mode 0666
EndSection

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/mouse"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       28.0 - 33.0
    VertRefresh     43.0 - 72.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "Tesla P100-PCIE-16GB"
    BusID          "PCI:0:4:0"
    MatchSeat      "seat-1"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Virtual     1920 1200
        Depth       24
    EndSubSection
EndSection
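The BusID in the Device section has to identify the GPU. Xorg expects decimal bus, device, and function numbers, whereas nvidia-smi prints its Bus-Id in hexadecimal (for example 00000000:00:04.0 in the nvidia-smi output earlier in this thread). A minimal conversion sketch; xorg_bus_id is a hypothetical helper, not something taken from the linked instructions or from remocolab:

```python
def xorg_bus_id(nvidia_smi_bus_id):
    """Convert 'domain:bus:device.function' (hex) to Xorg's 'PCI:b:d:f' (decimal)."""
    _domain, bus, device_function = nvidia_smi_bus_id.split(":")
    device, function = device_function.split(".")
    return "PCI:{}:{}:{}".format(int(bus, 16), int(device, 16), int(function, 16))


print(xorg_bus_id("00000000:00:04.0"))
```

The two forms only look different once a bus or device number reaches 0x0A or higher, which is why the mismatch is easy to miss.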

ZacRavichandran commented 4 years ago

@lexavtanke That's very helpful, thanks for sharing the link! We tested those instructions against some internal servers and AWS, so it's possible that there are additional steps required on Google Cloud or Colab. Please let me know what you find!

lexavtanke commented 4 years ago

@ZacRavichandran Sadly, the new config combined with your instructions doesn't work. Maybe it's because of the versions of the video card driver and the kernel. I think so because of this part of the code in remocolab:

def _setup_nvidia_gl():
  # Install TESLA DRIVER FOR LINUX X64.
  # Kernel module in this driver is already loaded and cannot be neither removed nor updated.
  # (nvidia, nvidia_uvm, nvidia_drm. See dmesg)
  # Version number of nvidia driver for Xorg must match version number of these kernel module.
  # But existing nvidia driver for Xorg might not match.
  # So overwrite them with the nvidia driver that is same version to loaded kernel module.
  ret = subprocess.run(
                  ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
                  stdout = subprocess.PIPE,
                  check = True,
                  universal_newlines = True)
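The comments in that snippet say the NVIDIA driver used by Xorg must have the same version as the loaded kernel module. A hedged sketch of that comparison, building on the nvidia-smi query from the snippet; driver_versions_match and its xorg_driver_version argument are hypothetical, and how to obtain the Xorg-side version is left open here:

```python
def driver_versions_match(nvidia_smi_stdout, xorg_driver_version):
    """Compare the kernel-side driver version with a given Xorg driver version.

    nvidia_smi_stdout is the text printed by
    `nvidia-smi --query-gpu=driver_version --format=csv,noheader`
    (one version per GPU, one per line).
    """
    versions = [line.strip() for line in nvidia_smi_stdout.splitlines() if line.strip()]
    if not versions:
        return False
    # Every GPU should report the same version as the Xorg driver.
    return all(version == xorg_driver_version for version in versions)


print(driver_versions_match("440.64.00\n", "440.64.00"))
print(driver_versions_match("440.64.00\n", "418.67"))
```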

ZHMA1996 commented 4 years ago

@ZacRavichandran I faced the same issue. The difference is that an X server process was running on my GPU.

python eval.py --agent-config baselines/config/random-agent.yaml --episode-config config/check-ground-truth.yaml gives this

ZacRavichandran commented 4 years ago

@lexavtanke ah yes, it looks like there's a bit of configuration required to get a proper virtual display running on colab. I'll see if I can track down a solution. In the meantime, please let me know if you find any useful resources!

ZacRavichandran commented 4 years ago

@ZHMA1996 I noticed that there is no Display value in your output. Normally, we would expect to see something like

...
Preloaded 'ScreenSelector.so'
Display 0 '0': 3840x2160 (primary device).
...

Sometimes this is because the DISPLAY environment variable has not been set. In the same terminal with which you run eval.py, could you try running

export DISPLAY=:0

Thanks!
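Since the variable has to be set in the same terminal that runs eval.py, it can help to confirm what DISPLAY actually holds before launching. A small sketch; display_status is a hypothetical helper, and the pattern only covers the common [host]:number[.screen] form:

```python
import os
import re

# Common X display form: optional host, a colon, a display number, optional screen.
_DISPLAY_PATTERN = re.compile(r"^[\w.-]*:\d+(\.\d+)?$")


def display_status(environ=None):
    """Describe the DISPLAY value found in the given environment mapping."""
    environ = os.environ if environ is None else environ
    value = environ.get("DISPLAY")
    if not value:
        return "DISPLAY is not set"
    if not _DISPLAY_PATTERN.match(value):
        return "DISPLAY has an unexpected value: {!r}".format(value)
    return "DISPLAY is set to {}".format(value)


print(display_status({"DISPLAY": ":0"}))
print(display_status({}))
```

Calling display_status() with no argument inspects the current process environment, which is what a simulator launched from the same Python process would inherit.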

ZacRavichandran commented 4 years ago

Should that be export DISPLAY=:3 instead of export DISPLAY:=3 :) ?

And to confirm, are you running this remotely?

ZHMA1996 commented 4 years ago

Thanks for your reply. I tried the command you suggested, but it didn't work. Here is the output of nvidia-smi:

Screenshot from 2020-05-07 21-40-34

Neither export DISPLAY=:0 nor export DISPLAY:=3 helped.

Here is the output after running eval.py

Screenshot from 2020-05-07 21-44-39

ZHMA1996 commented 4 years ago

Yes, I am running this remotely.

ZacRavichandran commented 4 years ago

It's odd that the simulator is still not finding the display. Could you try two things to dig into this further?

1) Double-check the value of DISPLAY

$ echo $DISPLAY

2) Launch the simulator directly and observe the output

$ cd ~/goseek-challenge/simulator
$ ./goseek-v0.1.4.x86_64

ZHMA1996 commented 4 years ago

I tried what you presented, and here is the output

Screenshot from 2020-05-07 22-39-08

It seems nothing happened.

ZacRavichandran commented 4 years ago

Ok thanks, that's helpful.

To confirm the issue is with the display and not Unity, could you try a test with glxgears? The output should look like the following:

$ glxgears
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
301 frames in 5.0 seconds = 60.150 FPS
300 frames in 5.0 seconds = 59.995 FPS

ZHMA1996 commented 4 years ago

@ZacRavichandran

The issue does appear to be with the display.

Here is the output after testing via glxgears

Error: couldn't open display (null)
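The "(null)" in that message suggests glxgears found no DISPLAY value at all. Once DISPLAY is set, a further check is whether an X server is actually listening for that display: on Linux, a local X server normally creates a Unix socket at /tmp/.X11-unix/X<n> for display :<n>. A rough sketch under that assumption; x_socket_path and local_x_server_present are hypothetical helpers:

```python
import os
import re


def x_socket_path(display):
    """Return the conventional Unix socket path for a local display like ':0'."""
    match = re.match(r"^:(\d+)(\.\d+)?$", display)
    if match is None:
        raise ValueError("not a local display string: {!r}".format(display))
    return "/tmp/.X11-unix/X{}".format(match.group(1))


def local_x_server_present(display):
    """Return True if the socket for the given local display exists."""
    return os.path.exists(x_socket_path(display))


print(x_socket_path(":0"))
```

A missing socket for the chosen display number would be consistent with the X server not having started, as in the Xorg logs earlier in this thread.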

joeljosephjin commented 4 years ago

So far, GCP, AWS, remocolab, and Oracle Cloud instances all show this connection error, no matter what. A Genesis Cloud instance works, though.

ZacRavichandran commented 4 years ago

Which AWS instance type are you using? We're using G4 instances with Nvidia driver version 440.64 and CUDA 10.2.

On our side, everything works as expected after setting up a virtual display. Could we walk through the steps you used to configure your AWS instance? Hopefully that'll solve the issues you're seeing.