askforalfred / alfred

ALFRED - A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
MIT License

Start X server on headless VM #112

Closed by theorist17 2 years ago

theorist17 commented 2 years ago

I've been having issues running startx.py and ALFRED.

Is ALFRED runnable in a Kubernetes container that uses a Network File System (NFS)? Also, can I run AI2Thor on an NFS-based computer? (not safe for multi-threading)

This is the result of running startx.py:

$ python alfred_utils/scripts/startx.py
Starting X on DISPLAY=:2
Xorg -noreset +extension GLX +extension RANDR +extension RENDER -config pathfile :2

X.Org X Server 1.20.13
X Protocol Version 11, Revision 0
Build Operating System: linux Ubuntu
Current Operating System: Linux cheetah-7468656f726973743137-admfol-788bdf9f44-4pqkh 5.4.0-97-generic #110-Ubuntu SMP Thu Jan 13 18:22:13 UTC 2022 x86_64
Kernel command line: BOOT_IMAGE=/vmlinuz-5.4.0-97-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro maybe-ubiquity
Build Date: 14 December 2021  02:14:13PM
xorg-server 2:1.20.13-1ubuntu1~20.04.2 (For technical support please see http://www.ubuntu.com/support)
Current version of pixman: 0.38.4
    Before reporting problems, check http://wiki.x.org
    to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
    (++) from command line, (!!) notice, (II) informational,
    (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Mon Apr 18 10:16:04 2022
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
(EE)
Fatal server error:
(EE) parse_vt_settings: Cannot find a free VT: Inappropriate ioctl for device
(EE)
(EE)
Please consult the The X.Org Foundation support
     at http://wiki.x.org
 for help.
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE)
(EE) Server terminated with error (1). Closing log file.
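
The marker legend printed in the log above makes the output easy to filter. As a rough sketch (the helper name `lines_with_marker` is made up for illustration and is not part of startx.py or ALFRED), the non-empty `(EE)` lines can be pulled out of a captured log like this:

```python
import re

# Marker prefixes as listed in the Xorg log legend:
# (--) (**) (==) (++) (!!) (II) (WW) (EE) (NI) (??)
MARKER = re.compile(r"^\((--|\*\*|==|\+\+|!!|II|WW|EE|NI|\?\?)\)\s*(.*)$")


def lines_with_marker(log_text, marker="EE"):
    """Return the message text of every log line tagged with `marker`.

    Lines that carry only the marker (for example a bare "(EE)") are skipped.
    """
    hits = []
    for line in log_text.splitlines():
        match = MARKER.match(line.strip())
        if match and match.group(1) == marker and match.group(2):
            hits.append(match.group(2))
    return hits
```

Run on the output above, this would leave the `parse_vt_settings: Cannot find a free VT` line as the first error, which is the one worth searching for.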

This is my X server configuration file:

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:27:0:0"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
        Virtual 1024 768
    EndSubSection
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:28:0:0"
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
        Virtual 1024 768
    EndSubSection
EndSection

Section "Device"
    Identifier     "Device2"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:29:0:0"
EndSection

Section "Screen"
    Identifier     "Screen2"
    Device         "Device2"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
        Virtual 1024 768
    EndSubSection
EndSection

Section "Device"
    Identifier     "Device3"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:30:0:0"
EndSection

Section "Screen"
    Identifier     "Screen3"
    Device         "Device3"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
        Virtual 1024 768
    EndSubSection
EndSection

Section "Device"
    Identifier     "Device4"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:137:0:0"
EndSection

Section "Screen"
    Identifier     "Screen4"
    Device         "Device4"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
        Virtual 1024 768
    EndSubSection
EndSection

Section "Device"
    Identifier     "Device5"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:138:0:0"
EndSection

Section "Screen"
    Identifier     "Screen5"
    Device         "Device5"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
        Virtual 1024 768
    EndSubSection
EndSection

Section "Device"
    Identifier     "Device6"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:139:0:0"
EndSection

Section "Screen"
    Identifier     "Screen6"
    Device         "Device6"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
        Virtual 1024 768
    EndSubSection
EndSection

Section "Device"
    Identifier     "Device7"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:140:0:0"
EndSection

Section "Screen"
    Identifier     "Screen7"
    Device         "Device7"
    DefaultDepth    24
    Option         "AllowEmptyInitialConfiguration" "True"
    SubSection     "Display"
        Depth       24
        Virtual 1024 768
    EndSubSection
EndSection

Section "ServerLayout"
    Identifier     "Layout0"
    Screen 0 "Screen0" 0 0
    Screen 1 "Screen1" 0 0
    Screen 2 "Screen2" 0 0
    Screen 3 "Screen3" 0 0
    Screen 4 "Screen4" 0 0
    Screen 5 "Screen5" 0 0
    Screen 6 "Screen6" 0 0
    Screen 7 "Screen7" 0 0
EndSection
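
For anyone reproducing a config of this shape (one Device/Screen pair per GPU plus a single ServerLayout), here is a minimal Python sketch that generates it. It assumes bus IDs arrive in the hexadecimal `00000000:1B:00.0` style and converts them to the decimal `PCI:27:0:0` form used in the file above. The function names `pci_bus_id` and `render_xorg_conf` are hypothetical, and this is not necessarily how startx.py builds its file.

```python
def pci_bus_id(smi_bus_id):
    """Convert a hex bus ID such as '00000000:1B:00.0' to 'PCI:27:0:0'.

    Assumption: the input is 'domain:bus:device.function' in hexadecimal.
    """
    addr, func = smi_bus_id.rsplit(".", 1)
    bus, device = addr.split(":")[-2:]
    return "PCI:{}:{}:{}".format(int(bus, 16), int(device, 16), int(func, 16))


def render_xorg_conf(bus_ids, width=1024, height=768):
    """Render one Device/Screen pair per bus ID plus a ServerLayout section."""
    blocks = []
    layout = ['Section "ServerLayout"', '    Identifier     "Layout0"']
    for i, bus_id in enumerate(bus_ids):
        blocks.append(
            'Section "Device"\n'
            '    Identifier     "Device{i}"\n'
            '    Driver         "nvidia"\n'
            '    VendorName     "NVIDIA Corporation"\n'
            '    BusID          "{bus_id}"\n'
            'EndSection\n'.format(i=i, bus_id=bus_id)
        )
        blocks.append(
            'Section "Screen"\n'
            '    Identifier     "Screen{i}"\n'
            '    Device         "Device{i}"\n'
            '    DefaultDepth    24\n'
            '    Option         "AllowEmptyInitialConfiguration" "True"\n'
            '    SubSection     "Display"\n'
            '        Depth       24\n'
            '        Virtual {w} {h}\n'
            '    EndSubSection\n'
            'EndSection\n'.format(i=i, w=width, h=height)
        )
        # Every screen is placed at offset 0 0, as in the config above.
        layout.append('    Screen {i} "Screen{i}" 0 0'.format(i=i))
    layout.append("EndSection\n")
    blocks.append("\n".join(layout))
    return "\n".join(blocks)
```

Calling `render_xorg_conf([pci_bus_id("00000000:1B:00.0")])` yields a single-GPU version of the file above, which can be a simpler starting point when debugging.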

MohitShridhar commented 2 years ago

@theorist17 no, I haven't used Kubernetes at all.

Can you try starting a virtual GL display with sudo nvidia-xconfig -a --use-display-device=None --virtual=1280x1024? See full instructions here.

If this doesn't work, you will probably have to wrangle X until you can somehow run glxgears without issues. There are a number of X-related threads here that might be helpful.
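
One way to script the glxgears check is sketched below. It is only an illustration: the helper name `gl_check` is hypothetical, the display `:2` is taken from the startx.py output earlier in the thread, and treating "still running after a few seconds" as success is an assumption, since glxgears normally renders until it is killed.

```python
import os
import subprocess


def gl_check(display, cmd=("glxgears",), seconds=3.0):
    """Return True if `cmd` keeps running, or exits with 0, on the X display.

    Assumption: a GL program that cannot open the display exits quickly
    with a non-zero status, while a working one keeps rendering.
    """
    env = dict(os.environ, DISPLAY=display)
    try:
        proc = subprocess.run(
            list(cmd),
            env=env,
            timeout=seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # The process was still alive when the timeout stopped it.
        return True
    except OSError:
        # The command is not installed or could not be executed.
        return False
    return proc.returncode == 0
```

For example, `gl_check(":2")` after startx.py reports `Starting X on DISPLAY=:2` gives a quick yes/no before launching AI2Thor.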

theorist17 commented 2 years ago

Running ALFRED without sudo privileges on clustered computers is quite hard. I tried Docker-in-Docker on K8s to run things with root privileges, but building the Docker image itself requires root privileges because of some system software dependencies (related to nvidia-driver, X, and nvidia-xconfig). So I switched to a different setup: a dedicated (non-clustered) server with root privileges. After some work, both X and AI2Thor work properly.