Access the GPU without going through an X server

dcommander commented 9 years ago

There are unconfirmed rumors that this either already is possible or will soon be possible with the nVidia drivers by using EGL, but it is unclear exactly how (the Khronos EGL headers still seem to indicate that Xlib is required when using EGL on Un*x.) As soon as it is possible to do this, it would be a great enhancement for VirtualGL, since it would eliminate the need for a running X server on the server machine. I already know basically how to make such a system work in VirtualGL, because Sun used to have a proprietary API (GLP) that allowed us to accomplish the same thing on SPARC. As early as 2007, we identified EGL as a possible replacement for GLP, but Linux driver support for EGL only became available recently, and even where it is available, EGL still seems to be tied to X11 on Un*x systems. Presumably that will eventually have to change in order to support Wayland.

peci1 commented 4 years ago

On another cluster, with GPUs managed by SLURM resource manager, I get a segfault with glxinfo/glxgears:

$ VGL_DISPLAY=/dev/dri/card4 xvfb-run -a vglrun +v gdb glxgears      
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Reading symbols from glxgears...(no debugging symbols found)...done.
(gdb) start -info
Starting program: /usr/bin/glxgears -info
[VGL] Shared memory segment ID for vglconfig: 8126464
[VGL] VirtualGL v2.6.80 64-bit (Build 20200826)
[VGL] Opening EGL device /dev/dri/card4

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff4f09ce0 in ?? () from /.singularity.d/libs/libEGL_nvidia.so.0
(gdb) bt
#0  0x00007ffff4f09ce0 in ?? () from /.singularity.d/libs/libEGL_nvidia.so.0
#1  0x00007ffff786ecb6 in vglfaker::init3D() () from /usr/lib/libvglfaker.so
#2  0x00007ffff78a835e in glxvisual::buildCfgAttribTable(_XDisplay*, int) () from /usr/lib/libvglfaker.so
#3  0x00007ffff78ac61a in glxvisual::chooseFBConfig(_XDisplay*, int, int const*, int&) () from /usr/lib/libvglfaker.so
#4  0x00007ffff78ad941 in glxvisual::configsFromVisAttribs(_XDisplay*, int, int const*, int&, bool) () from /usr/lib/libvglfaker.so
#5  0x00007ffff78870a3 in glXChooseVisual () from /usr/lib/libvglfaker.so
#6  0x000055555555758b in ?? ()
#7  0x0000555555555a87 in ?? ()
#8  0x00007ffff6b2eb97 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#9  0x000055555555641a in ?? ()

The fact is that on this cluster instance I don't have permission to read or write /dev/dri/card4. But is a segfault really a good way to let the user know that?

dcommander commented 4 years ago

On a server, I got an error, though:


$ DISPLAY=:3 vglrun +v -d /dev/dri/card4 glxgears -info
[VGL] Shared memory segment ID for vglconfig: 28901396
[VGL] VirtualGL v2.6.80 64-bit (Build 20200826)
[VGL] Opening EGL device /dev/dri/card4
[VGL] WARNING: Could not set WM_DELETE_WINDOW on window 0x00200002
GL_RENDERER   = GeForce GTX 1080 Ti/PCIe/SSE2
GL_VERSION    = OpenGL ES 1.1 NVIDIA 418.74
GL_VENDOR     = NVIDIA Corporation
GL_EXTENSIONS = GL_EXT_debug_label GL_EXT_map_buffer_range GL_EXT_robustness GL_EXT_texture_compression_dxt1 GL_EXT_texture_compression_s3tc GL_EXT_texture_format_BGRA8888 GL_KHR_debug GL_EXT_memory_object GL_EXT_memory_object_fd GL_EXT_semaphore GL_EXT_semaphore_fd GL_NV_memory_attachment GL_NV_texture_compression_s3tc GL_OES_compressed_ETC1_RGB8_texture GL_EXT_compressed_ETC1_RGB8_sub_texture GL_OES_compressed_paletted_texture GL_OES_draw_texture GL_OES_EGL_image GL_OES_EGL_image_external GL_OES_EGL_sync GL_OES_element_index_uint GL_OES_extended_matrix_palette GL_OES_fbo_render_mipmap GL_OES_framebuffer_object GL_OES_matrix_get GL_OES_matrix_palette GL_OES_packed_depth_stencil GL_OES_point_size_array GL_OES_point_sprite GL_OES_rgb8_rgba8 GL_OES_read_format GL_OES_stencil8 GL_OES_texture_cube_map GL_OES_texture_npot GL_OES_vertex_half_float 
VisualID 33, 0x21
[VGL] ERROR: in readPixels--
[VGL]    346: GL_ARB_pixel_buffer_object extension not available

It appears that this platform isn't providing support for the desktop OpenGL API through EGL. For some reason, it's giving an OpenGL ES context. The VirtualGL EGL back end certainly calls eglBindAPI(EGL_OPENGL_API) often enough, and it specifies EGL_RENDERABLE_TYPE, EGL_OPENGL_BIT. Can you run /opt/VirtualGL/bin/eglinfo with the same DRI device and post the output?
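A minimal sketch for spotting this symptom in scripts, assuming only what the output above shows: with the 418.74 driver the GL_VERSION line reads "OpenGL ES 1.1 NVIDIA 418.74", whereas a desktop OpenGL context reports a plain version number (eglinfo shows "4.6.0 NVIDIA 418.74"). The function name `is_gles_version` is hypothetical and not part of VirtualGL.

```shell
# Hypothetical helper (not part of VirtualGL): decide whether a GL_VERSION
# string, as printed by `glxgears -info`, describes an OpenGL ES context.
# Returns 0 (true) for OpenGL ES, 1 for anything else (e.g. desktop OpenGL).
is_gles_version() {
    case "$1" in
        "OpenGL ES"*) return 0 ;;   # e.g. "OpenGL ES 1.1 NVIDIA 418.74"
        *) return 1 ;;              # e.g. "4.6.0 NVIDIA 418.74"
    esac
}
```

For example, `is_gles_version "OpenGL ES 1.1 NVIDIA 418.74"` succeeds, while `is_gles_version "4.6.0 NVIDIA 418.74"` fails. Matching on the prefix only keeps the check independent of the vendor-specific text that follows the version number.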

peci1 commented 4 years ago

Here it is:

$ /opt/VirtualGL/bin/eglinfo /dev/dri/card4
device: /dev/dri/card4
EGL client APIs string: OpenGL_ES OpenGL
EGL vendor string: NVIDIA
EGL version string: 1.5
display EGL extensions:
    EGL_EXT_client_sync, EGL_EXT_create_context_robustness, 
    EGL_EXT_output_base, EGL_EXT_output_drm, EGL_EXT_stream_acquire_mode, 
    EGL_EXT_stream_consumer_egloutput, EGL_EXT_sync_reuse, 
    EGL_IMG_context_priority, EGL_KHR_config_attribs, 
    EGL_KHR_context_flush_control, EGL_KHR_create_context, 
    EGL_KHR_create_context_no_error, EGL_KHR_display_reference, 
    EGL_KHR_fence_sync, EGL_KHR_get_all_proc_addresses, EGL_KHR_gl_colorspace, 
    EGL_KHR_gl_renderbuffer_image, EGL_KHR_gl_texture_2D_image, 
    EGL_KHR_gl_texture_3D_image, EGL_KHR_gl_texture_cubemap_image, 
    EGL_KHR_image, EGL_KHR_image_base, EGL_KHR_no_config_context, 
    EGL_KHR_reusable_sync, EGL_KHR_stream, EGL_KHR_stream_attrib, 
    EGL_KHR_stream_consumer_gltexture, EGL_KHR_stream_cross_process_fd, 
    EGL_KHR_stream_fifo, EGL_KHR_stream_producer_eglsurface, 
    EGL_KHR_surfaceless_context, EGL_KHR_swap_buffers_with_damage, 
    EGL_KHR_wait_sync, EGL_NV_nvrm_fence_sync, EGL_NV_output_drm_flip_event, 
    EGL_NV_stream_attrib, EGL_NV_stream_consumer_gltexture_yuv, 
    EGL_NV_stream_cross_display, EGL_NV_stream_cross_object, 
    EGL_NV_stream_cross_process, EGL_NV_stream_cross_system, 
    EGL_NV_stream_fifo_next, EGL_NV_stream_fifo_synchronous, 
    EGL_NV_stream_flush, EGL_NV_stream_metadata, EGL_NV_stream_remote, 
    EGL_NV_stream_reset, EGL_NV_stream_socket, EGL_NV_stream_socket_inet, 
    EGL_NV_stream_socket_unix, EGL_NV_stream_sync, EGL_NV_system_time
client EGL extensions:
    EGL_EXT_client_extensions, EGL_EXT_device_base, 
    EGL_EXT_device_enumeration, EGL_EXT_device_query, EGL_EXT_platform_base, 
    EGL_EXT_platform_device, EGL_EXT_platform_wayland, EGL_EXT_platform_x11, 
    EGL_KHR_client_get_all_proc_addresses, EGL_KHR_debug, 
    EGL_KHR_platform_x11, EGL_MESA_platform_gbm, 
    EGL_MESA_platform_surfaceless
EGL version: 1.5
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 11264 MB
    Total available memory: 11264 MB
    Currently available dedicated video memory: 11165 MB
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce GTX 1080 Ti/PCIe/SSE2
OpenGL version string: 4.6.0 NVIDIA 418.74
OpenGL shading language version string: 4.60 NVIDIA
OpenGL extensions:
    GL_AMD_multi_draw_indirect, GL_AMD_seamless_cubemap_per_texture, 
    GL_AMD_vertex_shader_layer, GL_AMD_vertex_shader_viewport_index, 
    GL_ARB_ES2_compatibility, GL_ARB_ES3_1_compatibility, 
    GL_ARB_ES3_2_compatibility, GL_ARB_ES3_compatibility, 
    GL_ARB_arrays_of_arrays, GL_ARB_base_instance, GL_ARB_bindless_texture, 
    GL_ARB_blend_func_extended, GL_ARB_buffer_storage, 
    GL_ARB_clear_buffer_object, GL_ARB_clear_texture, GL_ARB_clip_control, 
    GL_ARB_color_buffer_float, GL_ARB_compatibility, 
    GL_ARB_compressed_texture_pixel_storage, GL_ARB_compute_shader, 
    GL_ARB_compute_variable_group_size, GL_ARB_conditional_render_inverted, 
    GL_ARB_conservative_depth, GL_ARB_copy_buffer, GL_ARB_copy_image, 
    GL_ARB_cull_distance, GL_ARB_debug_output, GL_ARB_depth_buffer_float, 
    GL_ARB_depth_clamp, GL_ARB_depth_texture, GL_ARB_derivative_control, 
    GL_ARB_direct_state_access, GL_ARB_draw_buffers, 
    GL_ARB_draw_buffers_blend, GL_ARB_draw_elements_base_vertex, 
    GL_ARB_draw_indirect, GL_ARB_draw_instanced, GL_ARB_enhanced_layouts, 
    GL_ARB_explicit_attrib_location, GL_ARB_explicit_uniform_location, 
    GL_ARB_fragment_coord_conventions, GL_ARB_fragment_layer_viewport, 
    GL_ARB_fragment_program, GL_ARB_fragment_program_shadow, 
    GL_ARB_fragment_shader, GL_ARB_fragment_shader_interlock, 
    GL_ARB_framebuffer_no_attachments, GL_ARB_framebuffer_object, 
    GL_ARB_framebuffer_sRGB, GL_ARB_geometry_shader4, 
    GL_ARB_get_program_binary, GL_ARB_get_texture_sub_image, GL_ARB_gl_spirv, 
    GL_ARB_gpu_shader5, GL_ARB_gpu_shader_fp64, GL_ARB_gpu_shader_int64, 
    GL_ARB_half_float_pixel, GL_ARB_half_float_vertex, GL_ARB_imaging, 
    GL_ARB_indirect_parameters, GL_ARB_instanced_arrays, 
    GL_ARB_internalformat_query, GL_ARB_internalformat_query2, 
    GL_ARB_invalidate_subdata, GL_ARB_map_buffer_alignment, 
    GL_ARB_map_buffer_range, GL_ARB_multi_bind, GL_ARB_multi_draw_indirect, 
    GL_ARB_multisample, GL_ARB_multitexture, GL_ARB_occlusion_query, 
    GL_ARB_occlusion_query2, GL_ARB_parallel_shader_compile, 
    GL_ARB_pipeline_statistics_query, GL_ARB_pixel_buffer_object, 
    GL_ARB_point_parameters, GL_ARB_point_sprite, GL_ARB_polygon_offset_clamp, 
    GL_ARB_post_depth_coverage, GL_ARB_program_interface_query, 
    GL_ARB_provoking_vertex, GL_ARB_query_buffer_object, 
    GL_ARB_robust_buffer_access_behavior, GL_ARB_robustness, 
    GL_ARB_sample_locations, GL_ARB_sample_shading, GL_ARB_sampler_objects, 
    GL_ARB_seamless_cube_map, GL_ARB_seamless_cubemap_per_texture, 
    GL_ARB_separate_shader_objects, GL_ARB_shader_atomic_counter_ops, 
    GL_ARB_shader_atomic_counters, GL_ARB_shader_ballot, 
    GL_ARB_shader_bit_encoding, GL_ARB_shader_clock, 
    GL_ARB_shader_draw_parameters, GL_ARB_shader_group_vote, 
    GL_ARB_shader_image_load_store, GL_ARB_shader_image_size, 
    GL_ARB_shader_objects, GL_ARB_shader_precision, 
    GL_ARB_shader_storage_buffer_object, GL_ARB_shader_subroutine, 
    GL_ARB_shader_texture_image_samples, GL_ARB_shader_texture_lod, 
    GL_ARB_shader_viewport_layer_array, GL_ARB_shading_language_100, 
    GL_ARB_shading_language_420pack, GL_ARB_shading_language_include, 
    GL_ARB_shading_language_packing, GL_ARB_shadow, GL_ARB_sparse_buffer, 
    GL_ARB_sparse_texture, GL_ARB_sparse_texture2, 
    GL_ARB_sparse_texture_clamp, GL_ARB_spirv_extensions, 
    GL_ARB_stencil_texturing, GL_ARB_sync, GL_ARB_tessellation_shader, 
    GL_ARB_texture_barrier, GL_ARB_texture_border_clamp, 
    GL_ARB_texture_buffer_object, GL_ARB_texture_buffer_object_rgb32, 
    GL_ARB_texture_buffer_range, GL_ARB_texture_compression, 
    GL_ARB_texture_compression_bptc, GL_ARB_texture_compression_rgtc, 
    GL_ARB_texture_cube_map, GL_ARB_texture_cube_map_array, 
    GL_ARB_texture_env_add, GL_ARB_texture_env_combine, 
    GL_ARB_texture_env_crossbar, GL_ARB_texture_env_dot3, 
    GL_ARB_texture_filter_anisotropic, GL_ARB_texture_filter_minmax, 
    GL_ARB_texture_float, GL_ARB_texture_gather, 
    GL_ARB_texture_mirror_clamp_to_edge, GL_ARB_texture_mirrored_repeat, 
    GL_ARB_texture_multisample, GL_ARB_texture_non_power_of_two, 
    GL_ARB_texture_query_levels, GL_ARB_texture_query_lod, 
    GL_ARB_texture_rectangle, GL_ARB_texture_rg, GL_ARB_texture_rgb10_a2ui, 
    GL_ARB_texture_stencil8, GL_ARB_texture_storage, 
    GL_ARB_texture_storage_multisample, GL_ARB_texture_swizzle, 
    GL_ARB_texture_view, GL_ARB_timer_query, GL_ARB_transform_feedback2, 
    GL_ARB_transform_feedback3, GL_ARB_transform_feedback_instanced, 
    GL_ARB_transform_feedback_overflow_query, GL_ARB_transpose_matrix, 
    GL_ARB_uniform_buffer_object, GL_ARB_vertex_array_bgra, 
    GL_ARB_vertex_array_object, GL_ARB_vertex_attrib_64bit, 
    GL_ARB_vertex_attrib_binding, GL_ARB_vertex_buffer_object, 
    GL_ARB_vertex_program, GL_ARB_vertex_shader, 
    GL_ARB_vertex_type_10f_11f_11f_rev, GL_ARB_vertex_type_2_10_10_10_rev, 
    GL_ARB_viewport_array, GL_ARB_window_pos, GL_ATI_draw_buffers, 
    GL_ATI_texture_float, GL_ATI_texture_mirror_once, 
    GL_EXTX_framebuffer_mixed_formats, GL_EXT_Cg_shader, 
    GL_EXT_EGL_image_storage, GL_EXT_abgr, GL_EXT_bgra, 
    GL_EXT_bindable_uniform, GL_EXT_blend_color, 
    GL_EXT_blend_equation_separate, GL_EXT_blend_func_separate, 
    GL_EXT_blend_minmax, GL_EXT_blend_subtract, GL_EXT_compiled_vertex_array, 
    GL_EXT_depth_bounds_test, GL_EXT_direct_state_access, 
    GL_EXT_draw_buffers2, GL_EXT_draw_instanced, GL_EXT_draw_range_elements, 
    GL_EXT_fog_coord, GL_EXT_framebuffer_blit, GL_EXT_framebuffer_multisample, 
    GL_EXT_framebuffer_multisample_blit_scaled, GL_EXT_framebuffer_object, 
    GL_EXT_framebuffer_sRGB, GL_EXT_geometry_shader4, 
    GL_EXT_gpu_program_parameters, GL_EXT_gpu_shader4, 
    GL_EXT_import_sync_object, GL_EXT_memory_object, GL_EXT_memory_object_fd, 
    GL_EXT_multi_draw_arrays, GL_EXT_packed_depth_stencil, 
    GL_EXT_packed_float, GL_EXT_packed_pixels, GL_EXT_pixel_buffer_object, 
    GL_EXT_point_parameters, GL_EXT_polygon_offset_clamp, 
    GL_EXT_post_depth_coverage, GL_EXT_provoking_vertex, 
    GL_EXT_raster_multisample, GL_EXT_rescale_normal, GL_EXT_secondary_color, 
    GL_EXT_semaphore, GL_EXT_semaphore_fd, GL_EXT_separate_shader_objects, 
    GL_EXT_separate_specular_color, GL_EXT_shader_image_load_formatted, 
    GL_EXT_shader_image_load_store, GL_EXT_shader_integer_mix, 
    GL_EXT_shadow_funcs, GL_EXT_sparse_texture2, GL_EXT_stencil_two_side, 
    GL_EXT_stencil_wrap, GL_EXT_texture3D, GL_EXT_texture_array, 
    GL_EXT_texture_buffer_object, GL_EXT_texture_compression_dxt1, 
    GL_EXT_texture_compression_latc, GL_EXT_texture_compression_rgtc, 
    GL_EXT_texture_compression_s3tc, GL_EXT_texture_cube_map, 
    GL_EXT_texture_edge_clamp, GL_EXT_texture_env_add, 
    GL_EXT_texture_env_combine, GL_EXT_texture_env_dot3, 
    GL_EXT_texture_filter_anisotropic, GL_EXT_texture_filter_minmax, 
    GL_EXT_texture_integer, GL_EXT_texture_lod, GL_EXT_texture_lod_bias, 
    GL_EXT_texture_mirror_clamp, GL_EXT_texture_object, GL_EXT_texture_sRGB, 
    GL_EXT_texture_sRGB_R8, GL_EXT_texture_sRGB_decode, 
    GL_EXT_texture_shared_exponent, GL_EXT_texture_storage, 
    GL_EXT_texture_swizzle, GL_EXT_timer_query, GL_EXT_transform_feedback2, 
    GL_EXT_vertex_array, GL_EXT_vertex_array_bgra, GL_EXT_vertex_attrib_64bit, 
    GL_EXT_window_rectangles, GL_IBM_rasterpos_clip, 
    GL_IBM_texture_mirrored_repeat, GL_KHR_blend_equation_advanced, 
    GL_KHR_blend_equation_advanced_coherent, GL_KHR_context_flush_control, 
    GL_KHR_debug, GL_KHR_no_error, GL_KHR_parallel_shader_compile, 
    GL_KHR_robust_buffer_access_behavior, GL_KHR_robustness, 
    GL_KTX_buffer_region, GL_NVX_blend_equation_advanced_multi_draw_buffers, 
    GL_NVX_conditional_render, GL_NVX_gpu_memory_info, 
    GL_NV_ES1_1_compatibility, GL_NV_ES3_1_compatibility, 
    GL_NV_alpha_to_coverage_dither_control, GL_NV_bindless_multi_draw_indirect, 
    GL_NV_bindless_multi_draw_indirect_count, GL_NV_bindless_texture, 
    GL_NV_blend_equation_advanced, GL_NV_blend_equation_advanced_coherent, 
    GL_NV_blend_minmax_factor, GL_NV_blend_square, GL_NV_clip_space_w_scaling, 
    GL_NV_command_list, GL_NV_compute_program5, GL_NV_conditional_render, 
    GL_NV_conservative_raster, GL_NV_conservative_raster_dilate, 
    GL_NV_conservative_raster_pre_snap_triangles, GL_NV_copy_depth_to_color, 
    GL_NV_copy_image, GL_NV_depth_buffer_float, GL_NV_depth_clamp, 
    GL_NV_draw_texture, GL_NV_draw_vulkan_image, GL_NV_explicit_multisample, 
    GL_NV_feature_query, GL_NV_fence, GL_NV_fill_rectangle, 
    GL_NV_float_buffer, GL_NV_fog_distance, GL_NV_fragment_coverage_to_color, 
    GL_NV_fragment_program, GL_NV_fragment_program2, 
    GL_NV_fragment_program_option, GL_NV_fragment_shader_interlock, 
    GL_NV_framebuffer_mixed_samples, GL_NV_framebuffer_multisample_coverage, 
    GL_NV_geometry_shader4, GL_NV_geometry_shader_passthrough, 
    GL_NV_gpu_program4, GL_NV_gpu_program4_1, GL_NV_gpu_program5, 
    GL_NV_gpu_program5_mem_extended, GL_NV_gpu_program_fp64, 
    GL_NV_gpu_shader5, GL_NV_half_float, GL_NV_internalformat_sample_query, 
    GL_NV_light_max_exponent, GL_NV_memory_attachment, 
    GL_NV_multisample_coverage, GL_NV_multisample_filter_hint, 
    GL_NV_occlusion_query, GL_NV_packed_depth_stencil, 
    GL_NV_parameter_buffer_object, GL_NV_parameter_buffer_object2, 
    GL_NV_path_rendering, GL_NV_path_rendering_shared_edge, 
    GL_NV_point_sprite, GL_NV_primitive_restart, GL_NV_query_resource, 
    GL_NV_query_resource_tag, GL_NV_register_combiners, 
    GL_NV_register_combiners2, GL_NV_robustness_video_memory_purge, 
    GL_NV_sample_locations, GL_NV_sample_mask_override_coverage, 
    GL_NV_shader_atomic_counters, GL_NV_shader_atomic_float, 
    GL_NV_shader_atomic_float64, GL_NV_shader_atomic_fp16_vector, 
    GL_NV_shader_atomic_int64, GL_NV_shader_buffer_load, 
    GL_NV_shader_storage_buffer_object, GL_NV_shader_thread_group, 
    GL_NV_shader_thread_shuffle, GL_NV_stereo_view_rendering, 
    GL_NV_texgen_reflection, GL_NV_texture_barrier, 
    GL_NV_texture_compression_vtc, GL_NV_texture_env_combine4, 
    GL_NV_texture_multisample, GL_NV_texture_rectangle, 
    GL_NV_texture_rectangle_compressed, GL_NV_texture_shader, 
    GL_NV_texture_shader2, GL_NV_texture_shader3, GL_NV_transform_feedback, 
    GL_NV_transform_feedback2, GL_NV_uniform_buffer_unified_memory, 
    GL_NV_vertex_attrib_integer_64bit, GL_NV_vertex_buffer_unified_memory, 
    GL_NV_vertex_program, GL_NV_vertex_program1_1, GL_NV_vertex_program2, 
    GL_NV_vertex_program2_option, GL_NV_vertex_program3, 
    GL_NV_viewport_array2, GL_NV_viewport_swizzle, GL_OVR_multiview, 
    GL_OVR_multiview2, GL_S3_s3tc, GL_SGIS_generate_mipmap, 
    GL_SGIS_texture_lod, GL_SGIX_depth_texture, GL_SGIX_shadow, 
    GL_SUN_slice_accum

65 EGLConfigs:
Cfg   tra buf lev buf colorbuffer   dep ste client APIs   ms  cav  surf
ID    ns  sz  el  typ r  g  b  a  F th  ncl GL ES ES2 VG ns b eat  typ
-----------------------------------------------------------------------
0x001 .   32  0   rgb 8  8  8  8  . 24  8   y  y  y   .  0  0 None P..
0x002 .   32  0   rgb 8  8  8  8  . 24  0   y  y  y   .  0  0 None P..
0x003 .   32  0   rgb 8  8  8  8  . 0   8   y  y  y   .  0  0 None P..
0x004 .   32  0   rgb 8  8  8  8  . 0   0   y  y  y   .  0  0 None P..
0x005 .   32  0   rgb 8  8  8  8  . 24  8   y  y  y   .  2  1 None P..
0x006 .   32  0   rgb 8  8  8  8  . 24  0   y  y  y   .  2  1 None P..
0x007 .   32  0   rgb 8  8  8  8  . 0   8   y  y  y   .  2  1 None P..
0x008 .   32  0   rgb 8  8  8  8  . 0   0   y  y  y   .  2  1 None P..
0x009 .   32  0   rgb 8  8  8  8  . 24  8   y  y  y   .  4  1 None P..
0x00a .   32  0   rgb 8  8  8  8  . 24  8   y  y  y   .  4  1 None P..
0x00b .   32  0   rgb 8  8  8  8  . 24  0   y  y  y   .  4  1 None P..
0x00c .   32  0   rgb 8  8  8  8  . 24  0   y  y  y   .  4  1 None P..
0x00d .   32  0   rgb 8  8  8  8  . 0   8   y  y  y   .  4  1 None P..
0x00e .   32  0   rgb 8  8  8  8  . 0   8   y  y  y   .  4  1 None P..
0x00f .   32  0   rgb 8  8  8  8  . 0   0   y  y  y   .  4  1 None P..
0x010 .   32  0   rgb 8  8  8  8  . 0   0   y  y  y   .  4  1 None P..
0x011 .   32  0   rgb 8  8  8  8  . 24  8   y  y  y   .  8  1 None P..
0x012 .   32  0   rgb 8  8  8  8  . 24  0   y  y  y   .  8  1 None P..
0x013 .   32  0   rgb 8  8  8  8  . 0   8   y  y  y   .  8  1 None P..
0x014 .   32  0   rgb 8  8  8  8  . 0   0   y  y  y   .  8  1 None P..
0x015 .   24  0   rgb 8  8  8  0  . 24  8   y  y  y   .  0  0 None P..
0x016 .   24  0   rgb 8  8  8  0  . 24  0   y  y  y   .  0  0 None P..
0x017 .   24  0   rgb 8  8  8  0  . 0   8   y  y  y   .  0  0 None P..
0x018 .   24  0   rgb 8  8  8  0  . 0   0   y  y  y   .  0  0 None P..
0x019 .   24  0   rgb 8  8  8  0  . 24  8   y  y  y   .  2  1 None P..
0x01a .   24  0   rgb 8  8  8  0  . 24  0   y  y  y   .  2  1 None P..
0x01b .   24  0   rgb 8  8  8  0  . 0   8   y  y  y   .  2  1 None P..
0x01c .   24  0   rgb 8  8  8  0  . 0   0   y  y  y   .  2  1 None P..
0x01d .   24  0   rgb 8  8  8  0  . 24  8   y  y  y   .  4  1 None P..
0x01e .   24  0   rgb 8  8  8  0  . 24  8   y  y  y   .  4  1 None P..
0x01f .   24  0   rgb 8  8  8  0  . 24  0   y  y  y   .  4  1 None P..
0x020 .   24  0   rgb 8  8  8  0  . 24  0   y  y  y   .  4  1 None P..
0x021 .   24  0   rgb 8  8  8  0  . 0   8   y  y  y   .  4  1 None P..
0x022 .   24  0   rgb 8  8  8  0  . 0   8   y  y  y   .  4  1 None P..
0x023 .   24  0   rgb 8  8  8  0  . 0   0   y  y  y   .  4  1 None P..
0x024 .   24  0   rgb 8  8  8  0  . 0   0   y  y  y   .  4  1 None P..
0x025 .   24  0   rgb 8  8  8  0  . 24  8   y  y  y   .  8  1 None P..
0x026 .   24  0   rgb 8  8  8  0  . 24  0   y  y  y   .  8  1 None P..
0x027 .   24  0   rgb 8  8  8  0  . 0   8   y  y  y   .  8  1 None P..
0x028 .   24  0   rgb 8  8  8  0  . 0   0   y  y  y   .  8  1 None P..
0x029 .   16  0   rgb 5  6  5  0  . 24  8   y  y  y   .  0  0 None P..
0x02a .   16  0   rgb 5  6  5  0  . 24  0   y  y  y   .  0  0 None P..
0x02b .   16  0   rgb 5  6  5  0  . 16  0   y  y  y   .  0  0 None P..
0x02c .   16  0   rgb 5  6  5  0  . 0   8   y  y  y   .  0  0 None P..
0x02d .   16  0   rgb 5  6  5  0  . 0   0   y  y  y   .  0  0 None P..
0x02e .   16  0   rgb 5  6  5  0  . 24  8   y  y  y   .  2  1 None P..
0x02f .   16  0   rgb 5  6  5  0  . 24  0   y  y  y   .  2  1 None P..
0x030 .   16  0   rgb 5  6  5  0  . 16  0   y  y  y   .  2  1 None P..
0x031 .   16  0   rgb 5  6  5  0  . 0   8   y  y  y   .  2  1 None P..
0x032 .   16  0   rgb 5  6  5  0  . 0   0   y  y  y   .  2  1 None P..
0x033 .   16  0   rgb 5  6  5  0  . 24  8   y  y  y   .  4  1 None P..
0x034 .   16  0   rgb 5  6  5  0  . 24  8   y  y  y   .  4  1 None P..
0x035 .   16  0   rgb 5  6  5  0  . 24  0   y  y  y   .  4  1 None P..
0x036 .   16  0   rgb 5  6  5  0  . 24  0   y  y  y   .  4  1 None P..
0x037 .   16  0   rgb 5  6  5  0  . 16  0   y  y  y   .  4  1 None P..
0x038 .   16  0   rgb 5  6  5  0  . 16  0   y  y  y   .  4  1 None P..
0x039 .   16  0   rgb 5  6  5  0  . 0   8   y  y  y   .  4  1 None P..
0x03a .   16  0   rgb 5  6  5  0  . 0   8   y  y  y   .  4  1 None P..
0x03b .   16  0   rgb 5  6  5  0  . 0   0   y  y  y   .  4  1 None P..
0x03c .   16  0   rgb 5  6  5  0  . 0   0   y  y  y   .  4  1 None P..
0x03d .   16  0   rgb 5  6  5  0  . 24  8   y  y  y   .  8  1 None P..
0x03e .   16  0   rgb 5  6  5  0  . 24  0   y  y  y   .  8  1 None P..
0x03f .   16  0   rgb 5  6  5  0  . 16  0   y  y  y   .  8  1 None P..
0x040 .   16  0   rgb 5  6  5  0  . 0   8   y  y  y   .  8  1 None P..
0x041 .   16  0   rgb 5  6  5  0  . 0   0   y  y  y   .  8  1 None P..

dcommander commented 4 years ago

The fact is that on this cluster instance I don't have permission to read or write /dev/dri/card4. But is a segfault really a good way to let the user know that?

No, of course not, but that does not occur on my test system.

# sudo chmod 600 /dev/dri/*
# DISPLAY=:1 vglrun -d /dev/dri/card0 glxgears
[VGL] ERROR: in init3D--
[VGL]    216: Could not open EGL display
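Whichever way the driver reacts, a missing permission can also be caught before launching anything. The following is a hypothetical pre-flight check (the function `check_dri_node` is not part of VirtualGL) that uses only the standard `test` operators to confirm that a DRI node exists, is a character device, and is readable and writable by the current user. Note that root passes the permission tests regardless of the file mode.

```shell
# Hypothetical pre-flight check (not part of VirtualGL): verify that a DRI
# device node exists and that the current user can open it read/write,
# printing a readable message instead of letting the EGL driver crash.
check_dri_node() {
    dev=$1
    if [ ! -e "$dev" ]; then
        echo "ERROR: $dev does not exist" >&2
        return 1
    fi
    if [ ! -c "$dev" ]; then
        echo "ERROR: $dev is not a character device" >&2
        return 1
    fi
    if [ ! -r "$dev" ] || [ ! -w "$dev" ]; then
        echo "ERROR: no read/write permission for $dev (user $(id -un))" >&2
        return 1
    fi
    return 0
}
```

Usage example: `check_dri_node /dev/dri/card4 && VGL_DISPLAY=/dev/dri/card4 vglrun glxgears`.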

peci1 commented 4 years ago

No, of course not, but that does not occur on my test system.

Is there a test build with debug symbols?

dcommander commented 4 years ago

Thanks for the eglinfo output. That shows me that your system is at least capable of desktop OpenGL with EGL, so now the mystery is why it is selecting the OpenGL ES API. I don't have a GeForce available for testing, but I'll try 418.xx to see if maybe the issue is due to the driver revision.

The RPMs have debug symbols if you install the accompanying -debuginfo packages. The DEB packages do not have debug symbols.

dcommander commented 4 years ago

I'm testing with some older driver versions and am able to reproduce some of the reported issues. Stand by.

ffeldhaus commented 4 years ago

Really great work! I was just able to run glxspheres64 within a Docker container within a VM on Google Cloud with a Tesla K80 GPU and stream it via the web using the xpra HTML5 client. I used the latest NVIDIA 440 drivers. I achieved ~140 frames/s, whereas I had 11 frames/s before with the software renderer. I will share details later once I have cleaned up the setup.

dcommander commented 4 years ago

Observations with various driver revisions and my Quadro K5000:

390.138: Only supports EGL 1.4, which means that GLX_ARB_create_context can't be emulated (since EGL_CONTEXT_OPENGL_DEBUG and EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE require EGL 1.5.) Once I fixed that issue in the faker, fakerut passed. No other issues were observed.

418.113: I observed the same issue reported in https://github.com/VirtualGL/virtualgl/issues/10#issuecomment-680790031 whereby the driver returned an OpenGL ES context regardless. No idea why it's occurring or how to work around it, but it appears to be a driver bug.

430.64: no issues observed

440.100: no issues observed

450.66: no issues observed

I also have an older Quadro that requires 390.xx. I'll test that whenever I get a chance.
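Since the 390.xx limitation comes down to the EGL version, a script can check for it up front. This is a minimal sketch, assuming the "EGL version string: N.M" line format that `/opt/VirtualGL/bin/eglinfo` prints earlier in this thread; the function name `egl_version_at_least_1_5` is hypothetical.

```shell
# Hypothetical helper: read `eglinfo` output on stdin and succeed only if the
# reported EGL version is 1.5 or later (per the comment above, 1.5 is needed
# for EGL_CONTEXT_OPENGL_DEBUG and EGL_CONTEXT_OPENGL_FORWARD_COMPATIBLE).
egl_version_at_least_1_5() {
    # Take the first "EGL version string: N.M..." line and keep only N.M.
    ver=$(sed -n 's/^EGL version string: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1)
    [ -n "$ver" ] || return 1
    major=${ver%%.*}
    minor=${ver#*.}
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 5 ]; }
}
```

Usage example: `/opt/VirtualGL/bin/eglinfo /dev/dri/card4 | egl_version_at_least_1_5 || echo "EGL < 1.5: GLX_ARB_create_context cannot be emulated"`.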

dcommander commented 4 years ago

Verified correct operation of the EGL back end with my old Quadro 600 and v390.138. The takeaways thus far:

If your GPU is new enough to use 430.xx or later, then upgrade to 430.xx or later.
418.xx definitely does not work. If you are using that driver version and cannot upgrade, then try downgrading. I haven't tested any of the releases between 418.xx and 390.xx (they don't appear to be available in the driver archive for any of my GPUs), but I know that 390.xx works (albeit with slightly reduced functionality, i.e. no support for GLX_ARB_create_context.)
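To apply these takeaways in a site script, they can be encoded as a small lookup. This is a hypothetical sketch (`classify_nvidia_driver` is an illustrative name) that reflects only the driver branches tested in this thread; any other version is reported as untested rather than guessed.

```shell
# Hypothetical summary of the driver observations in this thread only.
# Input: an nVidia driver version such as "418.74" or "450.66".
# Output: one word describing how the VirtualGL EGL back end fared.
classify_nvidia_driver() {
    major=${1%%.*}
    case "$major" in
        ''|*[!0-9]*) echo unknown; return ;;   # not a numeric version
    esac
    if [ "$major" -eq 390 ]; then
        echo works-no-create-context   # EGL 1.4 only: no GLX_ARB_create_context
    elif [ "$major" -eq 418 ]; then
        echo broken                    # driver returned OpenGL ES contexts
    elif [ "$major" -ge 430 ]; then
        echo works                     # 430.64, 440.100, 450.66 observed OK
    else
        echo untested
    fi
}
```

Usage example: `classify_nvidia_driver "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"`.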

I also tested my Radeon Pro WX2100 with amdgpu 20.05. It appears to at least have the necessary EGL functionality, but there are numerous conformance issues that prevent it from working properly (similar to the ones described here vis-a-vis the GLX back end.)

dcommander commented 4 years ago

I looked into implementing glXCopyContext(), and unfortunately, that may not be possible. The issue is that there isn't always a 1:1 correspondence between get and set functions for certain attributes in OpenGL. Also, all of the fixed-function attributes are obsolete and no longer available except with compatibility contexts. I don't see a way to implement glXCopyContext() that isn't terribly error-prone. I would rather just wait and see whether any modern applications actually use it. My guess would be no, and avoiding that particular albatross would be consistent with the legacy-free design of the EGL back end (it also doesn't support accumulation and aux buffers, color index rendering, and other features that went away in OpenGL 3.1.)

dcommander commented 4 years ago

Further justification for not supporting glXCopyContext(): Few OpenGL implementations seem to implement that function properly or at all. On my specific test systems, the AMD Catalyst drivers throw BadRequest (which tells me that glXCopyContext() isn't implemented at all.) The VMware drivers throw BadMatch, which means either that the function isn't implemented or that its criteria for evaluating source and destination context compatibility are broken. The nVidia drivers support the function, but their implementation places certain attributes in the wrong category.

ffeldhaus commented 4 years ago

I now have the Docker image with xpra and the VirtualGL preview in usable shape, but I still need to test, clean up, and optimize it. If you want to try it, check it out here: https://github.com/ffeldhaus/docker-xpra-html5-opengl This should also help solve issues #98 and #113.

Micket commented 4 years ago

I had a go at this on my cluster's login nodes, running CentOS 7.8 with nvidia drivers 450.51.06. Good old GLX has been working great for several years (thanks!), and CUDA works well as a regular user, so permissions should be fine.

I'm having issues just running eglinfo. It fails outright with: Error: unable to open EGL display. After sprinkling in some more debug printouts, I see that it somehow found 4 devices (despite there being just 1 GPU); looping over them, I found that the last device had the device string /dev/dri/card0 that I was looking for. But it still can't open the display.

Suspecting that an old Mesa was doing something bad here, I tried overriding the vendor:

export __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/10_nvidia.json

That instead resulted in Error: invalid EGL device, apparently because eglQueryDeviceStringEXT just returned NULL for the only device listed in this case.

However, if I hardcode i=0, it works.

After building newer versions of Mesa and libglvnd, it finds 3 devices instead:

device 0  devStr = (nil)
device 1  devstr = /dev/dri/card1
device 2  devstr = (nil)

Hardcoding device 0: nvidia (works)
Hardcoding device 1: yields the libEGL warning "not allowed to force software rendering when API explicitly selects a hardware device"
Hardcoding device 2: llvmpipe (works)

Really not sure what to make of this. On my home computer with an AMD gpu it works flawlessly.
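Rather than hardcoding device indices, one way to see which card nodes EGL can actually open is to run VirtualGL's eglinfo against each of them. The sketch below assumes that eglinfo exits with a nonzero status when it fails (for instance with "Error: unable to open EGL display"); `probe_egl_devices` is a hypothetical helper, and the eglinfo command is passed in as the first argument so that it can be swapped out.

```shell
# Hypothetical diagnostic: run an eglinfo-like command on each DRI card node
# and report which ones can be opened through EGL.
# Assumes the command exits nonzero when it cannot open the EGL display.
# Usage: probe_egl_devices EGLINFO_COMMAND DEVICE...
probe_egl_devices() {
    eglinfo_bin=$1
    shift
    for dev in "$@"; do
        if "$eglinfo_bin" "$dev" >/dev/null 2>&1; then
            echo "OK   $dev"
        else
            echo "FAIL $dev"
        fi
    done
}
```

Usage example: `probe_egl_devices /opt/VirtualGL/bin/eglinfo /dev/dri/card*`.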

dcommander commented 4 years ago

It doesn't work flawlessly on my AMDGPU machine. The buffer swapping is all messed up for some reason (the same is true when using the GLX back end; the AMDGPU drivers have numerous conformance issues, all of which I have reported to AMD.)

Even if the device permissions were working with the GLX back end, you still need to re-run vglserver_config, because the EGL back end additionally needs access to /dev/dri/render*.
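To see which render node needs the extra access for a given card, the pairing can be read from sysfs, where `/sys/class/drm/cardN/device/drm/` lists both the card node and its `renderD*` sibling. This is a minimal sketch under that assumption; `render_node_for_card` is a hypothetical name, and the `DRM_SYSFS` override exists only so the lookup can be tested against a fake directory tree. vglserver_config remains the supported way to actually fix the permissions.

```shell
# Hypothetical helper: print the render node (e.g. /dev/dri/renderD128) that
# belongs to the same GPU as a given card node (e.g. /dev/dri/card0), using
# the sysfs layout /sys/class/drm/cardN/device/drm/renderD*.
# DRM_SYSFS is an illustrative override of the sysfs directory, for testing.
render_node_for_card() {
    card=$(basename "$1")
    sysfs=${DRM_SYSFS:-/sys/class/drm}
    for node in "$sysfs/$card/device/drm"/renderD*; do
        [ -e "$node" ] || continue   # the glob matched nothing
        echo "/dev/dri/$(basename "$node")"
        return 0
    done
    echo "ERROR: no render node found for $1" >&2
    return 1
}
```

Usage example: `ls -l "$(render_node_for_card /dev/dri/card1)"` shows the owner, group, and mode of the render node that the EGL back end will open.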

Micket commented 4 years ago

@dcommander Thanks! Indeed, fixing the permissions on the /dev/dri/render* devices also made the device string show up for the devices /dev/dri/card* (I didn't think there was a connection there for just running eglinfo).

Sorry for the confusion; I meant eglinfo worked on my AMD gpu (never really got beyond that state last night).

eglinfo and vglrun -d /dev/dri/cardX glxspheres64 now work, but I am having some trouble running anything more complex (e.g. ParaView). I strongly suspect there is something broken in my test environment, as I am having trouble with the GLX backend as well in this build.

dcommander commented 4 years ago

That makes sense. eglinfo works fine on my AMDGPU machine as well. GLXspheres with the EGL back end acts like it works, but due to the swapping issue, the image flickers and never updates.

Micket commented 4 years ago

So, I tested on CentOS 7.8 with nvidia drivers 450.51.06, using the 2.6.80-20200828.x86_64 RPM over ThinLinc/VNC. I ran vglserver_config and also restarted the display manager.

Simpler programs seem to work fine with both the GLX and EGL back ends:

EGL back end                               GLX back end
vglrun +v -d /dev/dri/card1 glxinfo        vglrun +v glxinfo
vglrun +v -d /dev/dri/card1 glxgears       vglrun +v glxgears
vglrun +v -d /dev/dri/card1 glxspheres64   vglrun +v glxspheres64

Not much success with other software:

vglrun +v mathematica                                  GLX mode seems to work fine, as before
vglrun +v -d /dev/dri/card1 mathematica                segfaults on startup
vglrun +v matlab -nosoftwareopengl                     segfaults upon 3D plot
vglrun +v -d /dev/dri/card1 matlab -nosoftwareopengl   segfaults upon 3D plot
vglrun +v paraview                                     segfaults upon startup
vglrun +v -d /dev/dri/card1 paraview                   segfaults upon startup

ParaView and MATLAB both hint at some issue with XQueryExtension in libvglfaker/libX11:

[VGL] Opening EGL device /dev/dri/card1
[alvis1:95007] *** Process received signal ***
[alvis1:95007] Signal: Segmentation fault (11)
[alvis1:95007] Signal code: Address not mapped (1)
[alvis1:95007] Failing at address: (nil)
[alvis1:95007] [ 0] /lib64/libpthread.so.0(+0xf630)[0x7f3f828bd630]
[alvis1:95007] [ 1] /apps/Vera/software/Compiler/GCCcore/8.3.0/X11/20190717/lib/libX11.so.6(XQueryExtension+0x93)[0x7f3f86440cb3]
[alvis1:95007] [ 2] /lib64/libvglfaker.so(+0x5ad43)[0x7f3f8bb3fd43]
[alvis1:95007] [ 3] /lib64/libvglfaker.so(glXQueryExtension+0x123)[0x7f3f8bafd9e3]
[alvis1:95007] [ 4] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkRenderingOpenGL2-pv5.6.so.1(_ZN22vtkXOpenGLRenderWindow13CreateAWindowEv+0x242)[0x7f3f867d1c22]
[alvis1:95007] [ 5] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkRenderingOpenGL2-pv5.6.so.1(_ZN21vtkOpenGLRenderWindow29CreateHardwareOffScreenWindowEii+0x3a)[0x7f3f8671d53a]
[alvis1:95007] [ 6] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkRenderingOpenGL2-pv5.6.so.1(_ZN22vtkXOpenGLRenderWindow21CreateOffScreenWindowEii+0x1e)[0x7f3f867d287e]
[alvis1:95007] [ 7] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkRenderingOpenGL2-pv5.6.so.1(_ZN21vtkOpenGLRenderWindow14SupportsOpenGLEv+0x335)[0x7f3f867182b5]
[alvis1:95007] [ 8] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkPVClientServerCoreRendering-pv5.6.so.1(_ZN37vtkPVRenderingCapabilitiesInformation20GetLocalCapabilitiesEv+0x102)[0x7f3f87ed3642]
[alvis1:95007] [ 9] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkPVClientServerCoreRendering-pv5.6.so.1(_ZN37vtkPVRenderingCapabilitiesInformation14CopyFromObjectEP9vtkObject+0x9)[0x7f3f87ed3669]
[alvis1:95007] [10] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkPVServerImplementationCore-pv5.6.so.1(_ZN16vtkPVSessionCore25GatherInformationInternalEP16vtkPVInformationj+0x1f)[0x7f3f878ed7bf]
[alvis1:95007] [11] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkPVServerImplementationCore-pv5.6.so.1(_ZN16vtkPVSessionCore17GatherInformationEjP16vtkPVInformationj+0x3e)[0x7f3f878edc6e]
[alvis1:95007] [12] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkpqApplicationComponents-pv5.6.so.1(_ZN21pqDefaultViewBehavior16onServerCreationEP8pqServer+0x43)[0x7f3f8b920413]
[alvis1:95007] [13] /apps/Vera/software/Compiler/GCCcore/8.3.0/Qt5/5.13.1/lib/libQt5Core.so.5(_ZN11QMetaObject8activateEP7QObjectiiPPv+0x698)[0x7f3f884f05a8]
[alvis1:95007] [14] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkpqCore-pv5.6.so.1(_ZN20pqServerManagerModel11serverAddedEP8pqServer+0x32)[0x7f3f8b2c3c72]
[alvis1:95007] [15] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkpqCore-pv5.6.so.1(_ZN20pqServerManagerModel19onConnectionCreatedEx+0x31e)[0x7f3f8b28b82e]
[alvis1:95007] [16] /apps/Vera/software/Compiler/GCCcore/8.3.0/Qt5/5.13.1/lib/libQt5Core.so.5(_ZN11QMetaObject8activateEP7QObjectiiPPv+0x698)[0x7f3f884f05a8]
[alvis1:95007] [17] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkpqCore-pv5.6.so.1(_ZN23pqServerManagerObserver17connectionCreatedEx+0x32)[0x7f3f8b2c4f52]
[alvis1:95007] [18] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkpqCore-pv5.6.so.1(+0x174309)[0x7f3f8b2c5309]
[alvis1:95007] [19] /apps/Vera/software/Compiler/GCCcore/8.3.0/Qt5/5.13.1/lib/libQt5Core.so.5(_ZN11QMetaObject8activateEP7QObjectiiPPv+0x698)[0x7f3f884f05a8]
[alvis1:95007] [20] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkGUISupportQt-pv5.6.so.1(+0x5dfbb)[0x7f3f88228fbb]
[alvis1:95007] [21] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkGUISupportQt-pv5.6.so.1(+0x3f01f)[0x7f3f8820a01f]
[alvis1:95007] [22] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkCommonCore-pv5.6.so.1(_ZN18vtkCallbackCommand7ExecuteEP9vtkObjectmPv+0x1a)[0x7f3f81b886da]
[alvis1:95007] [23] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkCommonCore-pv5.6.so.1(+0x254dc2)[0x7f3f81c8cdc2]
[alvis1:95007] [24] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkPVClientServerCoreCore-pv5.6.so.1(_ZN16vtkProcessModule15RegisterSessionEP10vtkSession+0xa8)[0x7f3f877be4e8]
[alvis1:95007] [25] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkPVServerManagerCore-pv5.6.so.1(_ZN12vtkSMSession13ConnectToSelfEi+0x69)[0x7f3f87adcd79]
[alvis1:95007] [26] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkpqCore-pv5.6.so.1(_ZN15pqObjectBuilder12createServerERK16pqServerResourcei+0x1eb)[0x7f3f8b2407cb]
[alvis1:95007] [27] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkpqApplicationComponents-pv5.6.so.1(_ZN25pqAlwaysConnectedBehavior11serverCheckEv+0x94)[0x7f3f8b8ccac4]
[alvis1:95007] [28] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkpqApplicationComponents-pv5.6.so.1(_ZN25pqAlwaysConnectedBehaviorC2EP7QObject+0x114)[0x7f3f8b8ccbf4]
[alvis1:95007] [29] /apps/Vera/software/MPI/GCC/8.3.0/OpenMPI/3.1.4/ParaView/5.6.2-Python-3.7.4-mpi/lib/libvtkpqApplicationComponents-pv5.6.so.1(_ZN19pqParaViewBehaviorsC1EP11QMainWindowP7QObject+0x998)[0x7f3f8b950648]

MATLAB:

Stack Trace (from fault):
[  0] 0x00007f1e702c6001                              /lib64/libX11.so.6+00233473 XQueryExtension+00000161
[  1] 0x00007f1e72710deb                              /lib64/libvglfaker.so+00372203
[  2] 0x00007f1e726ce9e3                              /lib64/libvglfaker.so+00100835 glXQueryExtension+00000291

dcommander commented 4 years ago

Can you verify with xdpyinfo whether the ThinLinc X server has a GLX extension? That may help me reproduce the issue.

Micket commented 4 years ago

From the ThinLinc session:

$ xdpyinfo 
name of display:    :10
version number:    11.0
vendor string:    Cendio ThinLinc
...
number of extensions:    24
    BIG-REQUESTS
    Composite
    DAMAGE
    DOUBLE-BUFFER
    DPMS
    GLX
    Generic Event Extension
    MIT-SCREEN-SAVER
    MIT-SHM
    RANDR
    RECORD
    RENDER
    SGI-GLX
    SHAPE
    SYNC
    VNC-EXTENSION
    X-Resource
    XC-MISC
    XFIXES
    XINERAMA
    XInputExtension
    XKEYBOARD
    XTEST
    XVideo
...

And from the "background" X session (display :0), in case that's relevant:

$ xdpyinfo -display :0
name of display:    :0
version number:    11.0
vendor string:    The X.Org Foundation
...
number of extensions:    28
...
    GLX

dcommander commented 4 years ago

That’s very odd. I will try to reproduce the issue, because I don’t understand how it could occur if the 2D X server has a GLX extension.

Micket commented 4 years ago

I should add that I've now tried this on two different machines (one with a much simpler xorg.conf, since it has only one GPU). I also tried it via Xpra instead of ThinLinc, as well as with the ParaView RPM from EPEL.

In all cases, I get an XQueryExtension-related crash regardless of back end, and after downgrading to 2.6.5, everything works again immediately. This is quite far from my expertise, so I'm not sure how to provide better debugging output.

dcommander commented 4 years ago

Oh, so you are seeing a crash with the GLX back end as well?

Micket commented 4 years ago

Yes. In summary: glxinfo, glxgears, and glxspheres64 work with both back ends. ParaView and MATLAB segfault regardless of back end (both eventually crash in XQueryExtension() via libvglfaker/libX11).

peci1 commented 4 years ago

I did more tests on the cluster where EGL rendering with driver 418 wrongly returned an OpenGL ES 1.1 context. I found one machine on the cluster with driver version 440, and my test app works there, so this is very probably an issue with the driver alone.

However, on that machine with driver 440 (and on my laptop, also 440), I have problems running Qt5 OpenGL windows: the application segfaults. The Qt5-related part of the stack trace is:

/lib/x86_64-linux-gnu/libc.so.6(0x7fa44b930fd0) [0x7fa44b930fd0]
/usr/lib/x86_64-linux-gnu/libX11-xcb.so.1(XGetXCBConnection+0x7) [0x7fa43fb0b5c7]
/usr/lib/libvglfaker.so(xcb_get_extension_data+0x16a) [0x7fa44c1d5e1a]
/usr/lib/x86_64-linux-gnu/qt5/plugins/xcbglintegrations/libqxcb-glx-integration.so(0x7fa43b58efb2) [0x7fa43b58efb2]
/usr/lib/x86_64-linux-gnu/libQt5XcbQpa.so.5(_ZN14QXcbConnectionC2EP19QXcbNativeInterfacebjPKc+0x679) [0x7fa44021a369]
/usr/lib/x86_64-linux-gnu/libQt5XcbQpa.so.5(_ZN15QXcbIntegrationC2ERK11QStringListRiPPc+0x2fe) [0x7fa44021d7fe]
/usr/lib/x86_64-linux-gnu/qt5/plugins/platforms/libqxcb.so(0x7fa4404e32ab) [0x7fa4404e32ab]
/usr/lib/x86_64-linux-gnu/libQt5Gui.so.5(_ZN27QPlatformIntegrationFactory6createERK7QStringRK11QStringListRiPPcS2_+0x11d) [0x7fa445610add]
/usr/lib/x86_64-linux-gnu/libQt5Gui.so.5(_ZN22QGuiApplicationPrivate25createPlatformIntegrationEv+0x6a2) [0x7fa445621922]
/usr/lib/x86_64-linux-gnu/libQt5Gui.so.5(_ZN22QGuiApplicationPrivate21createEventDispatcherEv+0x2d) [0x7fa44562245d]
/usr/lib/x86_64-linux-gnu/libQt5Core.so.5(_ZN23QCoreApplicationPrivate4initEv+0xb25) [0x7fa44506d795]
/usr/lib/x86_64-linux-gnu/libQt5Gui.so.5(_ZN22QGuiApplicationPrivate4initEv+0x2f) [0x7fa445623eef]
/usr/lib/x86_64-linux-gnu/libQt5Gui.so.5(_ZN15QGuiApplicationC1ERiPPci+0x54) [0x7fa445624c34]

The same command with VGL_DISPLAY set to an X server works. For tests with EGL, I use TurboVNC 2.2.80 as the 2D X server.

peci1 commented 4 years ago

Just to give you a thumbs up: this shows 9 simulated cameras rendered off-screen using the EGL back end in the SubT simulator (the window showing the camera outputs does not run in VirtualGL; it uses a custom communication protocol to get the images from the simulator). Great job!

[Screenshot taken 2020-08-31 11-31-01]

peci1 commented 4 years ago

And one more unclear thing: https://dvdhrm.wordpress.com/2013/09/01/splitting-drm-and-kms-device-nodes/ says that render nodes (/dev/dri/renderD*) are (or can be) independent of a particular GPU. If I understand correctly, there's not necessarily a 1:1 mapping between render nodes and card devices. How does VirtualGL choose the render node, then, if the only configuration it receives is the card device? Or is that handled by the NVIDIA driver?

dcommander commented 4 years ago

That is handled by the driver. VirtualGL accesses the GPU through /dev/dri/card*, but the driver uses the appropriate /dev/dri/renderD* node behind the scenes, which is why it needs access to that file.
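To see which render node sits next to a given card node on a particular host, one can look at sysfs. The following minimal Python sketch is not how VirtualGL or the NVIDIA driver resolves the mapping (the driver does that internally). It assumes the usual Linux sysfs layout, in which /sys/class/drm/cardN/device/drm/ lists every DRM node (cardN, renderDM) exposed by the same underlying device. The function name and the sysfs_root parameter are hypothetical.

```python
from __future__ import annotations

import os


def render_nodes_for_card(card: str, sysfs_root: str = "/sys/class/drm") -> list[str]:
    """Return /dev/dri/renderD* paths that share a parent device with `card`.

    `card` may be given as "card1" or "/dev/dri/card1".
    """
    name = os.path.basename(card)
    # cardN/device points at the underlying (e.g. PCI) device; its drm/
    # subdirectory lists every DRM node exposed by that device.
    drm_dir = os.path.join(sysfs_root, name, "device", "drm")
    try:
        entries = os.listdir(drm_dir)
    except FileNotFoundError:
        return []
    return sorted(
        os.path.join("/dev/dri", entry)
        for entry in entries
        if entry.startswith("renderD")
    )
```

On a typical system, render_nodes_for_card("/dev/dri/card1") might return a single renderD12x path, but the numbering is not guaranteed to follow the card index, which is the point raised in the question above.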

dcommander commented 4 years ago

@Micket The segfault with glXQueryExtension() should be fixed in the latest build.

@peci1 Oops. That was an oversight. It should be fixed in the latest build.

dcommander commented 4 years ago

Another note:

The initial implementation of the EGL back end supported 2D X servers without GLX extensions, but it did so in a rather hackish way: by returning a fake GLX major opcode and error base from XQueryExtension()/glXQueryExtension(). The fake GLX major opcode was 255, and the fake GLX error base was 255 - __GLX_NUMBER_ERRORS + 1. This minimized the odds of stomping on a legitimate 2D X server extension, but it didn't completely eliminate that possibility. The hack also didn't fully work (in particular, the GLX error strings were incorrect, so fakerut failed), and it was untenable to implement a similar hack for XCB-GLX.

I therefore decided that it made the most sense, when using the EGL back end, to require a 2D X server GLX extension for the purposes of error handling. Hopefully this is not too onerous a requirement. X.org (Linux), XQuartz (macOS), and Cygwin/X (Windows) all have GLX extensions, so this requirement shouldn't be an issue with the VGL Transport. With X proxies, it means that the EGL back end can't be used with TightVNC 1.3.x or TurboVNC < 2.2, but as far as I know, all of the other major X proxies have a GLX extension.
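The arithmetic behind the fake error base above can be sketched as follows. X11 error codes are 8-bit, so a base of 255 - __GLX_NUMBER_ERRORS + 1 places the GLX error codes at the very top of that range, ending exactly at 255. The value of __GLX_NUMBER_ERRORS depends on the GLX protocol header in use, so it is a parameter here; the names below are hypothetical and only model the calculation.

```python
FAKE_GLX_MAJOR_OPCODE = 255  # the fake major opcode described above


def fake_glx_error_range(num_glx_errors: int) -> range:
    """Return the X error codes a fake GLX error base would occupy.

    num_glx_errors stands in for __GLX_NUMBER_ERRORS (header-dependent).
    """
    if num_glx_errors <= 0:
        raise ValueError("num_glx_errors must be positive")
    error_base = 255 - num_glx_errors + 1
    # GLX errors are reported as error_base + offset, with offset in
    # [0, num_glx_errors), so the last code is always 255.
    return range(error_base, error_base + num_glx_errors)
```

For example, if the header defined 14 GLX errors, the fake codes would be 242 through 255. A real extension whose error codes happened to fall in that window would still collide, which is why the hack only reduced the risk.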

peci1 commented 4 years ago

Thanks, the Qt5 app no longer segfaults with EGL. However, it displays only a black window. Another Qt5 app works, including on-screen rendering, so this is probably a problem with the app and not with VirtualGL.

dcommander commented 4 years ago

A good data point would be to test the failing app with VGL 2.6.4 and the GLX back end to verify that the issue is unrelated to the EGL back end or other VGL 3.0 features. Even though another Qt5 app works, the issue could still be in VGL.

peci1 commented 4 years ago

The Qt5 app runs okay with the very same VGL build when using an Xorg 3D X server. The only difference I noticed in the console output is this:

QSGContext::initialize: depth buffer support missing, expect rendering errors

dcommander commented 4 years ago

OK, that would suggest an EGL back end issue, then. Is this an application I could download and test for myself?

peci1 commented 4 years ago

Yes, you can test the app. It is available on Docker Hub (osrf/subt-virtual-testbed:cloudsim_bridge_latest), or you can install it on Ubuntu Bionic.

Once you have Ignition Blueprint installed, try launching ign gui -v 4. This should show a white/orange window with a few controls. The other Qt5 app, which works well, should also be in the same Docker image; it is launched with source /opt/ros/melodic/setup.bash; rviz.

dcommander commented 4 years ago

OK, so how do I run the application in that Docker container? Attempting to launch the container gives the following error:

Failed to parse /home/developer/subt_ws/install/share/subt_ign/launch/cloudsim_bridge.ign because of missing robotNameX argument

peci1 commented 4 years ago

OK, so how do I run the application in that Docker container?

You can try following the tutorial at https://github.com/osrf/subt/wiki/Docker%20System%20Setup, or you can launch the Docker container bypassing its entrypoint and run the command I showed before directly.

dcommander commented 4 years ago

Side note:

Unfortunately, it is not going to be straightforward to implement GLX_EXT_texture_from_pixmap, primarily because the EGL equivalent (eglBindTexImage()/eglReleaseTexImage()) is designed to work with the default framebuffer and not with a framebuffer object. Since there is no straightforward 1:1 mapping between GLX and EGL here, that feature is left as an exercise for future funded development (refer to #134).

Micket commented 4 years ago

I can successfully use both the EGL and GLX back ends to run glxgears, glxspheres64, ParaView, MATLAB, ABAQUS CAE, Mathematica, VMD, and Blender.

The only problem I spotted occurs only in EGL mode when running Mathematica: GL_INVALID_OPERATION:OpenGLGraphics.cpp: 1948 is printed 8 times the first time I rotate a 3D plot. It doesn't seem to affect the rendering, though; everything looks fine inside the application. :+1:

dcommander commented 4 years ago

@Micket Can you post a VGL trace log up to and including the GL_INVALID_OPERATION errors?

Micket commented 4 years ago

It seems that all of those errors are preceded by a glBindFramebuffer() call with framebuffer=2, if that means anything.

Though again, the application still works fine in this case.

vgl_trace_mathematica.txt

dcommander commented 4 years ago

OK, that is consistent with what I’m seeing in the Qt5 examples. I’m working through those and finding a couple of issues with the EGL back end implementation of glBindFramebuffer(). Should have a new build to test before the end of the day.

dcommander commented 4 years ago

For those who have experienced application-specific problems (particularly with Qt5), please re-test with the latest VGL pre-release build and report back. Thanks. All of the Qt5 OpenGL examples are working except for pbuffer, which renders incorrectly for some unexplained reason, and threadedopenglwidget, which renders only the background. I have spent hours trying to diagnose both issues but have been unable to find the cause. I did, however, fix a handful of other problems that were causing the computegles3, framebufferobject2, and grabber examples to fail.

peci1 commented 4 years ago

Great! The SubT simulator now works for me using EGL. Thank you very much ;)

ffeldhaus commented 4 years ago

@dcommander I tested quite a bit over the last few days, and one problem I ran into was that SuperTuxKart, which I wanted to use for a demo, did not run properly. After your latest changes, it now runs smoothly! I was able to play SuperTuxKart at nearly Full HD and 60 FPS, powered by the Xpra HTML5 client and VirtualGL EGL inside a Docker container running in a Google Cloud instance, with no noticeable latency or artifacts.

One remaining issue is that vglserver_config +egl interferes with the NVIDIA Container Toolkit. To be able to run vglrun -d /dev/dri/card0 ... as a user inside a Docker container, I first need to run vglserver_config +egl on the Docker host to ensure correct permissions. If I do so, docker run --gpus 1 ... reports:

nvidia-container-cli: initialization error: nvml error: insufficient permissions

When I remove /etc/modprobe.d/virtualgl.conf and reboot, everything works fine. Can you explain whether, and why, /etc/modprobe.d/virtualgl.conf is required for the EGL back end? If it is not required for EGL, can you omit creating it when running vglserver_config +egl?

Micket commented 4 years ago

Can confirm that I no longer see any error messages when using Mathematica in the latest beta.

dcommander commented 4 years ago

@ffeldhaus /etc/modprobe.d/virtualgl.conf sets the permissions for /dev/nvidia* at boot time. I honestly don't know whether that's necessary for the EGL back end or not. Note that, unless you also pass +f to vglserver_config, the devices will be restricted to the vglusers group only, so that may be why nvidia-container-cli is complaining.
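To check which case applies on a given Docker host, one can inspect the mode and group of the /dev/nvidia* device files. This minimal Python sketch only reports permissions and changes nothing. It assumes, based on the comment above, that without +f the devices end up group-accessible (for example, restricted to vglusers) rather than world-accessible; the function name is hypothetical.

```python
from __future__ import annotations

import glob
import grp
import os
import stat

RW_OTHER = stat.S_IROTH | stat.S_IWOTH
RW_GROUP = stat.S_IRGRP | stat.S_IWGRP


def describe_device_access(pattern: str = "/dev/nvidia*") -> dict[str, str]:
    """Report the permission bits and group of each file matching `pattern`."""
    report = {}
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        mode = stat.S_IMODE(st.st_mode)
        try:
            group = grp.getgrgid(st.st_gid).gr_name
        except KeyError:
            group = str(st.st_gid)  # GID with no entry in the group database
        if (mode & RW_OTHER) == RW_OTHER:
            access = "read/write for everyone"
        elif (mode & RW_GROUP) == RW_GROUP:
            access = f"read/write limited to group {group}"
        else:
            access = "no group or world read/write"
        report[path] = f"{oct(mode)} ({access})"
    return report
```

Running print(describe_device_access()) before and after vglserver_config +egl would show whether the device files went from world-accessible to group-restricted, which is one plausible explanation for the nvml permission error.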

dcommander commented 4 years ago

NOTE: VirtualBox also doesn't work with the EGL back end yet (3D applications running in the guest display a black window.) I do notice that there are some named framebuffer functions in OpenGL 4.6 that I will probably have to interpose at some point, since some of those functions are allowed to operate on the default framebuffer (in which case it will be necessary for VGL to redirect the operation to the default FBO.) However, apitrace doesn't reveal that any of the failing applications are using those functions.
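The redirection described above can be illustrated with a small model. With the EGL back end there is no real default framebuffer, so an interposer has to substitute the FBO that stands in for it whenever an application passes framebuffer 0, and translate it back when the application queries the current binding. This Python sketch only models that bookkeeping and makes no OpenGL calls; the class, its methods, and the FBO ID are hypothetical and do not reflect VirtualGL's actual code.

```python
from __future__ import annotations


class DefaultFramebufferRedirector:
    """Toy model of redirecting framebuffer 0 to an emulated default FBO."""

    def __init__(self, default_fbo: int) -> None:
        # ID of the FBO that plays the role of the window's default framebuffer
        self.default_fbo = default_fbo
        self.bound = default_fbo

    def resolve(self, framebuffer: int) -> int:
        """Translate an application-supplied framebuffer name."""
        return self.default_fbo if framebuffer == 0 else framebuffer

    def bind_framebuffer(self, framebuffer: int) -> int:
        # An interposed glBindFramebuffer() would forward the resolved name.
        self.bound = self.resolve(framebuffer)
        return self.bound

    def get_bound(self) -> int:
        # Binding queries translate back, so the application still sees 0.
        return 0 if self.bound == self.default_fbo else self.bound
```

The same resolve() step is what any named framebuffer function that accepts 0 for the default framebuffer would need; a function that is not interposed skips it and operates on the wrong target.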

dcommander commented 4 years ago

All Qt5 demos are now working, as is VirtualBox, so the only remaining known issues are:

#136 (lack of support for OpenGL 4.5 named framebuffer functions that allow the default framebuffer as an argument)
#134 (lack of support for certain GLX extensions; won't be fixed without additional funding)

Closing this issue. Please feel free to add comments confirming that a particular application works with the EGL back end, but if something doesn't work, please open a new issue. Thanks.

muratmaga commented 4 years ago

@ffeldhaus /etc/modprobe.d/virtualgl.conf sets the permissions for /dev/nvidia* at boot time. I honestly don't know whether that's necessary for the EGL back end or not. Note that, unless you also pass +f to vglserver_config, the devices will be restricted to the vglusers group only, so that may be why nvidia-container-cli is complaining.

I am running into the "nvidia-container-cli: initialization error: nvml error: insufficient permissions" error. Is the currently suggested solution, then, not to restrict the permissions to the vglusers group?

VirtualGL / virtualgl

Access the GPU without going through an X server #10
