Open geerlingguy opened 2 years ago
Userspace is mad, crashes at the same spot each time:
[ 565.882] (EE)
[ 565.882] (EE) Backtrace:
[ 565.884] (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x188) [0x559488b4c8]
[ 565.884] (EE) unw_get_proc_info failed: no unwind info found [-10]
[ 565.884] (EE)
[ 565.885] (EE) Bus error at address 0x7f95206304
[ 565.885] (EE) Fatal server error:
[ 565.885] (EE) Caught signal 7 (Bus error). Server aborting
[ 565.885] (EE)
[ 565.885] (EE)
That's the same error I'm getting. Does kmscube work?
Yup, I have a glorious floating cube at 60FPS (I don't see a way to disable vsync)
root@quartz64:~# kmscube
Using display 0x55848826b0 with EGL version 1.5
===================================
EGL information:
version: "1.5"
vendor: "Mesa Project"
client extensions: "EGL_EXT_platform_base EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query EGL_KHR_client_get_all_proc_addresses EGL_EXT_client_extensions EGL_KHR_debug EGL_KHR_platform_x11 EGL_EXT_platform_x11 EGL_EXT_platform_device EGL_KHR_platform_wayland EGL_EXT_platform_wayland EGL_MESA_platform_xcb EGL_MESA_platform_gbm EGL_KHR_platform_gbm EGL_MESA_platform_surfaceless"
display extensions: "EGL_ANDROID_blob_cache EGL_EXT_buffer_age EGL_EXT_create_context_robustness EGL_EXT_image_dma_buf_import EGL_EXT_image_dma_buf_import_modifiers EGL_KHR_cl_event2 EGL_KHR_config_attribs EGL_KHR_create_context EGL_KHR_create_context_no_error EGL_KHR_fence_sync EGL_KHR_get_all_proc_addresses EGL_KHR_gl_colorspace EGL_KHR_gl_renderbuffer_image EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_image EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_no_config_context EGL_KHR_reusable_sync EGL_KHR_surfaceless_context EGL_EXT_pixel_format_float EGL_KHR_wait_sync EGL_MESA_configless_context EGL_MESA_drm_image EGL_MESA_image_dma_buf_export EGL_MESA_query_driver EGL_WL_bind_wayland_display "
===================================
OpenGL ES 2.x information:
version: "OpenGL ES 3.1 Mesa 21.2.6"
shading language version: "OpenGL ES GLSL ES 3.10"
vendor: "X.Org"
renderer: "AMD TURKS (DRM 2.50.0 / 5.17.0-rc5-00097-gccb1df4cf6b5-dirty, LLVM 12.0.1)"
extensions: "GL_EXT_blend_minmax GL_EXT_multi_draw_arrays GL_EXT_texture_filter_anisotropic GL_EXT_texture_compression_s3tc GL_EXT_texture_compression_dxt1 GL_EXT_texture_compression_rgtc GL_EXT_texture_format_BGRA8888 GL_OES_compressed_ETC1_RGB8_texture GL_OES_depth24 GL_OES_element_index_uint GL_OES_fbo_render_mipmap GL_OES_mapbuffer GL_OES_rgb8_rgba8 GL_OES_standard_derivatives GL_OES_stencil8 GL_OES_texture_3D GL_OES_texture_float GL_OES_texture_float_linear GL_OES_texture_half_float GL_OES_texture_half_float_linear GL_OES_texture_npot GL_OES_vertex_half_float GL_EXT_draw_instanced GL_EXT_texture_sRGB_decode GL_OES_EGL_image GL_OES_depth_texture GL_AMD_performance_monitor GL_OES_packed_depth_stencil GL_EXT_texture_type_2_10_10_10_REV GL_NV_conditional_render GL_OES_get_program_binary GL_APPLE_texture_max_level GL_EXT_discard_framebuffer GL_EXT_read_format_bgra GL_EXT_frag_depth GL_NV_fbo_color_attachments GL_OES_EGL_image_external GL_OES_EGL_sync GL_OES_vertex_array_object GL_OES_viewport_array GL_ANGLE_pack_reverse_row_order GL_ANGLE_texture_compression_dxt3 GL_ANGLE_texture_compression_dxt5 GL_EXT_occlusion_query_boolean GL_EXT_robustness GL_EXT_texture_rg GL_EXT_unpack_subimage GL_NV_draw_buffers GL_NV_read_buffer GL_NV_read_depth GL_NV_read_depth_stencil GL_NV_read_stencil GL_EXT_draw_buffers GL_EXT_map_buffer_range GL_KHR_debug GL_KHR_robustness GL_KHR_texture_compression_astc_ldr GL_NV_pixel_buffer_object GL_OES_depth_texture_cube_map GL_OES_required_internalformat GL_OES_surfaceless_context GL_EXT_color_buffer_float GL_EXT_sRGB_write_control GL_EXT_separate_shader_objects GL_EXT_shader_implicit_conversions GL_EXT_shader_integer_mix GL_EXT_tessellation_point_size GL_EXT_tessellation_shader GL_EXT_base_instance GL_EXT_compressed_ETC1_RGB8_sub_texture GL_EXT_copy_image GL_EXT_draw_buffers_indexed GL_EXT_draw_elements_base_vertex GL_EXT_gpu_shader5 GL_EXT_polygon_offset_clamp GL_EXT_primitive_bounding_box GL_EXT_render_snorm GL_EXT_shader_io_blocks 
GL_EXT_texture_border_clamp GL_EXT_texture_buffer GL_EXT_texture_cube_map_array GL_EXT_texture_norm16 GL_EXT_texture_view GL_KHR_context_flush_control GL_KHR_robust_buffer_access_behavior GL_NV_image_formats GL_OES_copy_image GL_OES_draw_buffers_indexed GL_OES_draw_elements_base_vertex GL_OES_gpu_shader5 GL_OES_primitive_bounding_box GL_OES_sample_shading GL_OES_sample_variables GL_OES_shader_io_blocks GL_OES_shader_multisample_interpolation GL_OES_tessellation_point_size GL_OES_tessellation_shader GL_OES_texture_border_clamp GL_OES_texture_buffer GL_OES_texture_cube_map_array GL_OES_texture_stencil8 GL_OES_texture_storage_multisample_2d_array GL_OES_texture_view GL_EXT_blend_func_extended GL_EXT_buffer_storage GL_EXT_float_blend GL_EXT_geometry_point_size GL_EXT_geometry_shader GL_EXT_texture_sRGB_R8 GL_KHR_no_error GL_KHR_texture_compression_astc_sliced_3d GL_OES_EGL_image_external_essl3 GL_OES_geometry_point_size GL_OES_geometry_shader GL_OES_shader_image_atomic GL_EXT_clip_cull_distance GL_EXT_disjoint_timer_query GL_EXT_texture_compression_s3tc_srgb GL_MESA_shader_integer_functions GL_EXT_clip_control GL_EXT_color_buffer_half_float GL_EXT_texture_compression_bptc GL_KHR_parallel_shader_compile GL_EXT_EGL_image_storage GL_MESA_framebuffer_flip_y GL_EXT_depth_clamp GL_EXT_texture_query_lod GL_MESA_bgra "
===================================
Using modifier ffffffffffffff
Modifiers failed!
Using modifier ffffffffffffff
Modifiers failed!
Rendered 120 frames in 2.000056 sec (59.998315 fps)
Rendered 240 frames in 4.000061 sec (59.999082 fps)
That's more than I have on the cm4 then. I'll take a look at the Xorg stuff once I get kmscube to work reliably with more than 1 core enabled.
"Bus error" is normally an unaligned write or read. But if kmscube is working, the PCIe bus and the GPU driver are probably OK. It's worth trying the command-line version of glmark2 as well. Were there any dmesg messages when the bus error happened?
On the cm4, there isn't anything useful in dmesg that I noticed. I think Xorg does some things over /dev/mem though (?), so there is probably something in there. I might try again later if I can narrow it down a bit.
glmark2-drm is the non-X command line version. Should work just like kmscube but does more tests.
Just tested with a 4G board, which shouldn't faceplant with a DMA failure. Same error at the same spot.
So the question is what OsLookupColor+0x188 is, and what it is doing.
No dmesg output at all, but with drmdebug there's a soft reset recovery on the card at the time of the failure.
Curious:
master@soquartz:~$ sudo glmark2-drm
=======================================================
glmark2 2021.02
=======================================================
OpenGL Information
GL_VENDOR: X.Org
GL_RENDERER: AMD TURKS (DRM 2.50.0 / 5.17.0-rc5-00097-gccb1df4cf6b5-dirty, LLVM 12.0.1)
GL_VERSION: 3.1 Mesa 21.2.2
=======================================================
[build] use-vbo=false: FPS: 60 FrameTime: 16.667 ms
[build] use-vbo=true: FPS: 60 FrameTime: 16.667 ms
[texture] texture-filter=nearest: FPS: 60 FrameTime: 16.667 ms
[texture] texture-filter=linear: FPS: 60 FrameTime: 16.667 ms
[texture] texture-filter=mipmap: FPS: 60 FrameTime: 16.667 ms
[shading] shading=gouraud: FPS: 60 FrameTime: 16.667 ms
[shading] shading=blinn-phong-inf: FPS: 60 FrameTime: 16.667 ms
[shading] shading=phong: FPS: 60 FrameTime: 16.667 ms
[shading] shading=cel: FPS: 60 FrameTime: 16.667 ms
[bump] bump-render=high-poly: FPS: 60 FrameTime: 16.667 ms
[bump] bump-render=normals: FPS: 60 FrameTime: 16.667 ms
[bump] bump-render=height: FPS: 60 FrameTime: 16.667 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 60 FrameTime: 16.667 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 49 FrameTime: 20.408 ms
[pulsar] light=false:quads=5:texture=false: FPS: 60 FrameTime: 16.667 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 30 FrameTime: 33.333 ms
[desktop] effect=shadow:windows=4: FPS: 59 FrameTime: 16.949 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 59 FrameTime: 16.949 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata:Bus error
@pgwipeout So it hits the "Bus error" on the "buffer" test. Running sudo glmark2-drm -b buffer will run just the "buffer" test, to see whether it fails there repeatably.
So there are probably some unaligned writes/reads left to fix in the driver. But this is very good progress. Great work.
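To check whether a failure like this is repeatable, one option is a small wrapper that runs the command several times and records how each run ended. The sketch below is only an illustration: the helper names `describe_exit` and `run_and_classify` are made up for this example, and it relies on the POSIX convention that Python's `subprocess` reports death-by-signal as a negative return code.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short label."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit {returncode}"


def run_and_classify(cmd, runs=3):
    """Run cmd `runs` times, discarding its output, and label each outcome."""
    results = []
    for _ in range(runs):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        results.append(describe_exit(proc.returncode))
    return results
```

Calling `run_and_classify(["glmark2-drm", "-b", "buffer"], runs=5)` would return a list such as "SIGBUS" or "exit 0" entries, one per run, which makes it easy to see whether the crash happens every time.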
root@soquartz:~# glmark2-drm -b buffer
radeon: The kernel rejected CS, see dmesg for more information (-16).
=======================================================
glmark2 2021.02
=======================================================
OpenGL Information
GL_VENDOR: X.Org
GL_RENDERER: AMD TURKS (DRM 2.50.0 / 5.17.0-rc5-00097-gccb1df4cf6b5-dirty, LLVM 12.0.1)
GL_VERSION: 3.1 Mesa 21.2.2
=======================================================
radeon: The kernel rejected CS, see dmesg for more information (-16).
[buffer] <default>:radeon: The kernel rejected CS, see dmesg for more information (-16).
radeon: The kernel rejected CS, see dmesg for more information (-16).
<snip>
radeon: The kernel rejected CS, see dmesg for more information (-16).
FPS: 60 FrameTime: 16.667 ms
=======================================================
glmark2 Score: 60
=======================================================
Nothing in the dmesg, let me turn on drmdebug.
I apologize in advance for this, here's the last output before it dumps (it dumps at the same place every time).
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata:[ 288.489559] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_MMAP
[ 288.491841] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_BUSY
[ 288.492741] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_BUSY
[ 288.493514] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_WAIT_IDLE
[ 288.494320] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_MMAP
[ 288.506527] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_BUSY
[ 288.507336] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_BUSY
[ 288.508096] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_WAIT_IDLE
[ 288.508984] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_MMAP
[ 288.518727] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_BUSY
[ 288.519765] [drm:drm_ioctl] comm="glmark2-d:rcs0" pid=974, dev=0xe201, auth=1, RADEON_CS
[ 288.521283] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_WAIT_IDLE
[ 288.522136] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, DRM_IOCTL_MODE_PAGE_FLIP
[ 288.522970] [drm:evergreen_irq_set [radeon]] dpm thermal
[ 288.523655] [drm:radeon_irq_kms_set_irq_n_enabled [radeon]] vblank0 interrupts enabled
[ 288.524596] radeon 0000:01:00.0: [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0
[ 288.524567] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59648, wptr 59664
[ 288.525512] [drm:radeon_crtc_page_flip_target [radeon]] flip-ioctl() cur_rbo = 0000000011fbf557, new_rbo = 0000000033003173
[ 288.526227] [drm:evergreen_irq_process [radeon]] IH: D1 vblank
[ 288.527197] [drm:evergreen_irq_set [radeon]] evergreen_irq_set: sw int gfx
[ 288.527991] [drm:drm_mode_object_get] OBJ ID: 61 (2)
[ 288.528237] [drm:evergreen_irq_set [radeon]] dpm thermal
[ 288.528774] [drm:drm_mode_object_put.part.0] OBJ ID: 61 (3)
[ 288.529779] [drm:drm_mode_object_put.part.0] OBJ ID: 60 (2)
[ 288.530447] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59664, wptr 59680
[ 288.531495] [drm:evergreen_irq_process [radeon]] IH: CP EOP
[ 288.532209] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59680, wptr 59696
[ 288.532269] [drm:evergreen_irq_set [radeon]] dpm thermal
[ 288.533263] [drm:evergreen_irq_process [radeon]] IH: D1 vblank - IH event w/o asserted irq bit?
[ 288.533723] [drm:evergreen_irq_set [radeon]] dpm thermal
[ 288.534439] [drm:evergreen_irq_process [radeon]] IH: D1 vblank
[ 288.535507] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59696, wptr 59696
[ 288.547908] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59696, wptr 59712
[ 288.549003] [drm:evergreen_irq_process [radeon]] IH: D1 vblank
[ 288.549726] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59712, wptr 59728
[ 288.550749] [drm:evergreen_irq_process [radeon]] IH: D1 flip
[ 288.551472] [drm:evergreen_irq_set [radeon]] dpm thermal
[ 288.552262] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59728, wptr 59728
[ 288.559484] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_CREATE
[ 288.561072] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_MMAP
[ 288.563469] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_BUSY
[ 288.564608] [drm:drm_ioctl] comm="glmark2-d:rcs0" pid=974, dev=0xe201, auth=1, RADEON_CS
[ 288.564584] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59728, wptr 59744
[ 288.566185] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_WAIT_IDLE
[ 288.566243] radeon 0000:01:00.0: [drm:vblank_disable_fn] disabling vblank on crtc 0
[ 288.567036] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, DRM_IOCTL_MODE_PAGE_FLIP
[ 288.567689] [drm:evergreen_irq_set [radeon]] dpm thermal
[ 288.568931] [drm:radeon_irq_kms_set_irq_n_enabled [radeon]] vblank0 interrupts disabled
[ 288.569887] [drm:evergreen_irq_process [radeon]] IH: D1 vblank
[ 288.569895] [drm:evergreen_irq_set [radeon]] dpm thermal
[ 288.571089] [drm:radeon_irq_kms_set_irq_n_enabled [radeon]] vblank0 interrupts enabled
[ 288.572039] radeon 0000:01:00.0: [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0
[ 288.572914] [drm:radeon_crtc_page_flip_target [radeon]] flip-ioctl() cur_rbo = 0000000033003173, new_rbo = 0000000011fbf557
[ 288.574193] [drm:evergreen_irq_set [radeon]] dpm thermal
[ 288.575037] [drm:drm_mode_object_get] OBJ ID: 60 (2)
[ 288.575514] [drm:drm_mode_object_put.part.0] OBJ ID: 60 (3)
[ 288.576026] [drm:drm_mode_object_put.part.0] OBJ ID: 61 (2)
[ 288.581241] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59744, wptr 59760
[ 288.582335] [drm:evergreen_irq_process [radeon]] IH: D1 vblank
[ 288.583053] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59760, wptr 59776
[ 288.584076] [drm:evergreen_irq_process [radeon]] IH: D1 flip
[ 288.584803] [drm:evergreen_irq_set [radeon]] dpm thermal
[ 288.585571] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59776, wptr 59776
[ 288.597907] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59776, wptr 59792
[ 288.599025] radeon 0000:01:00.0: [drm:vblank_disable_fn] disabling vblank on crtc 0
[ 288.599748] [drm:evergreen_irq_set [radeon]] dpm thermal
[ 288.600428] [drm:radeon_irq_kms_set_irq_n_enabled [radeon]] vblank0 interrupts disabled
[ 288.601369] [drm:evergreen_irq_process [radeon]] IH: D1 vblank
[ 290.161566] [drm:drm_release] open_count = 1
[ 290.162005] [drm:drm_file_free.part.0] comm="glmark2-drm", pid=973, dev=0xe201, open_count=1
[ 290.162767] [drm:drm_mode_object_put.part.0] OBJ ID: 61 (1)
[ 290.163413] radeon 0000:01:00.0: [drm:drm_mode_rmfb_work_fn] Removing [FB:60] from all active usage due to RMFB ioctl
[ 290.164428] radeon 0000:01:00.0: [drm:drm_framebuffer_remove] Disabling [CRTC:42:crtc-0] because [FB:60] is removed
[ 290.165524] [drm:drm_crtc_helper_set_config]
[ 290.165934] [drm:drm_crtc_helper_set_config] [CRTC:42:crtc-0] [NOFB]
[ 290.166510] [drm:drm_mode_object_put.part.0] OBJ ID: 56 (4)
[ 290.167032] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[ 290.168305] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 30 to mode 3, devices 00000080, active_devices 00000080
[ 290.169623] [drm:evergreen_hdmi_enable [radeon]] Disabling HDMI interface @ 0x0000 for encoder 0x1e
[ 290.171080] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[ 290.198351] [drm:rv740_get_dll_speed [radeon]] Target MCLK greater than largest MCLK in DLL speed table
[ 290.199450] [drm:rv740_get_dll_speed [radeon]] Target MCLK greater than largest MCLK in DLL speed table
[ 290.202985] [drm:drm_mode_object_put.part.0] OBJ ID: 60 (2)
[ 290.203650] [drm:drm_mode_object_put.part.0] OBJ ID: 60 (1)
[ 290.205263] [drm:drm_release]
[ 290.205590] [drm:drm_crtc_helper_set_config]
[ 290.205988] [drm:drm_crtc_helper_set_config] [CRTC:42:crtc-0] [FB:58] #connectors=1 (x y) (0 0)
[ 290.206764] [drm:drm_crtc_helper_set_config] crtc has no fb, full mode set
[ 290.207373] [drm:drm_mode_object_get] OBJ ID: 56 (2)
[ 290.207823] [drm:drm_crtc_helper_set_config] connector dpms not on, full mode switch
[ 290.208590] [drm:drm_crtc_helper_set_config] encoder changed, full mode switch
[ 290.209241] [drm:drm_crtc_helper_set_config] crtc changed, full mode switch
[ 290.209864] [drm:drm_crtc_helper_set_config] [CONNECTOR:56:DVI-I-1] to [CRTC:42:crtc-0]
[ 290.210682] [drm:drm_crtc_helper_set_config] attempting to set mode from userspace
[ 290.211362] [drm:drm_mode_debug_printmodeline] Modeline "1920x1080": 60 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x48 0x5
[ 290.212443] [drm:radeon_encoder_set_active_device [radeon]] setting active device to 00000080 from 00000080 00000081 for encoder 2
[ 290.213680] [drm:radeon_get_monitor_bpc [radeon]] DVI-I-1: hdmi mode dotclock 148500 kHz, max tmds input clock 225000 kHz.
[ 290.214793] [drm:radeon_get_monitor_bpc [radeon]] DVI-I-1: Display bpc=8, returned bpc=8
[ 290.215656] [drm:drm_crtc_helper_set_mode] [CRTC:42:crtc-0]
[ 290.216268] [drm:radeon_get_monitor_bpc [radeon]] DVI-I-1: hdmi mode dotclock 148500 kHz, max tmds input clock 225000 kHz.
[ 290.217499] [drm:radeon_get_monitor_bpc [radeon]] DVI-I-1: Display bpc=8, returned bpc=8
[ 290.218363] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[ 290.219454] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[ 290.221132] [drm:rv740_get_dll_speed [radeon]] Target MCLK greater than largest MCLK in DLL speed table
[ 290.222145] [drm:rv740_get_dll_speed [radeon]] Target MCLK greater than largest MCLK in DLL speed table
[ 290.224450] [drm:radeon_compute_pll_avivo [radeon]] 148500 - 148500, pll dividers - fb: 88.0 ref: 2, post 8
[ 290.236560] [drm:drm_crtc_helper_set_mode] [ENCODER:55:TMDS-55] set [MODE:1920x1080]
[ 290.236457] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59792, wptr 59808
[ 290.237317] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 30 to mode 3, devices 00000080, active_devices 00000080
[ 290.238154] [drm:evergreen_irq_process [radeon]] IH: D1 flip
[ 290.239062] [drm:evergreen_hdmi_enable [radeon]] Disabling HDMI interface @ 0x0000 for encoder 0x1e
[ 290.239559] [drm:radeon_crtc_handle_flip [radeon]] radeon_crtc->flip_status = 0 != RADEON_FLIP_SUBMITTED(2)
[ 290.240897] [drm:drm_detect_monitor_audio] Monitor has basic audio support
[ 290.242034] [drm:dce4_hdmi_set_color_depth [radeon]] DVI-I-1: Disabling hdmi deep color for 8 bpc.
[ 290.243252] [drm:dce5_crtc_load_lut [radeon]] 0
[ 290.276581] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 30 to mode 0, devices 00000080, active_devices 00000080
[ 290.277695] [drm:drm_detect_monitor_audio] Monitor has basic audio support
[ 290.278314] [drm:evergreen_hdmi_enable [radeon]] Enabling HDMI interface @ 0x0000 for encoder 0x1e
[ 290.280548] radeon 0000:01:00.0: [drm:drm_calc_timestamping_constants] crtc 42: hwmode: htotal 2200, vtotal 1125, vdisplay 1080
[ 290.281711] radeon 0000:01:00.0: [drm:drm_calc_timestamping_constants] crtc 42: clock 148500 kHz framedur 16666666 linedur 14814
[ 290.282746] [drm:drm_crtc_helper_set_config] Setting connector DPMS state to on
[ 290.283399] [drm:drm_crtc_helper_set_config] [CONNECTOR:56:DVI-I-1] set DPMS on
[ 290.284196] [drm:dce5_crtc_load_lut [radeon]] 0
[ 290.304330] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 30 to mode 0, devices 00000080, active_devices 00000080
[ 290.305601] [drm:drm_detect_monitor_audio] Monitor has basic audio support
[ 290.306238] [drm:evergreen_hdmi_enable [radeon]] Enabling HDMI interface @ 0x0000 for encoder 0x1e
[ 290.307827] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[ 290.309520] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[ 290.311227] [drm:drm_mode_object_get] OBJ ID: 58 (1)
[ 290.311705] [drm:drm_crtc_helper_set_config]
[ 290.312223] [drm:drm_crtc_helper_set_config] [CRTC:44:crtc-1] [NOFB]
[ 290.312822] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[ 290.314201] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[ 290.315899] [drm:drm_mode_object_get] OBJ ID: 58 (2)
[ 290.316461] [drm:drm_mode_object_put.part.0] OBJ ID: 58 (3)
[ 290.316988] [drm:drm_crtc_helper_set_config]
[ 290.317529] [drm:drm_crtc_helper_set_config] [CRTC:46:crtc-2] [NOFB]
[ 290.318131] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[ 290.319313] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[ 290.321138] [drm:drm_mode_object_get] OBJ ID: 58 (2)
[ 290.321611] [drm:drm_mode_object_put.part.0] OBJ ID: 58 (3)
[ 290.322117] [drm:drm_crtc_helper_set_config]
[ 290.322511] [drm:drm_crtc_helper_set_config] [CRTC:48:crtc-3] [NOFB]
[ 290.323086] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[ 290.324281] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[ 290.325890] [drm:drm_mode_object_get] OBJ ID: 58 (2)
[ 290.326358] [drm:drm_mode_object_put.part.0] OBJ ID: 58 (3)
[ 290.326863] [drm:drm_crtc_helper_set_config]
[ 290.327257] [drm:drm_crtc_helper_set_config] [CRTC:50:crtc-4] [NOFB]
[ 290.327831] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[ 290.329011] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[ 290.330829] [drm:drm_mode_object_get] OBJ ID: 58 (2)
[ 290.331298] [drm:drm_mode_object_put.part.0] OBJ ID: 58 (3)
[ 290.331806] [drm:drm_crtc_helper_set_config]
[ 290.332275] [drm:drm_crtc_helper_set_config] [CRTC:52:crtc-5] [NOFB]
[ 290.332862] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[ 290.333980] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[ 290.335576] [drm:drm_mode_object_get] OBJ ID: 58 (2)
[ 290.336039] [drm:drm_mode_object_put.part.0] OBJ ID: 58 (3)
[ 290.336612] [drm:drm_release] driver lastclose completed
Bus error (core dumped)
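A dump this long is easier to read once the ioctl calls are pulled out of the interrupt noise. The following is a rough sketch, assuming the `[drm:drm_ioctl]` line format shown above; the function name `summarize_ioctls` and the regular expression are illustrative, not part of any DRM tooling.

```python
import re
from collections import Counter

# Matches lines such as:
# [ 288.489559] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_MMAP
IOCTL_RE = re.compile(
    r'\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:drm_ioctl\]\s+'
    r'comm="(?P<comm>[^"]+)"\s+pid=(?P<pid>\d+),.*,\s*(?P<name>\w+)\s*$'
)


def summarize_ioctls(lines):
    """Count ioctls by name and remember the last one seen.

    Returns (Counter of ioctl names, (timestamp, comm, name) of the last
    ioctl line, or None if no ioctl line was found).
    """
    counts = Counter()
    last = None
    for line in lines:
        m = IOCTL_RE.search(line)
        if m:
            counts[m.group("name")] += 1
            last = (float(m.group("ts")), m.group("comm"), m.group("name"))
    return counts, last
```

Run over the excerpt above, the last ioctl line it would pick up is the DRM_IOCTL_MODE_PAGE_FLIP at 288.567036, which is the final `drm_ioctl` entry before `drm_release`. That only shows what the kernel logged last, not necessarily where the fault occurred.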
@pgwipeout At least the problem is predictable. :-) You will probably need to compile glmark2 in debug mode, then run it under gdb to find out which line the bus error is on. Hopefully it will narrow it down to a syscall to the driver, and then we will know which function in the driver code has the alignment problem. Alternatively, it might be worth trying (as root): strace -f /usr/bin/glmark2-drm -b buffer
The buffer benchmark by default uses "map" as the vbo update method. I haven't looked very closely into glmark2's code, but my guess is that it maps the buffer on the GPU into userspace and accesses it that way, without making sure alignment is correct. I don't know if there is a way to enforce alignment on mapped memory from the kernel though, or if all userspace software would have to be changed.
Try running glmark2 -b buffer:update-method=subdata instead.
If it helps, one can force alignment checking on x86, and even get the compiler to check for it with -Wcast-align. You add an extra line of inline asm code to switch it on around the section you wish to check. An example page explaining it: https://www.xszz.org/faq-41/question-20190628214421.html
I just tried the buffer scene on the cm4, and it runs successfully without changing any parameters. I can't run the full benchmark though, as the Pi can lock up completely after a while running 3D stuff (though it usually gets through one scene).
@pgwipeout Please can you post the output of lspci -vv? There might be a BAR alignment problem. For GPUs, the BAR has to be aligned on a BAR size boundary. So if it's a 256MB BAR size, it has to be on a 256MB boundary.
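The alignment rule described here (address must be a multiple of the BAR size) can be checked mechanically from saved `lspci -vv` output. This is a minimal sketch, assuming the usual "Region N: Memory at ADDR ... [size=NNN]" line format; `check_bar_alignment` is a name invented for this example, and it only looks at memory regions, not I/O ports.

```python
import re

# Matches lspci -vv lines such as:
# Region 0: Memory at 310000000 (64-bit, prefetchable) [size=256M]
REGION_RE = re.compile(
    r"Region (?P<idx>\d+): Memory at (?P<addr>[0-9a-fA-F]+) "
    r".*\[size=(?P<size>\d+)(?P<unit>[KMG]?)\]"
)
UNIT = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def check_bar_alignment(lspci_text):
    """Return (region index, address, size in bytes, aligned?) per memory BAR."""
    results = []
    for m in REGION_RE.finditer(lspci_text):
        addr = int(m.group("addr"), 16)
        size = int(m.group("size")) * UNIT[m.group("unit")]
        # A BAR is naturally aligned when its address is a multiple of its size.
        results.append((int(m.group("idx")), addr, size, addr % size == 0))
    return results
```

Feeding it the text of `lspci -vv` returns one tuple per memory BAR; any entry whose last element is False is not aligned to its own size. For instance, a 256M region at a hypothetical address of 308000000 would be flagged, because that address is only a multiple of 128M.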
On the CM4, it also crashes with Bus error on the buffer test when running all benchmarks, but runs buffer without issue when it's the only one being run.
@jcdutton You crazy person, you've done it. Yeah, our alignment was off because of a previous bug with SSDs. Now I've got the GPU working, but I've gotta test whether there are regressions there.
@pgwipeout Just to clarify, was it the BAR alignment?
Aye, it was the BAR alignment. Still some other quirks going on here, but I got an X session to start.
Celebration was premature. The X session is using the video output, but it's rendering with llvmpipe (software rendering).
Unfortunately, it still crashes at [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata:Bus error (core dumped)
Wayland is also still unhappy.
It's smart enough to align the buffer even if the window isn't aligned. I switched back to the original state:
lspci -vv
00:00.0 PCI bridge: Rockchip Electronics Co., Ltd RK3568 Remote Signal Processor (rev 01) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 47
Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
I/O behind bridge: 00001000-00001fff [size=4K]
Memory behind bridge: 02000000-020fffff [size=1M]
Prefetchable memory behind bridge: 0000000010000000-000000001fffffff [size=256M]
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
Expansion ROM at 302100000 [virtual] [disabled] [size=64K]
BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable+ Count=1/32 Maskable- 64bit+
Address: 00000000fd450040 Data: 0000
Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (ok), Width x1 (ok)
TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
RootCap: CRSVisible+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP+ LTR+
10BitTagComp- 10BitTagReq- OBFF Via message/WAKE#, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd-
AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled, ARIFwd-
AtomicOpsCtl: ReqEn- EgressBlck-
LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
Vector table: BAR=0 offset=00000000
PBA: BAR=0 offset=00000010
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
RootCmd: CERptEn- NFERptEn- FERptEn-
RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
FirstFatal- NonFatalMsg- FatalMsg- IntMsg 9
ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
Capabilities: [148 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [160 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=10us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [170 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Kernel driver in use: pcieport
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Turks PRO [Radeon HD 7570] (prog-if 00 [VGA controller])
Subsystem: Dell Turks PRO [Radeon HD 7570]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 79
Region 0: Memory at 310000000 (64-bit, prefetchable) [size=256M]
Region 2: Memory at 302000000 (64-bit, non-prefetchable) [size=128K]
Region 4: I/O ports at 1000 [size=256]
Expansion ROM at 302020000 [disabled] [size=128K]
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #4, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (ok), Width x1 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fd450040 Data: 0000
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Kernel driver in use: radeon
Kernel modules: radeon
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Turks HDMI Audio [Radeon HD 6500/6600 / 6700M Series]
Subsystem: Dell Turks HDMI Audio [Radeon HD 6500/6600 / 6700M Series]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin B routed to IRQ 53
Region 0: Memory at 302040000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #4, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (ok), Width x1 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fd450040 Data: 0000
Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Kernel driver in use: snd_hda_intel
FYI, glmark2 also runs at 60fps on the Radeon card on x86, so being capped at 60fps is not an ARM-specific bug.
Yeah, kms enforces vsync, so your fps will be exactly your refresh rate. strace has been helpful; it doesn't look like a misaligned write:
futex(0x5598fbf4e8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x5598fbf490, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55990b9d40, FUTEX_WAIT_BITSET, 2, NULL, FUTEX_BITSET_MATCH_ANY) = 0
ioctl(5, DRM_IOCTL_RADEON_GEM_WAIT_IDLE, 0x7fe8e0fdd8) = 0
ioctl(3, DRM_IOCTL_MODE_PAGE_FLIP, 0x7fe8e0fe20) = 0
pselect6(4, [3], NULL, NULL, NULL, NULL) = 1 (in [3])
read(3, "\2\0\0\0 \0\0\0\214\376\340\350\177\0\0\0+\5\0\0\211\n\10\0\2204\1\0*\0\0\0", 1024) = 32
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0x7f7ba7e2ac} ---
+++ killed by SIGBUS (core dumped) +++
Bus error (core dumped)
It is a misaligned read or write. It even states the address: si_addr=0x7f7ba7e2ac. I think the next step to diagnose this is to build glmark2 from source, in debug mode with -O0, and then run it under gdb. You could also try running glmark2-drm in gdb even without a debug build; it might give you some information when it hits the SIGBUS. The address 0x7f7ba7e2ac is a virtual address, most likely allocated by an mmap call to the driver. If the driver responds to the mmap call by allocating something in physical RAM above 4GB, that might be the cause of your problem.
Thread 1 "glmark2-drm" received signal SIGBUS, Bus error.
__memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
173 ../sysdeps/aarch64/multiarch/../memcpy.S: No such file or directory.
(gdb) where
#0 __memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
#1 0x0000007ff696b42c in ?? () from /usr/lib/aarch64-linux-gnu/dri/r600_dri.so
#2 0x00000055555e83fc in Mesh::update_single_vbo (this=0x5555d47400,
ranges=std::vector of length 10, capacity 16 = {...}, n=0, nfloats=3) at ../src/mesh.cpp:472
#3 0x00000055555e8508 in Mesh::update_vbo (this=0x5555d47400, ranges=std::vector of length 10, capacity 16 = {...})
at ../src/mesh.cpp:499
#4 0x000000555557fba4 in WaveMesh::update (this=0x5555d47400, elapsed=0.075626000000056592)
at ../src/scene-buffer.cpp:163
#5 0x000000555557eef8 in SceneBuffer::update (this=0x5555689400) at ../src/scene-buffer.cpp:434
#6 0x000000555557b308 in MainLoop::draw (this=0x55557b5350) at ../src/main-loop.cpp:134
#7 0x000000555557b18c in MainLoop::step (this=0x55557b5350) at ../src/main-loop.cpp:108
#8 0x000000555555f7b8 in do_benchmark (canvas=...) at ../src/main.cpp:123
#9 0x000000555555fc5c in main (argc=1, argv=0x7ffffff668) at ../src/main.cpp:226
Hmmmm:
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata:
Thread 1 "glmark2-drm" received signal SIGBUS, Bus error.
__memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
173 ../sysdeps/aarch64/multiarch/../memcpy.S: No such file or directory.
(gdb) where
#0 __memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
#1 0x0000007ff6b845c0 in r600_buffer_subdata (ctx=0x555619d070, buffer=0x5555cd68b0, usage=10, offset=1452,
size=15828, data=0x55558a5e7c) at ../src/gallium/drivers/r600/r600_buffer_common.c:570
#2 0x0000007ff60289c4 in _mesa_bufferobj_subdata (ctx=0x5555863e20, offset=1452, size=15828, data=0x55558a5e7c,
obj=0x5555cd67b0) at ../src/mesa/main/bufferobj.c:115
#3 0x0000007ff602fa40 in _mesa_buffer_sub_data (ctx=0x5555863e20, bufObj=0x5555cd67b0, offset=1452, size=15828,
data=0x55558a5e7c) at ../src/mesa/main/bufferobj.c:2633
#4 0x0000007ff602fd08 in buffer_sub_data (func=0x7ff70d5638 "glBufferSubData", no_error=false, dsa=false,
data=0x55558a5e7c, size=15828, offset=1452, buffer=0, target=34962) at ../src/mesa/main/bufferobj.c:2665
#5 _mesa_BufferSubData (target=34962, offset=1452, size=15828, data=0x55558a5e7c) at ../src/mesa/main/bufferobj.c:2682
#6 0x00000055555e83fc in Mesh::update_single_vbo (this=0x55557d4570,
ranges=std::vector of length 10, capacity 16 = {...}, n=0, nfloats=3) at ../src/mesh.cpp:472
#7 0x00000055555e8508 in Mesh::update_vbo (this=0x55557d4570, ranges=std::vector of length 10, capacity 16 = {...})
at ../src/mesh.cpp:499
#8 0x000000555557fba4 in WaveMesh::update (this=0x55557d4570, elapsed=0.089772999999922831)
at ../src/scene-buffer.cpp:163
#9 0x000000555557eef8 in SceneBuffer::update (this=0x5555689400) at ../src/scene-buffer.cpp:434
#10 0x000000555557b308 in MainLoop::draw (this=0x55557e19c0) at ../src/main-loop.cpp:134
#11 0x000000555557b18c in MainLoop::step (this=0x55557e19c0) at ../src/main-loop.cpp:108
#12 0x000000555555f7b8 in do_benchmark (canvas=...) at ../src/main.cpp:123
#13 0x000000555555fc5c in main (argc=1, argv=0x7ffffff688) at ../src/main.cpp:226
subdata also just maps the buffer and copies the data with memcpy, so I guess that won't work. What might work, though, is to use LD_PRELOAD to load a library that overrides __memcpy_generic with a version that only copies a single byte at a time (for now, as that shouldn't have any alignment requirements).
Well, this is entirely weird. I set a breakpoint at that memcpy and stepped through the program and it didn't crash.
Triggered with a breakpoint.
Thread 1 "glmark2-drm" hit Breakpoint 5, r600_buffer_subdata (ctx=0x55558af180, buffer=0x5556254c30, usage=10, offset=0, size=7200, data=0x55560fcc70) at ../src/gallium/drivers/r600/r600_buffer_common.c:570
570 memcpy(map, data, size);
(gdb)
$247 = (uint8_t *) 0x7feca58a80 <error: Cannot access memory at address 0x7feca58a80>
Thread 1 "glmark2-drm" hit Breakpoint 5, r600_buffer_subdata (ctx=0x55558af180, buffer=0x5556254c30, usage=10, offset=10092, size=25908, data=0x55560ff3dc) at ../src/gallium/drivers/r600/r600_buffer_common.c:570
570 memcpy(map, data, size);
(gdb)
$248 = (uint8_t *) 0x7feca5a6ec <error: Cannot access memory at address 0x7feca5a6ec>
Thread 1 "glmark2-drm" received signal SIGBUS, Bus error.
I've created an issue on mesa regarding this bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6142
@pgwipeout We need to determine whether the unaligned fault is due to a read or a write. Most memcpy implementations are 8-byte or 16-byte aligned when writing to the dest; they do not tend to be aligned when reading from the src. In this case, map is the dest and data is the src. So, when you are running in gdb and hit the bus error, type "disassemble" and post the output here.
Thread 1 "glmark2-drm" received signal SIGBUS, Bus error.
__memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
173 ../sysdeps/aarch64/multiarch/../memcpy.S: No such file or directory.
(gdb) disassemble
Dump of assembler code for function __memcpy_generic:
0x0000007ff7aaea80 <+0>: nop
0x0000007ff7aaea84 <+4>: add x4, x1, x2
0x0000007ff7aaea88 <+8>: add x5, x0, x2
0x0000007ff7aaea8c <+12>: cmp x2, #0x80
0x0000007ff7aaea90 <+16>: b.hi 0x7ff7aaeb80 <__memcpy_generic+256> // b.pmore
0x0000007ff7aaea94 <+20>: cmp x2, #0x20
0x0000007ff7aaea98 <+24>: b.hi 0x7ff7aaeb10 <__memcpy_generic+144> // b.pmore
0x0000007ff7aaea9c <+28>: cmp x2, #0x10
0x0000007ff7aaeaa0 <+32>: b.cc 0x7ff7aaeab8 <__memcpy_generic+56> // b.lo, b.ul, b.last
0x0000007ff7aaeaa4 <+36>: ldp x6, x7, [x1]
0x0000007ff7aaeaa8 <+40>: ldp x12, x13, [x4, #-16]
0x0000007ff7aaeaac <+44>: stp x6, x7, [x0]
0x0000007ff7aaeab0 <+48>: stp x12, x13, [x5, #-16]
0x0000007ff7aaeab4 <+52>: ret
0x0000007ff7aaeab8 <+56>: tbz w2, #3, 0x7ff7aaead0 <__memcpy_generic+80>
0x0000007ff7aaeabc <+60>: ldr x6, [x1]
0x0000007ff7aaeac0 <+64>: ldur x7, [x4, #-8]
0x0000007ff7aaeac4 <+68>: str x6, [x0]
0x0000007ff7aaeac8 <+72>: stur x7, [x5, #-8]
0x0000007ff7aaeacc <+76>: ret
0x0000007ff7aaead0 <+80>: tbz w2, #2, 0x7ff7aaeae8 <__memcpy_generic+104>
0x0000007ff7aaead4 <+84>: ldr w6, [x1]
0x0000007ff7aaead8 <+88>: ldur w8, [x4, #-4]
0x0000007ff7aaeadc <+92>: str w6, [x0]
0x0000007ff7aaeae0 <+96>: stur w8, [x5, #-4]
0x0000007ff7aaeae4 <+100>: ret
0x0000007ff7aaeae8 <+104>: cbz x2, 0x7ff7aaeb08 <__memcpy_generic+136>
0x0000007ff7aaeaec <+108>: lsr x14, x2, #1
0x0000007ff7aaeaf0 <+112>: ldrb w6, [x1]
0x0000007ff7aaeaf4 <+116>: ldurb w10, [x4, #-1]
0x0000007ff7aaeaf8 <+120>: ldrb w8, [x1, x14]
0x0000007ff7aaeafc <+124>: strb w6, [x0]
0x0000007ff7aaeb00 <+128>: strb w8, [x0, x14]
0x0000007ff7aaeb04 <+132>: sturb w10, [x5, #-1]
0x0000007ff7aaeb08 <+136>: ret
0x0000007ff7aaeb0c <+140>: nop
0x0000007ff7aaeb10 <+144>: ldp x6, x7, [x1]
0x0000007ff7aaeb14 <+148>: ldp x8, x9, [x1, #16]
0x0000007ff7aaeb18 <+152>: ldp x10, x11, [x4, #-32]
0x0000007ff7aaeb1c <+156>: ldp x12, x13, [x4, #-16]
0x0000007ff7aaeb20 <+160>: cmp x2, #0x40
0x0000007ff7aaeb24 <+164>: b.hi 0x7ff7aaeb40 <__memcpy_generic+192> // b.pmore
0x0000007ff7aaeb28 <+168>: stp x6, x7, [x0]
0x0000007ff7aaeb2c <+172>: stp x8, x9, [x0, #16]
0x0000007ff7aaeb30 <+176>: stp x10, x11, [x5, #-32]
0x0000007ff7aaeb34 <+180>: stp x12, x13, [x5, #-16]
0x0000007ff7aaeb38 <+184>: ret
0x0000007ff7aaeb3c <+188>: nop
0x0000007ff7aaeb40 <+192>: ldp x14, x15, [x1, #32]
0x0000007ff7aaeb44 <+196>: ldp x16, x17, [x1, #48]
0x0000007ff7aaeb48 <+200>: cmp x2, #0x60
0x0000007ff7aaeb4c <+204>: b.ls 0x7ff7aaeb60 <__memcpy_generic+224> // b.plast
0x0000007ff7aaeb50 <+208>: ldp x2, x3, [x4, #-64]
0x0000007ff7aaeb54 <+212>: ldp x1, x4, [x4, #-48]
0x0000007ff7aaeb58 <+216>: stp x2, x3, [x5, #-64]
0x0000007ff7aaeb5c <+220>: stp x1, x4, [x5, #-48]
0x0000007ff7aaeb60 <+224>: stp x6, x7, [x0]
0x0000007ff7aaeb64 <+228>: stp x8, x9, [x0, #16]
0x0000007ff7aaeb68 <+232>: stp x14, x15, [x0, #32]
0x0000007ff7aaeb6c <+236>: stp x16, x17, [x0, #48]
0x0000007ff7aaeb70 <+240>: stp x10, x11, [x5, #-32]
--Type <RET> for more, q to quit, c to continue without paging--c
0x0000007ff7aaeb74 <+244>: stp x12, x13, [x5, #-16]
0x0000007ff7aaeb78 <+248>: ret
0x0000007ff7aaeb7c <+252>: nop
0x0000007ff7aaeb80 <+256>: ldp x12, x13, [x1]
0x0000007ff7aaeb84 <+260>: and x14, x0, #0xf
0x0000007ff7aaeb88 <+264>: and x3, x0, #0xfffffffffffffff0
0x0000007ff7aaeb8c <+268>: sub x1, x1, x14
0x0000007ff7aaeb90 <+272>: add x2, x2, x14
0x0000007ff7aaeb94 <+276>: ldp x6, x7, [x1, #16]
=> 0x0000007ff7aaeb98 <+280>: stp x12, x13, [x0]
0x0000007ff7aaeb9c <+284>: ldp x8, x9, [x1, #32]
0x0000007ff7aaeba0 <+288>: ldp x10, x11, [x1, #48]
0x0000007ff7aaeba4 <+292>: ldp x12, x13, [x1, #64]!
0x0000007ff7aaeba8 <+296>: subs x2, x2, #0x90
0x0000007ff7aaebac <+300>: b.ls 0x7ff7aaebd8 <__memcpy_generic+344> // b.plast
0x0000007ff7aaebb0 <+304>: stp x6, x7, [x3, #16]
0x0000007ff7aaebb4 <+308>: ldp x6, x7, [x1, #16]
0x0000007ff7aaebb8 <+312>: stp x8, x9, [x3, #32]
0x0000007ff7aaebbc <+316>: ldp x8, x9, [x1, #32]
0x0000007ff7aaebc0 <+320>: stp x10, x11, [x3, #48]
0x0000007ff7aaebc4 <+324>: ldp x10, x11, [x1, #48]
0x0000007ff7aaebc8 <+328>: stp x12, x13, [x3, #64]!
0x0000007ff7aaebcc <+332>: ldp x12, x13, [x1, #64]!
0x0000007ff7aaebd0 <+336>: subs x2, x2, #0x40
0x0000007ff7aaebd4 <+340>: b.hi 0x7ff7aaebb0 <__memcpy_generic+304> // b.pmore
0x0000007ff7aaebd8 <+344>: ldp x14, x15, [x4, #-64]
0x0000007ff7aaebdc <+348>: stp x6, x7, [x3, #16]
0x0000007ff7aaebe0 <+352>: ldp x6, x7, [x4, #-48]
0x0000007ff7aaebe4 <+356>: stp x8, x9, [x3, #32]
0x0000007ff7aaebe8 <+360>: ldp x8, x9, [x4, #-32]
0x0000007ff7aaebec <+364>: stp x10, x11, [x3, #48]
0x0000007ff7aaebf0 <+368>: ldp x10, x11, [x4, #-16]
0x0000007ff7aaebf4 <+372>: stp x12, x13, [x3, #64]
0x0000007ff7aaebf8 <+376>: stp x14, x15, [x5, #-64]
0x0000007ff7aaebfc <+380>: stp x6, x7, [x5, #-48]
0x0000007ff7aaec00 <+384>: stp x8, x9, [x5, #-32]
0x0000007ff7aaec04 <+388>: stp x10, x11, [x5, #-16]
0x0000007ff7aaec08 <+392>: ret
End of assembler dump.
The output of the disassemble command shows the next instruction it is about to execute:
0x0000007ff7aaeb8c <+268>: sub x1, x1, x14
0x0000007ff7aaeb90 <+272>: add x2, x2, x14
0x0000007ff7aaeb94 <+276>: ldp x6, x7, [x1, #16]
=> 0x0000007ff7aaeb98 <+280>: stp x12, x13, [x0]
0x0000007ff7aaeb9c <+284>: ldp x8, x9, [x1, #32]
0x0000007ff7aaeba0 <+288>: ldp x10, x11, [x1, #48]
0x0000007ff7aaeba4 <+292>: ldp x12, x13, [x1, #64]!
So, this means that the "bus error" happened at a load instruction. Pretty much all the instructions in memcpy copy 8 bytes at a time, in pairs, so ldp x6, x7, [x1, #16] loads 8 bytes into x6, then 8 bytes into x7, i.e. a pair of 8-byte numbers. So, none of them are 32-bit operations. Please post the values of x1 and x0 at this point:
i r x0
i r x1
The value of x1 should be within 16 (0x10) bytes of the bus error address.
The fix is probably to find out why the source (data) needs to be aligned. One also cannot simply replace memcpy with a byte-at-a-time copy, because the compiler will just optimize it up to 8-byte transfers.
Thread 1 "glmark2-drm" received signal SIGBUS, Bus error.
__memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
173 ../sysdeps/aarch64/multiarch/../memcpy.S: No such file or directory.
(gdb) i r x0
x0 0x7fecda8cac 549434592428
(gdb) i r x1
x1 0x5556268180 366517584256
(gdb)
I just realized the context got blown away because I lost power for a minute. So here's the new context.
(gdb) disassemble
Dump of assembler code for function __memcpy_generic:
0x0000007ff7aaea80 <+0>: nop
0x0000007ff7aaea84 <+4>: add x4, x1, x2
0x0000007ff7aaea88 <+8>: add x5, x0, x2
0x0000007ff7aaea8c <+12>: cmp x2, #0x80
0x0000007ff7aaea90 <+16>: b.hi 0x7ff7aaeb80 <__memcpy_generic+256> // b.pmore
0x0000007ff7aaea94 <+20>: cmp x2, #0x20
0x0000007ff7aaea98 <+24>: b.hi 0x7ff7aaeb10 <__memcpy_generic+144> // b.pmore
0x0000007ff7aaea9c <+28>: cmp x2, #0x10
0x0000007ff7aaeaa0 <+32>: b.cc 0x7ff7aaeab8 <__memcpy_generic+56> // b.lo, b.ul, b.last
0x0000007ff7aaeaa4 <+36>: ldp x6, x7, [x1]
0x0000007ff7aaeaa8 <+40>: ldp x12, x13, [x4, #-16]
0x0000007ff7aaeaac <+44>: stp x6, x7, [x0]
0x0000007ff7aaeab0 <+48>: stp x12, x13, [x5, #-16]
0x0000007ff7aaeab4 <+52>: ret
0x0000007ff7aaeab8 <+56>: tbz w2, #3, 0x7ff7aaead0 <__memcpy_generic+80>
0x0000007ff7aaeabc <+60>: ldr x6, [x1]
0x0000007ff7aaeac0 <+64>: ldur x7, [x4, #-8]
0x0000007ff7aaeac4 <+68>: str x6, [x0]
0x0000007ff7aaeac8 <+72>: stur x7, [x5, #-8]
0x0000007ff7aaeacc <+76>: ret
0x0000007ff7aaead0 <+80>: tbz w2, #2, 0x7ff7aaeae8 <__memcpy_generic+104>
0x0000007ff7aaead4 <+84>: ldr w6, [x1]
0x0000007ff7aaead8 <+88>: ldur w8, [x4, #-4]
0x0000007ff7aaeadc <+92>: str w6, [x0]
0x0000007ff7aaeae0 <+96>: stur w8, [x5, #-4]
0x0000007ff7aaeae4 <+100>: ret
0x0000007ff7aaeae8 <+104>: cbz x2, 0x7ff7aaeb08 <__memcpy_generic+136>
0x0000007ff7aaeaec <+108>: lsr x14, x2, #1
0x0000007ff7aaeaf0 <+112>: ldrb w6, [x1]
0x0000007ff7aaeaf4 <+116>: ldurb w10, [x4, #-1]
0x0000007ff7aaeaf8 <+120>: ldrb w8, [x1, x14]
0x0000007ff7aaeafc <+124>: strb w6, [x0]
0x0000007ff7aaeb00 <+128>: strb w8, [x0, x14]
0x0000007ff7aaeb04 <+132>: sturb w10, [x5, #-1]
0x0000007ff7aaeb08 <+136>: ret
0x0000007ff7aaeb0c <+140>: nop
0x0000007ff7aaeb10 <+144>: ldp x6, x7, [x1]
0x0000007ff7aaeb14 <+148>: ldp x8, x9, [x1, #16]
0x0000007ff7aaeb18 <+152>: ldp x10, x11, [x4, #-32]
0x0000007ff7aaeb1c <+156>: ldp x12, x13, [x4, #-16]
0x0000007ff7aaeb20 <+160>: cmp x2, #0x40
0x0000007ff7aaeb24 <+164>: b.hi 0x7ff7aaeb40 <__memcpy_generic+192> // b.pmore
0x0000007ff7aaeb28 <+168>: stp x6, x7, [x0]
0x0000007ff7aaeb2c <+172>: stp x8, x9, [x0, #16]
0x0000007ff7aaeb30 <+176>: stp x10, x11, [x5, #-32]
0x0000007ff7aaeb34 <+180>: stp x12, x13, [x5, #-16]
0x0000007ff7aaeb38 <+184>: ret
0x0000007ff7aaeb3c <+188>: nop
0x0000007ff7aaeb40 <+192>: ldp x14, x15, [x1, #32]
0x0000007ff7aaeb44 <+196>: ldp x16, x17, [x1, #48]
0x0000007ff7aaeb48 <+200>: cmp x2, #0x60
0x0000007ff7aaeb4c <+204>: b.ls 0x7ff7aaeb60 <__memcpy_generic+224> // b.plast
0x0000007ff7aaeb50 <+208>: ldp x2, x3, [x4, #-64]
0x0000007ff7aaeb54 <+212>: ldp x1, x4, [x4, #-48]
0x0000007ff7aaeb58 <+216>: stp x2, x3, [x5, #-64]
0x0000007ff7aaeb5c <+220>: stp x1, x4, [x5, #-48]
0x0000007ff7aaeb60 <+224>: stp x6, x7, [x0]
0x0000007ff7aaeb64 <+228>: stp x8, x9, [x0, #16]
0x0000007ff7aaeb68 <+232>: stp x14, x15, [x0, #32]
0x0000007ff7aaeb6c <+236>: stp x16, x17, [x0, #48]
0x0000007ff7aaeb70 <+240>: stp x10, x11, [x5, #-32]
--Type <RET> for more, q to quit, c to continue without paging--c
0x0000007ff7aaeb74 <+244>: stp x12, x13, [x5, #-16]
0x0000007ff7aaeb78 <+248>: ret
0x0000007ff7aaeb7c <+252>: nop
0x0000007ff7aaeb80 <+256>: ldp x12, x13, [x1]
0x0000007ff7aaeb84 <+260>: and x14, x0, #0xf
0x0000007ff7aaeb88 <+264>: and x3, x0, #0xfffffffffffffff0
0x0000007ff7aaeb8c <+268>: sub x1, x1, x14
0x0000007ff7aaeb90 <+272>: add x2, x2, x14
0x0000007ff7aaeb94 <+276>: ldp x6, x7, [x1, #16]
=> 0x0000007ff7aaeb98 <+280>: stp x12, x13, [x0]
0x0000007ff7aaeb9c <+284>: ldp x8, x9, [x1, #32]
0x0000007ff7aaeba0 <+288>: ldp x10, x11, [x1, #48]
0x0000007ff7aaeba4 <+292>: ldp x12, x13, [x1, #64]!
0x0000007ff7aaeba8 <+296>: subs x2, x2, #0x90
0x0000007ff7aaebac <+300>: b.ls 0x7ff7aaebd8 <__memcpy_generic+344> // b.plast
0x0000007ff7aaebb0 <+304>: stp x6, x7, [x3, #16]
0x0000007ff7aaebb4 <+308>: ldp x6, x7, [x1, #16]
0x0000007ff7aaebb8 <+312>: stp x8, x9, [x3, #32]
0x0000007ff7aaebbc <+316>: ldp x8, x9, [x1, #32]
0x0000007ff7aaebc0 <+320>: stp x10, x11, [x3, #48]
0x0000007ff7aaebc4 <+324>: ldp x10, x11, [x1, #48]
0x0000007ff7aaebc8 <+328>: stp x12, x13, [x3, #64]!
0x0000007ff7aaebcc <+332>: ldp x12, x13, [x1, #64]!
0x0000007ff7aaebd0 <+336>: subs x2, x2, #0x40
0x0000007ff7aaebd4 <+340>: b.hi 0x7ff7aaebb0 <__memcpy_generic+304> // b.pmore
0x0000007ff7aaebd8 <+344>: ldp x14, x15, [x4, #-64]
0x0000007ff7aaebdc <+348>: stp x6, x7, [x3, #16]
0x0000007ff7aaebe0 <+352>: ldp x6, x7, [x4, #-48]
0x0000007ff7aaebe4 <+356>: stp x8, x9, [x3, #32]
0x0000007ff7aaebe8 <+360>: ldp x8, x9, [x4, #-32]
0x0000007ff7aaebec <+364>: stp x10, x11, [x3, #48]
0x0000007ff7aaebf0 <+368>: ldp x10, x11, [x4, #-16]
0x0000007ff7aaebf4 <+372>: stp x12, x13, [x3, #64]
0x0000007ff7aaebf8 <+376>: stp x14, x15, [x5, #-64]
0x0000007ff7aaebfc <+380>: stp x6, x7, [x5, #-48]
0x0000007ff7aaec00 <+384>: stp x8, x9, [x5, #-32]
0x0000007ff7aaec04 <+388>: stp x10, x11, [x5, #-16]
0x0000007ff7aaec08 <+392>: ret
End of assembler dump.
(gdb)
Looking at your posts:
0x0000007ff7aaeb90 <+272>: add x2, x2, x14
0x0000007ff7aaeb94 <+276>: ldp x6, x7, [x1, #16]
=> 0x0000007ff7aaeb98 <+280>: stp x12, x13, [x0]
0x0000007ff7aaeb9c <+284>: ldp x8, x9, [x1, #32]
Thread 1 "glmark2-drm" received signal SIGBUS, Bus error.
__memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
173 ../sysdeps/aarch64/multiarch/../memcpy.S: No such file or directory.
(gdb) i r x0
x0 0x7fecda8cac 549434592428
(gdb) i r x1
x1 0x5556268180 366517584256
(gdb)
I now think that I was wrong above; the problem instruction is the store. Here x0 is not correctly aligned, but x1 is. So the problem is with x0: 0x...cac is 12 bytes into a 16-byte alignment, or 4 bytes into an 8-byte alignment. In any case, writing 8 bytes there has to wrap into the next 8 bytes, hence the bus error.
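The alignment arithmetic here is easy to check by masking the low bits of each address. A minimal sketch, using the two register values from the gdb session; the helper names misalignment and report_alignment are made up for illustration:

```c
#include <stdint.h>
#include <stdio.h>

/* Offset of addr past the previous multiple of boundary.
 * boundary must be a power of two (8 or 16 here). */
unsigned misalignment(uint64_t addr, unsigned boundary)
{
    return (unsigned)(addr & (uint64_t)(boundary - 1));
}

/* Prints the offsets for the two registers as reported by gdb. */
void report_alignment(void)
{
    const uint64_t x0 = 0x7fecda8cacULL; /* destination pointer */
    const uint64_t x1 = 0x5556268180ULL; /* source pointer at the fault */

    printf("x0: %u into 16, %u into 8\n",
           misalignment(x0, 16), misalignment(x0, 8));
    printf("x1: %u into 16, %u into 8\n",
           misalignment(x1, 16), misalignment(x1, 8));
}
```

Calling report_alignment() from a small main() is enough to reproduce the 12 and 4 byte offsets for x0 and the zero offsets for x1.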
This points to a bug in the memcpy assembler instructions in memcpy.S, but I think that is very unlikely, because this code runs fine on all sorts of arm64 machines.
Maybe replacing the instruction stp x12, x13, [x0] with 16 strb instructions is all that is needed for a quick fix.
I say very unlikely, but someone else has found other bugs with the aarch64 memcpy recently here: https://gist.github.com/jnettlet/f6f8b49bb7c731255c46f541f875f436
So, maybe we have found a new bug here also.
Just trying it with a little bit of VRAM mmapped, the Bus error occurs when either reading or writing using memcpy with addresses that aren't 8 byte aligned. While it could be different on the SOQuartz, it's pretty unlikely.
Although one might think this is a bug, the comment at the beginning of the memcpy.S code says: / Assumptions:
*/
So, we need a version of memcpy that forces aligned accesses, at least on write, like the x86 code does. We could do a quick fix here, and replace the "stp x12, x13, [x0]" with 16 strb instructions together with some register shifting.
I made a version of memcpy that just copies a single byte at a time, and that can copy data around to and from VRAM without any bus errors. Trying to put that into a library right now that can just be loaded with LD_PRELOAD.
Preloading the library with LD_PRELOAD works, at least with my small test program.
https://gist.github.com/Coreforge/91da3d410ec7eb0ef5bc8dee24b91359
I compiled it using gcc -shared -fPIC -o memcpy.so memcpy_unaligned.c
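For anyone who doesn't want to open the gist, this is roughly what a byte-at-a-time copy looks like. It is only a sketch and not the gist's actual code: the name bytewise_copy is made up, and it assumes volatile-qualified byte pointers are what keeps the compiler from merging the accesses into wider ones.

```c
#include <stddef.h>

/* Hypothetical stand-in for the preloaded function; in an LD_PRELOAD
 * library it would be exported under the name memcpy instead. */
void *bytewise_copy(void *dst, const void *src, size_t n)
{
    /* volatile forces exactly one byte load and one byte store per
     * iteration, so the compiler cannot combine them into wider accesses. */
    volatile unsigned char *d = (volatile unsigned char *)dst;
    const volatile unsigned char *s = (const volatile unsigned char *)src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}
```

Byte accesses have no alignment requirement, which is the whole point, at the cost of one bus transaction per byte.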
Replacing memcpy doesn't seem to help with glmark. It still crashes on the buffer test with a bus error after rendering one frame.
@Coreforge When implementing memcpy, one generally has to do an associated memmove as well. Copying one byte at a time over PCIe is going to be slow. It will be roughly four times as fast if you copy 32 bits at a time. So, do a few 1-byte copies at the start, then use aligned 32-bit copies for the rest, with a small 1-byte tail at the end if needed.
Run objdump on the compiled memcpy; the compiler probably created 64-bit accesses when optimizing.
Since I cast the src and dst pointers to volatile chars, it should only do 8-bit accesses. While I don't know assembly very well, to me it looks like it's also only doing 8 bits at a time.
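The head/body/tail scheme described above could look something like this. It is a sketch under stated assumptions: copy_dst_aligned32 is a made-up name, and it assumes only the destination is device memory that needs aligned stores, while the source is ordinary RAM. If the source is also mapped VRAM, the reads would need the same treatment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the code from the gist.
 * Assumes dst may be device memory that needs naturally aligned stores,
 * while src is ordinary RAM that tolerates byte loads. */
void *copy_dst_aligned32(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = (volatile unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    /* Head: single bytes until the destination is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: one aligned 32-bit store per iteration. The word is assembled
     * from byte loads so the source never needs to be aligned. */
    while (n >= 4) {
        uint32_t word;
        unsigned char *w = (unsigned char *)&word;
        w[0] = s[0];
        w[1] = s[1];
        w[2] = s[2];
        w[3] = s[3];
        *(volatile uint32_t *)d = word;
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: up to three remaining bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Filling the word byte by byte keeps the in-memory byte order identical to the source regardless of endianness.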
memcpy.so: file format elf64-littleaarch64
Disassembly of section .init:
0000000000000500 <_init>:
500: a9bf7bfd stp x29, x30, [sp, #-16]!
504: 910003fd mov x29, sp
508: 94000022 bl 590 <call_weak_fn>
50c: a8c17bfd ldp x29, x30, [sp], #16
510: d65f03c0 ret
Disassembly of section .plt:
0000000000000520 <.plt>:
520: a9bf7bf0 stp x16, x30, [sp, #-16]!
524: 90000090 adrp x16, 10000 <__FRAME_END__+0xf7e0>
528: f947fe11 ldr x17, [x16, #4088]
52c: 913fe210 add x16, x16, #0xff8
530: d61f0220 br x17
534: d503201f nop
538: d503201f nop
53c: d503201f nop
0000000000000540 <__cxa_finalize@plt>:
540: b0000090 adrp x16, 11000 <__cxa_finalize@GLIBC_2.17>
544: f9400211 ldr x17, [x16]
548: 91000210 add x16, x16, #0x0
54c: d61f0220 br x17
0000000000000550 <malloc@plt>:
550: b0000090 adrp x16, 11000 <__cxa_finalize@GLIBC_2.17>
554: f9400611 ldr x17, [x16, #8]
558: 91002210 add x16, x16, #0x8
55c: d61f0220 br x17
0000000000000560 <memcpy@plt>:
560: b0000090 adrp x16, 11000 <__cxa_finalize@GLIBC_2.17>
564: f9400a11 ldr x17, [x16, #16]
568: 91004210 add x16, x16, #0x10
56c: d61f0220 br x17
0000000000000570 <__gmon_start__@plt>:
570: b0000090 adrp x16, 11000 <__cxa_finalize@GLIBC_2.17>
574: f9400e11 ldr x17, [x16, #24]
578: 91006210 add x16, x16, #0x18
57c: d61f0220 br x17
0000000000000580 <free@plt>:
580: b0000090 adrp x16, 11000 <__cxa_finalize@GLIBC_2.17>
584: f9401211 ldr x17, [x16, #32]
588: 91008210 add x16, x16, #0x20
58c: d61f0220 br x17
Disassembly of section .text:
0000000000000590 <call_weak_fn>:
590: 90000080 adrp x0, 10000 <__FRAME_END__+0xf7e0>
594: f947ec00 ldr x0, [x0, #4056]
598: b4000040 cbz x0, 5a0 <call_weak_fn+0x10>
59c: 17fffff5 b 570 <__gmon_start__@plt>
5a0: d65f03c0 ret
5a4: d503201f nop
00000000000005a8 <deregister_tm_clones>:
5a8: b0000080 adrp x0, 11000 <__cxa_finalize@GLIBC_2.17>
5ac: 9100c000 add x0, x0, #0x30
5b0: b0000081 adrp x1, 11000 <__cxa_finalize@GLIBC_2.17>
5b4: 9100c021 add x1, x1, #0x30
5b8: eb00003f cmp x1, x0
5bc: 540000a0 b.eq 5d0 <deregister_tm_clones+0x28> // b.none
5c0: 90000081 adrp x1, 10000 <__FRAME_END__+0xf7e0>
5c4: f947e421 ldr x1, [x1, #4040]
5c8: b4000041 cbz x1, 5d0 <deregister_tm_clones+0x28>
5cc: d61f0020 br x1
5d0: d65f03c0 ret
5d4: d503201f nop
00000000000005d8 <register_tm_clones>:
5d8: b0000080 adrp x0, 11000 <__cxa_finalize@GLIBC_2.17>
5dc: 9100c000 add x0, x0, #0x30
5e0: b0000081 adrp x1, 11000 <__cxa_finalize@GLIBC_2.17>
5e4: 9100c021 add x1, x1, #0x30
5e8: cb000021 sub x1, x1, x0
5ec: 9343fc21 asr x1, x1, #3
5f0: 8b41fc21 add x1, x1, x1, lsr #63
5f4: 9341fc21 asr x1, x1, #1
5f8: b40000a1 cbz x1, 60c <register_tm_clones+0x34>
5fc: 90000082 adrp x2, 10000 <__FRAME_END__+0xf7e0>
600: f947f042 ldr x2, [x2, #4064]
604: b4000042 cbz x2, 60c <register_tm_clones+0x34>
608: d61f0040 br x2
60c: d65f03c0 ret
0000000000000610 <__do_global_dtors_aux>:
610: a9be7bfd stp x29, x30, [sp, #-32]!
614: 910003fd mov x29, sp
618: f9000bf3 str x19, [sp, #16]
61c: b0000093 adrp x19, 11000 <__cxa_finalize@GLIBC_2.17>
620: 3940c260 ldrb w0, [x19, #48]
624: 35000140 cbnz w0, 64c <__do_global_dtors_aux+0x3c>
628: 90000080 adrp x0, 10000 <__FRAME_END__+0xf7e0>
62c: f947e800 ldr x0, [x0, #4048]
630: b4000080 cbz x0, 640 <__do_global_dtors_aux+0x30>
634: b0000080 adrp x0, 11000 <__cxa_finalize@GLIBC_2.17>
638: f9401400 ldr x0, [x0, #40]
63c: 97ffffc1 bl 540 <__cxa_finalize@plt>
640: 97ffffda bl 5a8 <deregister_tm_clones>
644: 52800020 mov w0, #0x1 // #1
648: 3900c260 strb w0, [x19, #48]
64c: f9400bf3 ldr x19, [sp, #16]
650: a8c27bfd ldp x29, x30, [sp], #32
654: d65f03c0 ret
0000000000000658 <frame_dummy>:
658: 17ffffe0 b 5d8 <register_tm_clones>
000000000000065c <memcpy>:
65c: d10103ff sub sp, sp, #0x40
660: f9000fe0 str x0, [sp, #24]
664: f9000be1 str x1, [sp, #16]
668: f90007e2 str x2, [sp, #8]
66c: f9400be0 ldr x0, [sp, #16]
670: f9001be0 str x0, [sp, #48]
674: f9400fe0 ldr x0, [sp, #24]
678: f90017e0 str x0, [sp, #40]
67c: f9001fff str xzr, [sp, #56]
680: 1400000d b 6b4 <memcpy+0x58>
684: f9401be1 ldr x1, [sp, #48]
688: f9401fe0 ldr x0, [sp, #56]
68c: 8b000021 add x1, x1, x0
690: f94017e2 ldr x2, [sp, #40]
694: f9401fe0 ldr x0, [sp, #56]
698: 8b000040 add x0, x2, x0
69c: 39400021 ldrb w1, [x1]
6a0: 12001c21 and w1, w1, #0xff
6a4: 39000001 strb w1, [x0]
6a8: f9401fe0 ldr x0, [sp, #56]
6ac: 91000400 add x0, x0, #0x1
6b0: f9001fe0 str x0, [sp, #56]
6b4: f9401fe1 ldr x1, [sp, #56]
6b8: f94007e0 ldr x0, [sp, #8]
6bc: eb00003f cmp x1, x0
6c0: 54fffe23 b.cc 684 <memcpy+0x28> // b.lo, b.ul, b.last
6c4: f9400fe0 ldr x0, [sp, #24]
6c8: 910103ff add sp, sp, #0x40
6cc: d65f03c0 ret
00000000000006d0 <memmove>:
6d0: a9bc7bfd stp x29, x30, [sp, #-64]!
6d4: 910003fd mov x29, sp
6d8: f90017e0 str x0, [sp, #40]
6dc: f90013e1 str x1, [sp, #32]
6e0: f9000fe2 str x2, [sp, #24]
6e4: f9400fe0 ldr x0, [sp, #24]
6e8: 97ffff9a bl 550 <malloc@plt>
6ec: f9001fe0 str x0, [sp, #56]
6f0: f9400fe2 ldr x2, [sp, #24]
6f4: f94013e1 ldr x1, [sp, #32]
6f8: f9401fe0 ldr x0, [sp, #56]
6fc: 97ffff99 bl 560 <memcpy@plt>
700: f9400fe2 ldr x2, [sp, #24]
704: f9401fe1 ldr x1, [sp, #56]
708: f94017e0 ldr x0, [sp, #40]
70c: 97ffff95 bl 560 <memcpy@plt>
710: f9401fe0 ldr x0, [sp, #56]
714: 97ffff9b bl 580 <free@plt>
718: d503201f nop
71c: a8c47bfd ldp x29, x30, [sp], #64
720: d65f03c0 ret
Disassembly of section .fini:
0000000000000724 <_fini>:
724: a9bf7bfd stp x29, x30, [sp, #-16]!
728: 910003fd mov x29, sp
72c: a8c17bfd ldp x29, x30, [sp], #16
730: d65f03c0 ret
@Coreforge Correct. The code looks ok for memcpy. The memmove looks a bit odd though. It is supposed to work differently depending on which of src and dest is at the higher address.
The memmove function just allocates a temporary array, copies the data into that array and then copies it to the destination. This isn't ideal and would fail if there isn't enough memory available, but it works for now. I initially forgot to return the dst pointer from memmove though, which caused a bad_alloc during the cell shading test somewhere in std::vector stuff. With that fixed though, I was able to do a full run of glmark2 without any issues. Xorg still doesn't want to work though.
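An allocation-free alternative is to pick the copy direction from the pointer order, which is how memmove is usually written. A minimal sketch, assuming byte-wise volatile accesses as in the memcpy replacement; the name bytewise_move is made up, and a preloaded library would export it as memmove:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overlap-safe move that needs no temporary buffer. */
void *bytewise_move(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = (volatile unsigned char *)dst;
    const volatile unsigned char *s = (const volatile unsigned char *)src;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination starts below the source: copy front to back. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else if ((uintptr_t)d > (uintptr_t)s) {
        /* Destination starts above the source: copy back to front so
         * bytes are read before they are overwritten. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    /* Always return dst, as memmove is required to. */
    return dst;
}
```

This avoids the malloc failure case entirely and still returns dst.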
As with the Radxa CM3, I would also like to test Pine64's SOQuartz with some CM4 boards, since it's supposed to be pin-compatible.
@timonsku mentioned the Wiki (linked above) and this dtb artifact are the two best ways to get started with it. I'd like to write up my experience trying to get the thing to boot, and also seeing if it fits and works in a few popular CM4 boards (starting with the official IO Board).