geerlingguy / raspberry-pi-pcie-devices

Raspberry Pi PCI Express device compatibility database
http://pipci.jeffgeerling.com
GNU General Public License v3.0

Test Pine64 SOQuartz on CM4 boards #336

Open geerlingguy opened 2 years ago

geerlingguy commented 2 years ago

As with the Radxa CM3, I would also like to test Pine64's SOQuartz with some CM4 boards, since it's supposed to be pin-compatible.

@timonsku mentioned that the Wiki (linked above) and this dtb artifact are the two best ways to get started with it. I'd like to write up my experience trying to get the thing to boot, and also to see whether it fits and works in a few popular CM4 boards (starting with the official IO Board).

pgwipeout commented 2 years ago

Userspace is mad, crashes at the same spot each time:

[ 565.882] (EE)
[ 565.882] (EE) Backtrace:
[ 565.884] (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x188) [0x559488b4c8]
[ 565.884] (EE) unw_get_proc_info failed: no unwind info found [-10]
[ 565.884] (EE)
[ 565.885] (EE) Bus error at address 0x7f95206304
[ 565.885] (EE) Fatal server error:
[ 565.885] (EE) Caught signal 7 (Bus error). Server aborting
[ 565.885] (EE)
[ 565.885] (EE)

Coreforge commented 2 years ago

That's the same error I'm getting. Does kmscube work?

pgwipeout commented 2 years ago

Yup, I have a glorious floating cube at 60FPS (I don't see a way to disable vsync)
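The 60 FPS ceiling lines up with the display's refresh rate. A quick Python check of the arithmetic, using the 1920x1080 modeline and timing constants from the drm debug log posted further down in the thread (pixel clock 148500 kHz, htotal 2200, vtotal 1125). The function names are illustrative, and the truncating integer division is an assumption about how the kernel derives the framedur/linedur values it prints:

```python
from typing import Tuple


def refresh_hz(clock_khz: int, htotal: int, vtotal: int) -> float:
    """Vertical refresh rate implied by a modeline (pixel clock in kHz)."""
    # Each frame takes htotal * vtotal pixel clocks, blanking included.
    return clock_khz * 1000 / (htotal * vtotal)


def line_and_frame_ns(clock_khz: int, htotal: int, vtotal: int) -> Tuple[int, int]:
    """Line and frame durations in ns, using truncating integer division."""
    linedur = htotal * 1_000_000 // clock_khz
    framedur = htotal * vtotal * 1_000_000 // clock_khz
    return linedur, framedur


def frame_time_ms(fps: float) -> float:
    """glmark2-style FrameTime: milliseconds per frame at a given FPS."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Numbers taken from the 1920x1080 modeline in the drm debug log.
    print(refresh_hz(148500, 2200, 1125))         # 60.0
    print(line_and_frame_ns(148500, 2200, 1125))  # (14814, 16666666)
    print(round(frame_time_ms(60), 3))            # 16.667
```

The same 1000 / FPS relation accounts for glmark2's other FrameTime values, e.g. 49 FPS gives 20.408 ms and 30 FPS gives 33.333 ms.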

pgwipeout commented 2 years ago

root@quartz64:~# kmscube
Using display 0x55848826b0 with EGL version 1.5
===================================
EGL information:
  version: "1.5"
  vendor: "Mesa Project"
  client extensions: "EGL_EXT_platform_base EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query EGL_KHR_client_get_all_proc_addresses EGL_EXT_client_extensions EGL_KHR_debug EGL_KHR_platform_x11 EGL_EXT_platform_x11 EGL_EXT_platform_device EGL_KHR_platform_wayland EGL_EXT_platform_wayland EGL_MESA_platform_xcb EGL_MESA_platform_gbm EGL_KHR_platform_gbm EGL_MESA_platform_surfaceless"
  display extensions: "EGL_ANDROID_blob_cache EGL_EXT_buffer_age EGL_EXT_create_context_robustness EGL_EXT_image_dma_buf_import EGL_EXT_image_dma_buf_import_modifiers EGL_KHR_cl_event2 EGL_KHR_config_attribs EGL_KHR_create_context EGL_KHR_create_context_no_error EGL_KHR_fence_sync EGL_KHR_get_all_proc_addresses EGL_KHR_gl_colorspace EGL_KHR_gl_renderbuffer_image EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_image EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_no_config_context EGL_KHR_reusable_sync EGL_KHR_surfaceless_context EGL_EXT_pixel_format_float EGL_KHR_wait_sync EGL_MESA_configless_context EGL_MESA_drm_image EGL_MESA_image_dma_buf_export EGL_MESA_query_driver EGL_WL_bind_wayland_display "
===================================
OpenGL ES 2.x information:
  version: "OpenGL ES 3.1 Mesa 21.2.6"
  shading language version: "OpenGL ES GLSL ES 3.10"
  vendor: "X.Org"
  renderer: "AMD TURKS (DRM 2.50.0 / 5.17.0-rc5-00097-gccb1df4cf6b5-dirty, LLVM 12.0.1)"
  extensions: "GL_EXT_blend_minmax GL_EXT_multi_draw_arrays GL_EXT_texture_filter_anisotropic GL_EXT_texture_compression_s3tc GL_EXT_texture_compression_dxt1 GL_EXT_texture_compression_rgtc GL_EXT_texture_format_BGRA8888 GL_OES_compressed_ETC1_RGB8_texture GL_OES_depth24 GL_OES_element_index_uint GL_OES_fbo_render_mipmap GL_OES_mapbuffer GL_OES_rgb8_rgba8 GL_OES_standard_derivatives GL_OES_stencil8 GL_OES_texture_3D GL_OES_texture_float GL_OES_texture_float_linear GL_OES_texture_half_float GL_OES_texture_half_float_linear GL_OES_texture_npot GL_OES_vertex_half_float GL_EXT_draw_instanced GL_EXT_texture_sRGB_decode GL_OES_EGL_image GL_OES_depth_texture GL_AMD_performance_monitor GL_OES_packed_depth_stencil GL_EXT_texture_type_2_10_10_10_REV GL_NV_conditional_render GL_OES_get_program_binary GL_APPLE_texture_max_level GL_EXT_discard_framebuffer GL_EXT_read_format_bgra GL_EXT_frag_depth GL_NV_fbo_color_attachments GL_OES_EGL_image_external GL_OES_EGL_sync GL_OES_vertex_array_object GL_OES_viewport_array GL_ANGLE_pack_reverse_row_order GL_ANGLE_texture_compression_dxt3 GL_ANGLE_texture_compression_dxt5 GL_EXT_occlusion_query_boolean GL_EXT_robustness GL_EXT_texture_rg GL_EXT_unpack_subimage GL_NV_draw_buffers GL_NV_read_buffer GL_NV_read_depth GL_NV_read_depth_stencil GL_NV_read_stencil GL_EXT_draw_buffers GL_EXT_map_buffer_range GL_KHR_debug GL_KHR_robustness GL_KHR_texture_compression_astc_ldr GL_NV_pixel_buffer_object GL_OES_depth_texture_cube_map GL_OES_required_internalformat GL_OES_surfaceless_context GL_EXT_color_buffer_float GL_EXT_sRGB_write_control GL_EXT_separate_shader_objects GL_EXT_shader_implicit_conversions GL_EXT_shader_integer_mix GL_EXT_tessellation_point_size GL_EXT_tessellation_shader GL_EXT_base_instance GL_EXT_compressed_ETC1_RGB8_sub_texture GL_EXT_copy_image GL_EXT_draw_buffers_indexed GL_EXT_draw_elements_base_vertex GL_EXT_gpu_shader5 GL_EXT_polygon_offset_clamp GL_EXT_primitive_bounding_box GL_EXT_render_snorm GL_EXT_shader_io_blocks 
GL_EXT_texture_border_clamp GL_EXT_texture_buffer GL_EXT_texture_cube_map_array GL_EXT_texture_norm16 GL_EXT_texture_view GL_KHR_context_flush_control GL_KHR_robust_buffer_access_behavior GL_NV_image_formats GL_OES_copy_image GL_OES_draw_buffers_indexed GL_OES_draw_elements_base_vertex GL_OES_gpu_shader5 GL_OES_primitive_bounding_box GL_OES_sample_shading GL_OES_sample_variables GL_OES_shader_io_blocks GL_OES_shader_multisample_interpolation GL_OES_tessellation_point_size GL_OES_tessellation_shader GL_OES_texture_border_clamp GL_OES_texture_buffer GL_OES_texture_cube_map_array GL_OES_texture_stencil8 GL_OES_texture_storage_multisample_2d_array GL_OES_texture_view GL_EXT_blend_func_extended GL_EXT_buffer_storage GL_EXT_float_blend GL_EXT_geometry_point_size GL_EXT_geometry_shader GL_EXT_texture_sRGB_R8 GL_KHR_no_error GL_KHR_texture_compression_astc_sliced_3d GL_OES_EGL_image_external_essl3 GL_OES_geometry_point_size GL_OES_geometry_shader GL_OES_shader_image_atomic GL_EXT_clip_cull_distance GL_EXT_disjoint_timer_query GL_EXT_texture_compression_s3tc_srgb GL_MESA_shader_integer_functions GL_EXT_clip_control GL_EXT_color_buffer_half_float GL_EXT_texture_compression_bptc GL_KHR_parallel_shader_compile GL_EXT_EGL_image_storage GL_MESA_framebuffer_flip_y GL_EXT_depth_clamp GL_EXT_texture_query_lod GL_MESA_bgra "
===================================
Using modifier ffffffffffffff
Modifiers failed!
Using modifier ffffffffffffff
Modifiers failed!
Rendered 120 frames in 2.000056 sec (59.998315 fps)
Rendered 240 frames in 4.000061 sec (59.999082 fps)

Coreforge commented 2 years ago

That's more than I have on the CM4, then. I'll take a look at the Xorg stuff once I get kmscube to work reliably with more than one core enabled.

jcdutton commented 2 years ago

"Bus error" is normally an unaligned write or read. But if kmscube is working, the PCIe bus and the GPU driver are probably OK. The command-line version of glmark2 is also worth trying. Were there any dmesg messages when the bus error happened?
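To make "unaligned" concrete: an access of N bytes is naturally aligned when its address is a multiple of N. A quick Python check against the faulting address from the Xorg log. The helper name is made up for illustration, and the log does not say how wide the faulting access was (or whether it was an alignment fault at all), so this only shows which access sizes would have been aligned at that address:

```python
def is_aligned(addr: int, size: int) -> bool:
    """True if a `size`-byte access (size a power of two) at `addr` is naturally aligned."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    # For a power-of-two size, addr is a multiple of size iff its low bits are zero.
    return addr & (size - 1) == 0


if __name__ == "__main__":
    fault_addr = 0x7F95206304  # "Bus error at address 0x7f95206304"
    for width in (1, 2, 4, 8, 16):
        print(f"{width:2d}-byte access aligned: {is_aligned(fault_addr, width)}")
```

For this address, 1-, 2- and 4-byte accesses would be aligned, while 8- and 16-byte ones would not; confirming what the code at OsLookupColor+0x188 actually does would still need a debugger.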

Coreforge commented 2 years ago

On the CM4, I didn't notice anything useful in dmesg. I think Xorg does some things over /dev/mem, though (?), so there is probably something in there. I might try again later if I can narrow it down a bit.

jcdutton commented 2 years ago

glmark2-drm is the non-X command-line version. It should work just like kmscube, but runs more tests.

pgwipeout commented 2 years ago

I just tested with a 4G board, which shouldn't faceplant with a DMA failure: same error at the same spot. So the question is what OsLookupColor+0x188 is and what it's doing. There's no dmesg output at all, but with drmdebug there's a soft reset recovery on the card at the time of the failure.

pgwipeout commented 2 years ago

Curious:

master@soquartz:~$ sudo glmark2-drm
=======================================================
    glmark2 2021.02
=======================================================
    OpenGL Information
    GL_VENDOR:     X.Org
    GL_RENDERER:   AMD TURKS (DRM 2.50.0 / 5.17.0-rc5-00097-gccb1df4cf6b5-dirty, LLVM 12.0.1)
    GL_VERSION:    3.1 Mesa 21.2.2
=======================================================
[build] use-vbo=false: FPS: 60 FrameTime: 16.667 ms
[build] use-vbo=true: FPS: 60 FrameTime: 16.667 ms
[texture] texture-filter=nearest: FPS: 60 FrameTime: 16.667 ms
[texture] texture-filter=linear: FPS: 60 FrameTime: 16.667 ms
[texture] texture-filter=mipmap: FPS: 60 FrameTime: 16.667 ms
[shading] shading=gouraud: FPS: 60 FrameTime: 16.667 ms
[shading] shading=blinn-phong-inf: FPS: 60 FrameTime: 16.667 ms
[shading] shading=phong: FPS: 60 FrameTime: 16.667 ms
[shading] shading=cel: FPS: 60 FrameTime: 16.667 ms
[bump] bump-render=high-poly: FPS: 60 FrameTime: 16.667 ms
[bump] bump-render=normals: FPS: 60 FrameTime: 16.667 ms
[bump] bump-render=height: FPS: 60 FrameTime: 16.667 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 60 FrameTime: 16.667 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 49 FrameTime: 20.408 ms
[pulsar] light=false:quads=5:texture=false: FPS: 60 FrameTime: 16.667 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 30 FrameTime: 33.333 ms
[desktop] effect=shadow:windows=4: FPS: 59 FrameTime: 16.949 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 59 FrameTime: 16.949 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata:Bus error

jcdutton commented 2 years ago

@pgwipeout So it hits the bus error on the "buffer" scene. Try running just that one:

sudo glmark2-drm -b buffer

to see whether it fails on that one repeatably.

So there are probably some unaligned writes/reads left to fix in the driver. Still, this is very good progress. Great work.

pgwipeout commented 2 years ago

root@soquartz:~# glmark2-drm -b buffer
radeon: The kernel rejected CS, see dmesg for more information (-16).
=======================================================
    glmark2 2021.02
=======================================================
    OpenGL Information
    GL_VENDOR:     X.Org
    GL_RENDERER:   AMD TURKS (DRM 2.50.0 / 5.17.0-rc5-00097-gccb1df4cf6b5-dirty, LLVM 12.0.1)
    GL_VERSION:    3.1 Mesa 21.2.2
=======================================================
radeon: The kernel rejected CS, see dmesg for more information (-16).
[buffer] <default>:radeon: The kernel rejected CS, see dmesg for more information (-16).
radeon: The kernel rejected CS, see dmesg for more information (-16).
<snip>
radeon: The kernel rejected CS, see dmesg for more information (-16).
 FPS: 60 FrameTime: 16.667 ms
=======================================================
                                  glmark2 Score: 60
=======================================================

Nothing in dmesg; let me turn on drmdebug.
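"drmdebug" here presumably means the drm.debug module parameter, which is a bitmask of debug categories; the thread does not say which categories were enabled. A small Python sketch that builds such a mask. The bit values are my reading of enum drm_debug_category in the kernel's include/drm/drm_print.h (check the header for your kernel), and the /sys/module/drm/parameters/debug path in the printed command is an assumption about the usual location:

```python
# Bit values as in enum drm_debug_category (include/drm/drm_print.h);
# assumed here, verify against the kernel you are running.
DRM_DEBUG_BITS = {
    "core": 0x01,    # drm core code, e.g. the [drm:drm_ioctl] lines
    "driver": 0x02,  # driver-specific messages
    "kms": 0x04,     # modesetting
    "prime": 0x08,
    "atomic": 0x10,
    "vbl": 0x20,     # vblank handling
    "state": 0x40,
    "lease": 0x80,
    "dp": 0x100,
    "drmres": 0x200,
}


def drm_debug_mask(*categories: str) -> int:
    """OR together the bits for the named debug categories."""
    mask = 0
    for name in categories:
        mask |= DRM_DEBUG_BITS[name]
    return mask


if __name__ == "__main__":
    mask = drm_debug_mask("core", "driver", "kms", "vbl")
    # Only print the commands; writing the parameter requires root.
    print(f"echo {mask:#x} > /sys/module/drm/parameters/debug")
    print(f"# or on the kernel command line: drm.debug={mask:#x}")
```

Enabling only the categories you need keeps the console noise down; the output above shows how quickly a full log grows.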

pgwipeout commented 2 years ago

I apologize in advance for this; here's the last output before it dumps (it dumps at the same place every time).

[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata:
[  288.489559] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_MMAP
[  288.491841] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_BUSY
[  288.492741] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_BUSY
[  288.493514] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_WAIT_IDLE
[  288.494320] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_MMAP
[  288.506527] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_BUSY
[  288.507336] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_BUSY
[  288.508096] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_WAIT_IDLE
[  288.508984] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_MMAP
[  288.518727] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_BUSY
[  288.519765] [drm:drm_ioctl] comm="glmark2-d:rcs0" pid=974, dev=0xe201, auth=1, RADEON_CS
[  288.521283] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_WAIT_IDLE
[  288.522136] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, DRM_IOCTL_MODE_PAGE_FLIP
[  288.522970] [drm:evergreen_irq_set [radeon]] dpm thermal
[  288.523655] [drm:radeon_irq_kms_set_irq_n_enabled [radeon]] vblank0 interrupts enabled
[  288.524596] radeon 0000:01:00.0: [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0
[  288.524567] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59648, wptr 59664
[  288.525512] [drm:radeon_crtc_page_flip_target [radeon]] flip-ioctl() cur_rbo = 0000000011fbf557, new_rbo = 0000000033003173
[  288.526227] [drm:evergreen_irq_process [radeon]] IH: D1 vblank
[  288.527197] [drm:evergreen_irq_set [radeon]] evergreen_irq_set: sw int gfx
[  288.527991] [drm:drm_mode_object_get] OBJ ID: 61 (2)
[  288.528237] [drm:evergreen_irq_set [radeon]] dpm thermal
[  288.528774] [drm:drm_mode_object_put.part.0] OBJ ID: 61 (3)
[  288.529779] [drm:drm_mode_object_put.part.0] OBJ ID: 60 (2)
[  288.530447] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59664, wptr 59680
[  288.531495] [drm:evergreen_irq_process [radeon]] IH: CP EOP
[  288.532209] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59680, wptr 59696
[  288.532269] [drm:evergreen_irq_set [radeon]] dpm thermal
[  288.533263] [drm:evergreen_irq_process [radeon]] IH: D1 vblank - IH event w/o asserted irq bit?
[  288.533723] [drm:evergreen_irq_set [radeon]] dpm thermal
[  288.534439] [drm:evergreen_irq_process [radeon]] IH: D1 vblank
[  288.535507] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59696, wptr 59696
[  288.547908] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59696, wptr 59712
[  288.549003] [drm:evergreen_irq_process [radeon]] IH: D1 vblank
[  288.549726] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59712, wptr 59728
[  288.550749] [drm:evergreen_irq_process [radeon]] IH: D1 flip
[  288.551472] [drm:evergreen_irq_set [radeon]] dpm thermal
[  288.552262] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59728, wptr 59728
[  288.559484] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_CREATE
[  288.561072] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_MMAP
[  288.563469] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_BUSY
[  288.564608] [drm:drm_ioctl] comm="glmark2-d:rcs0" pid=974, dev=0xe201, auth=1, RADEON_CS
[  288.564584] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59728, wptr 59744
[  288.566185] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, RADEON_GEM_WAIT_IDLE
[  288.566243] radeon 0000:01:00.0: [drm:vblank_disable_fn] disabling vblank on crtc 0
[  288.567036] [drm:drm_ioctl] comm="glmark2-drm" pid=973, dev=0xe201, auth=1, DRM_IOCTL_MODE_PAGE_FLIP
[  288.567689] [drm:evergreen_irq_set [radeon]] dpm thermal
[  288.568931] [drm:radeon_irq_kms_set_irq_n_enabled [radeon]] vblank0 interrupts disabled
[  288.569887] [drm:evergreen_irq_process [radeon]] IH: D1 vblank
[  288.569895] [drm:evergreen_irq_set [radeon]] dpm thermal
[  288.571089] [drm:radeon_irq_kms_set_irq_n_enabled [radeon]] vblank0 interrupts enabled
[  288.572039] radeon 0000:01:00.0: [drm:drm_vblank_enable] enabling vblank on crtc 0, ret: 0
[  288.572914] [drm:radeon_crtc_page_flip_target [radeon]] flip-ioctl() cur_rbo = 0000000033003173, new_rbo = 0000000011fbf557
[  288.574193] [drm:evergreen_irq_set [radeon]] dpm thermal
[  288.575037] [drm:drm_mode_object_get] OBJ ID: 60 (2)
[  288.575514] [drm:drm_mode_object_put.part.0] OBJ ID: 60 (3)
[  288.576026] [drm:drm_mode_object_put.part.0] OBJ ID: 61 (2)
[  288.581241] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59744, wptr 59760
[  288.582335] [drm:evergreen_irq_process [radeon]] IH: D1 vblank
[  288.583053] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59760, wptr 59776
[  288.584076] [drm:evergreen_irq_process [radeon]] IH: D1 flip
[  288.584803] [drm:evergreen_irq_set [radeon]] dpm thermal
[  288.585571] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59776, wptr 59776
[  288.597907] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59776, wptr 59792
[  288.599025] radeon 0000:01:00.0: [drm:vblank_disable_fn] disabling vblank on crtc 0
[  288.599748] [drm:evergreen_irq_set [radeon]] dpm thermal
[  288.600428] [drm:radeon_irq_kms_set_irq_n_enabled [radeon]] vblank0 interrupts disabled
[  288.601369] [drm:evergreen_irq_process [radeon]] IH: D1 vblank
[  290.161566] [drm:drm_release] open_count = 1
[  290.162005] [drm:drm_file_free.part.0] comm="glmark2-drm", pid=973, dev=0xe201, open_count=1
[  290.162767] [drm:drm_mode_object_put.part.0] OBJ ID: 61 (1)
[  290.163413] radeon 0000:01:00.0: [drm:drm_mode_rmfb_work_fn] Removing [FB:60] from all active usage due to RMFB ioctl
[  290.164428] radeon 0000:01:00.0: [drm:drm_framebuffer_remove] Disabling [CRTC:42:crtc-0] because [FB:60] is removed
[  290.165524] [drm:drm_crtc_helper_set_config]
[  290.165934] [drm:drm_crtc_helper_set_config] [CRTC:42:crtc-0] [NOFB]
[  290.166510] [drm:drm_mode_object_put.part.0] OBJ ID: 56 (4)
[  290.167032] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[  290.168305] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 30 to mode 3, devices 00000080, active_devices 00000080
[  290.169623] [drm:evergreen_hdmi_enable [radeon]] Disabling HDMI interface @ 0x0000 for encoder 0x1e
[  290.171080] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[  290.198351] [drm:rv740_get_dll_speed [radeon]] Target MCLK greater than largest MCLK in DLL speed table
[  290.199450] [drm:rv740_get_dll_speed [radeon]] Target MCLK greater than largest MCLK in DLL speed table
[  290.202985] [drm:drm_mode_object_put.part.0] OBJ ID: 60 (2)
[  290.203650] [drm:drm_mode_object_put.part.0] OBJ ID: 60 (1)
[  290.205263] [drm:drm_release]
[  290.205590] [drm:drm_crtc_helper_set_config]
[  290.205988] [drm:drm_crtc_helper_set_config] [CRTC:42:crtc-0] [FB:58] #connectors=1 (x y) (0 0)
[  290.206764] [drm:drm_crtc_helper_set_config] crtc has no fb, full mode set
[  290.207373] [drm:drm_mode_object_get] OBJ ID: 56 (2)
[  290.207823] [drm:drm_crtc_helper_set_config] connector dpms not on, full mode switch
[  290.208590] [drm:drm_crtc_helper_set_config] encoder changed, full mode switch
[  290.209241] [drm:drm_crtc_helper_set_config] crtc changed, full mode switch
[  290.209864] [drm:drm_crtc_helper_set_config] [CONNECTOR:56:DVI-I-1] to [CRTC:42:crtc-0]
[  290.210682] [drm:drm_crtc_helper_set_config] attempting to set mode from userspace
[  290.211362] [drm:drm_mode_debug_printmodeline] Modeline "1920x1080": 60 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x48 0x5
[  290.212443] [drm:radeon_encoder_set_active_device [radeon]] setting active device to 00000080 from 00000080 00000081 for encoder 2
[  290.213680] [drm:radeon_get_monitor_bpc [radeon]] DVI-I-1: hdmi mode dotclock 148500 kHz, max tmds input clock 225000 kHz.
[  290.214793] [drm:radeon_get_monitor_bpc [radeon]] DVI-I-1: Display bpc=8, returned bpc=8
[  290.215656] [drm:drm_crtc_helper_set_mode] [CRTC:42:crtc-0]
[  290.216268] [drm:radeon_get_monitor_bpc [radeon]] DVI-I-1: hdmi mode dotclock 148500 kHz, max tmds input clock 225000 kHz.
[  290.217499] [drm:radeon_get_monitor_bpc [radeon]] DVI-I-1: Display bpc=8, returned bpc=8
[  290.218363] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[  290.219454] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[  290.221132] [drm:rv740_get_dll_speed [radeon]] Target MCLK greater than largest MCLK in DLL speed table
[  290.222145] [drm:rv740_get_dll_speed [radeon]] Target MCLK greater than largest MCLK in DLL speed table
[  290.224450] [drm:radeon_compute_pll_avivo [radeon]] 148500 - 148500, pll dividers - fb: 88.0 ref: 2, post 8
[  290.236560] [drm:drm_crtc_helper_set_mode] [ENCODER:55:TMDS-55] set [MODE:1920x1080]
[  290.236457] [drm:evergreen_irq_process [radeon]] evergreen_irq_process start: rptr 59792, wptr 59808
[  290.237317] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 30 to mode 3, devices 00000080, active_devices 00000080
[  290.238154] [drm:evergreen_irq_process [radeon]] IH: D1 flip
[  290.239062] [drm:evergreen_hdmi_enable [radeon]] Disabling HDMI interface @ 0x0000 for encoder 0x1e
[  290.239559] [drm:radeon_crtc_handle_flip [radeon]] radeon_crtc->flip_status = 0 != RADEON_FLIP_SUBMITTED(2)
[  290.240897] [drm:drm_detect_monitor_audio] Monitor has basic audio support
[  290.242034] [drm:dce4_hdmi_set_color_depth [radeon]] DVI-I-1: Disabling hdmi deep color for 8 bpc.
[  290.243252] [drm:dce5_crtc_load_lut [radeon]] 0
[  290.276581] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 30 to mode 0, devices 00000080, active_devices 00000080
[  290.277695] [drm:drm_detect_monitor_audio] Monitor has basic audio support
[  290.278314] [drm:evergreen_hdmi_enable [radeon]] Enabling HDMI interface @ 0x0000 for encoder 0x1e
[  290.280548] radeon 0000:01:00.0: [drm:drm_calc_timestamping_constants] crtc 42: hwmode: htotal 2200, vtotal 1125, vdisplay 1080
[  290.281711] radeon 0000:01:00.0: [drm:drm_calc_timestamping_constants] crtc 42: clock 148500 kHz framedur 16666666 linedur 14814
[  290.282746] [drm:drm_crtc_helper_set_config] Setting connector DPMS state to on
[  290.283399] [drm:drm_crtc_helper_set_config]         [CONNECTOR:56:DVI-I-1] set DPMS on
[  290.284196] [drm:dce5_crtc_load_lut [radeon]] 0
[  290.304330] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 30 to mode 0, devices 00000080, active_devices 00000080
[  290.305601] [drm:drm_detect_monitor_audio] Monitor has basic audio support
[  290.306238] [drm:evergreen_hdmi_enable [radeon]] Enabling HDMI interface @ 0x0000 for encoder 0x1e
[  290.307827] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[  290.309520] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[  290.311227] [drm:drm_mode_object_get] OBJ ID: 58 (1)
[  290.311705] [drm:drm_crtc_helper_set_config]
[  290.312223] [drm:drm_crtc_helper_set_config] [CRTC:44:crtc-1] [NOFB]
[  290.312822] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[  290.314201] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[  290.315899] [drm:drm_mode_object_get] OBJ ID: 58 (2)
[  290.316461] [drm:drm_mode_object_put.part.0] OBJ ID: 58 (3)
[  290.316988] [drm:drm_crtc_helper_set_config]
[  290.317529] [drm:drm_crtc_helper_set_config] [CRTC:46:crtc-2] [NOFB]
[  290.318131] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[  290.319313] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[  290.321138] [drm:drm_mode_object_get] OBJ ID: 58 (2)
[  290.321611] [drm:drm_mode_object_put.part.0] OBJ ID: 58 (3)
[  290.322117] [drm:drm_crtc_helper_set_config]
[  290.322511] [drm:drm_crtc_helper_set_config] [CRTC:48:crtc-3] [NOFB]
[  290.323086] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[  290.324281] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[  290.325890] [drm:drm_mode_object_get] OBJ ID: 58 (2)
[  290.326358] [drm:drm_mode_object_put.part.0] OBJ ID: 58 (3)
[  290.326863] [drm:drm_crtc_helper_set_config]
[  290.327257] [drm:drm_crtc_helper_set_config] [CRTC:50:crtc-4] [NOFB]
[  290.327831] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[  290.329011] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[  290.330829] [drm:drm_mode_object_get] OBJ ID: 58 (2)
[  290.331298] [drm:drm_mode_object_put.part.0] OBJ ID: 58 (3)
[  290.331806] [drm:drm_crtc_helper_set_config]
[  290.332275] [drm:drm_crtc_helper_set_config] [CRTC:52:crtc-5] [NOFB]
[  290.332862] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 33 to mode 3, devices 00000008, active_devices 00000000
[  290.333980] [drm:radeon_atom_encoder_dpms [radeon]] encoder dpms 21 to mode 3, devices 00000001, active_devices 00000000
[  290.335576] [drm:drm_mode_object_get] OBJ ID: 58 (2)
[  290.336039] [drm:drm_mode_object_put.part.0] OBJ ID: 58 (3)
[  290.336612] [drm:drm_release] driver lastclose completed
Bus error (core dumped)

jcdutton commented 2 years ago

@pgwipeout At least the problem is predictable. :-) You will probably need to compile glmark2 in debug mode, then run it under gdb to find out which line the bus error is on. Hopefully that will narrow it down to a syscall into the driver, and then we will know which function in the driver code has the alignment problem. Alternatively, it might be worth trying this (as root):

strace -f /usr/bin/glmark2-drm -b buffer

Coreforge commented 2 years ago

The buffer benchmark uses "map" as the VBO update method by default. I haven't looked very closely at glmark2's code, but my guess is that it maps the buffer on the GPU into userspace and accesses it that way without making sure the alignment is correct. I don't know whether there is a way to enforce alignment on mapped memory from the kernel, though, or whether all userspace software would have to be changed.

Coreforge commented 2 years ago

Try running this instead:

glmark2 -b buffer:update-method=subdata
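glmark2's -b argument takes a scene name followed by colon-separated option=value pairs, and its per-scene result lines print the same list. A hypothetical Python helper that turns the failing result line from the log back into a -b argument, so that exact configuration (rather than the scene defaults) can be rerun on its own:

```python
import re
import shlex


def bench_arg_from_log(line: str) -> str:
    """Rebuild a glmark2 -b argument ("scene:opt=val:...") from a result line."""
    m = re.match(r"\[(?P<scene>[\w-]+)\]\s*(?P<rest>\S*)", line)
    if m is None:
        raise ValueError(f"not a glmark2 result line: {line!r}")
    # Keep only option=value tokens; this drops the empty token before
    # " FPS: ..." and trailing text such as "Bus" from "Bus error".
    opts = [tok for tok in m.group("rest").split(":") if "=" in tok]
    return ":".join([m.group("scene"), *opts])


if __name__ == "__main__":
    failing = ("[buffer] columns=200:interleave=false:update-dispersion=0.9:"
               "update-fraction=0.5:update-method=subdata:Bus error")
    arg = bench_arg_from_log(failing)
    # shlex.join quotes the argument if needed (effect2d kernels contain ';').
    print(shlex.join(["glmark2-drm", "-b", arg]))
```

Lines where glmark2 printed "<default>" (as in the radeon CS run) reduce to just the scene name.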

jcdutton commented 2 years ago

If it helps, on x86 one can force alignment checking at runtime, and even get the compiler to warn about it with -Wcast-align. You add an extra line of inline asm to switch the checking on around the section you wish to check. An example page explaining it: https://www.xszz.org/faq-41/question-20190628214421.html

Coreforge commented 2 years ago

I just tried the buffer scene on the CM4, and it runs successfully without changing any parameters. I can't run the full benchmark, though, as the Pi can lock up completely after running 3D stuff for a while (though it usually gets through one scene).

jcdutton commented 2 years ago

@pgwipeout Could you please post the output of lspci -vv? There might be a BAR alignment problem. For GPUs, the BAR has to be aligned on a BAR-size boundary, so a 256MB BAR has to start on a 256MB boundary.
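The rule jcdutton describes (a BAR of size S must start at a multiple of S) is easy to check numerically. A Python sketch, using the base of the 256M prefetchable window from the lspci -vv output posted later in the thread as an example of a 256 MiB region that starts on a 256 MiB boundary; the base 0x18000000 is a made-up counterexample:

```python
def bar_naturally_aligned(base: int, size: int) -> bool:
    """True if a BAR of `size` bytes (a power of two) starting at `base` is size-aligned."""
    if size <= 0 or size & (size - 1):
        raise ValueError("BAR sizes are powers of two")
    return base % size == 0


if __name__ == "__main__":
    MiB = 1 << 20
    # "Prefetchable memory behind bridge: 0000000010000000-000000001fffffff [size=256M]"
    base, size = 0x10000000, 256 * MiB
    print(hex(base + size - 1))               # 0x1fffffff, matches the window limit
    print(bar_naturally_aligned(base, size))  # True
    # Hypothetical: the same 256 MiB BAR starting at 0x18000000 instead.
    print(bar_naturally_aligned(0x18000000, size))  # False
```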

Coreforge commented 2 years ago

On the CM4, it also crashes with a bus error on the buffer test when running all benchmarks, but runs buffer without issue when it's the only one being run.

pgwipeout commented 2 years ago

@jcdutton You crazy person, you've done it. Yeah, our alignment was off because of a previous bug with SSDs. Now I've got the GPU working, but I've gotta test whether there are regressions there.

jcdutton commented 2 years ago

@pgwipeout Just to clarify, was it the BAR alignment?

pgwipeout commented 2 years ago

Aye, it was the BAR alignment. There are still some other quirks going on here, but I got an X session to start.

pgwipeout commented 2 years ago

Celebration was premature. The X session is using the video output, but it's running on llvmpipe (software rendering).

Unfortunately, it still crashes at:

[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata:Bus error (core dumped)

Wayland is also still unhappy.

pgwipeout commented 2 years ago

It's smart enough to align the buffer even if the window isn't aligned. I switched back to the original state:

lspci -vv
00:00.0 PCI bridge: Rockchip Electronics Co., Ltd RK3568 Remote Signal Processor (rev 01) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 47
        Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
        I/O behind bridge: 00001000-00001fff [size=4K]
        Memory behind bridge: 02000000-020fffff [size=1M]
        Prefetchable memory behind bridge: 0000000010000000-000000001fffffff [size=256M]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
        Expansion ROM at 302100000 [virtual] [disabled] [size=64K]
        BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable+ Count=1/32 Maskable- 64bit+
                Address: 00000000fd450040  Data: 0000
        Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s (ok), Width x1 (ok)
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
                RootCap: CRSVisible+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP+ LTR+
                         10BitTagComp- 10BitTagReq- OBFF Via message/WAKE#, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd-
                         AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled, ARIFwd-
                         AtomicOpsCtl: ReqEn- EgressBlck-
                LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [b0] MSI-X: Enable- Count=1 Masked-
                Vector table: BAR=0 offset=00000000
                PBA: BAR=0 offset=00000010
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
                RootCmd: CERptEn- NFERptEn- FERptEn-
                RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
                         FirstFatal- NonFatalMsg- FatalMsg- IntMsg 9
                ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
        Capabilities: [148 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [160 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=10us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us
        Capabilities: [170 v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
        Kernel driver in use: pcieport

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Turks PRO [Radeon HD 7570] (prog-if 00 [VGA controller])
        Subsystem: Dell Turks PRO [Radeon HD 7570]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 79
        Region 0: Memory at 310000000 (64-bit, prefetchable) [size=256M]
        Region 2: Memory at 302000000 (64-bit, non-prefetchable) [size=128K]
        Region 4: I/O ports at 1000 [size=256]
        Expansion ROM at 302020000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #4, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s (ok), Width x1 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fd450040  Data: 0000
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Kernel driver in use: radeon
        Kernel modules: radeon

01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Turks HDMI Audio [Radeon HD 6500/6600 / 6700M Series]
        Subsystem: Dell Turks HDMI Audio [Radeon HD 6500/6600 / 6700M Series]
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin B routed to IRQ 53
        Region 0: Memory at 302040000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #4, Speed 5GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s (ok), Width x1 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fd450040  Data: 0000
        Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Kernel driver in use: snd_hda_intel
jcdutton commented 2 years ago

FYI, glmark2 also runs at 60fps on the Radeon card on x86, so being capped at 60fps is not an ARM-specific bug.

pgwipeout commented 2 years ago

Yeah, KMS enforces vsync, so your fps will be exactly your refresh rate. strace has been helpful; it doesn't look like a misaligned write:

futex(0x5598fbf4e8, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x5598fbf490, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x55990b9d40, FUTEX_WAIT_BITSET, 2, NULL, FUTEX_BITSET_MATCH_ANY) = 0
ioctl(5, DRM_IOCTL_RADEON_GEM_WAIT_IDLE, 0x7fe8e0fdd8) = 0
ioctl(3, DRM_IOCTL_MODE_PAGE_FLIP, 0x7fe8e0fe20) = 0
pselect6(4, [3], NULL, NULL, NULL, NULL) = 1 (in [3])
read(3, "\2\0\0\0 \0\0\0\214\376\340\350\177\0\0\0+\5\0\0\211\n\10\0\2204\1\0*\0\0\0", 1024) = 32
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRALN, si_addr=0x7f7ba7e2ac} ---
+++ killed by SIGBUS (core dumped) +++
Bus error (core dumped)
jcdutton commented 2 years ago

It is a misaligned read or write; it even states the address: si_addr=0x7f7ba7e2ac. I think the next step to diagnose this is to build glmark2 from source in debug mode with -O0 and run it under gdb. You could also try running glmark2-drm in gdb (even without a debug build); it might give you some information when it hits the SIGBUS. 0x7f7ba7e2ac is a virtual address, most likely allocated by an mmap call to the driver. If the driver responds to the mmap call by allocating something in physical RAM above 4GB, that might be the cause of your problem.
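The si_code and si_addr fields that strace prints come from the siginfo_t delivered to a handler installed with SA_SIGINFO, so a program can report them itself without a debugger. A minimal sketch, assuming Linux with glibc; bus_code_name(), format_hex() and install_sigbus_reporter() are hypothetical helpers, not part of glmark2 or Mesa:

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Map the POSIX SIGBUS si_code values to their names. */
const char *bus_code_name(int code)
{
    switch (code) {
    case BUS_ADRALN: return "BUS_ADRALN"; /* invalid address alignment */
    case BUS_ADRERR: return "BUS_ADRERR"; /* nonexistent physical address */
    case BUS_OBJERR: return "BUS_OBJERR"; /* object-specific hardware error */
    default:         return "unknown";
    }
}

/* Write v as "0x..." hex into buf (needs 2 + 2 * sizeof(uintptr_t) bytes,
   no terminating NUL). Returns the number of characters written. */
static size_t format_hex(char *buf, uintptr_t v)
{
    char tmp[2 * sizeof(uintptr_t)];
    size_t n = 0, len = 0;
    do {
        tmp[n++] = "0123456789abcdef"[v & 0xf];
        v >>= 4;
    } while (v != 0);
    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)
        buf[len++] = tmp[--n];
    return len;
}

/* Async-signal-safe string output: only write() is used. */
static void write_str(const char *s)
{
    size_t n = 0;
    ssize_t rc;
    while (s[n] != '\0')
        n++;
    rc = write(STDERR_FILENO, s, n);
    (void)rc;
}

static void on_sigbus(int sig, siginfo_t *info, void *ucontext)
{
    char hex[2 + 2 * sizeof(uintptr_t)];
    size_t len = format_hex(hex, (uintptr_t)info->si_addr);
    ssize_t rc;

    (void)sig;
    (void)ucontext;
    write_str("SIGBUS si_code=");
    write_str(bus_code_name(info->si_code));
    write_str(" si_addr=");
    rc = write(STDERR_FILENO, hex, len);
    (void)rc;
    write_str("\n");
    /* For a hardware fault, returning re-executes the faulting instruction.
       SA_RESETHAND already restored the default action, so the process then
       terminates (and dumps core) exactly as it would without this handler. */
}

/* Returns 0 on success, -1 on error (errno set by sigaction). */
int install_sigbus_reporter(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

Calling install_sigbus_reporter() at the top of main() would print the same two fields strace shows just before the process dies. For an alignment fault the code would be BUS_ADRALN, matching the strace line above.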

pgwipeout commented 2 years ago
Thread 1 "glmark2-drm" received signal SIGBUS, Bus error.
__memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
173     ../sysdeps/aarch64/multiarch/../memcpy.S: No such file or directory.
(gdb) where
#0  __memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
#1  0x0000007ff696b42c in ?? () from /usr/lib/aarch64-linux-gnu/dri/r600_dri.so
#2  0x00000055555e83fc in Mesh::update_single_vbo (this=0x5555d47400,
    ranges=std::vector of length 10, capacity 16 = {...}, n=0, nfloats=3) at ../src/mesh.cpp:472
#3  0x00000055555e8508 in Mesh::update_vbo (this=0x5555d47400, ranges=std::vector of length 10, capacity 16 = {...})
    at ../src/mesh.cpp:499
#4  0x000000555557fba4 in WaveMesh::update (this=0x5555d47400, elapsed=0.075626000000056592)
    at ../src/scene-buffer.cpp:163
#5  0x000000555557eef8 in SceneBuffer::update (this=0x5555689400) at ../src/scene-buffer.cpp:434
#6  0x000000555557b308 in MainLoop::draw (this=0x55557b5350) at ../src/main-loop.cpp:134
#7  0x000000555557b18c in MainLoop::step (this=0x55557b5350) at ../src/main-loop.cpp:108
#8  0x000000555555f7b8 in do_benchmark (canvas=...) at ../src/main.cpp:123
#9  0x000000555555fc5c in main (argc=1, argv=0x7ffffff668) at ../src/main.cpp:226
pgwipeout commented 2 years ago

Hmmmm:

[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata:
Thread 1 "glmark2-drm" received signal SIGBUS, Bus error.
__memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
173     ../sysdeps/aarch64/multiarch/../memcpy.S: No such file or directory.
(gdb) where
#0  __memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
#1  0x0000007ff6b845c0 in r600_buffer_subdata (ctx=0x555619d070, buffer=0x5555cd68b0, usage=10, offset=1452,
    size=15828, data=0x55558a5e7c) at ../src/gallium/drivers/r600/r600_buffer_common.c:570
#2  0x0000007ff60289c4 in _mesa_bufferobj_subdata (ctx=0x5555863e20, offset=1452, size=15828, data=0x55558a5e7c,
    obj=0x5555cd67b0) at ../src/mesa/main/bufferobj.c:115
#3  0x0000007ff602fa40 in _mesa_buffer_sub_data (ctx=0x5555863e20, bufObj=0x5555cd67b0, offset=1452, size=15828,
    data=0x55558a5e7c) at ../src/mesa/main/bufferobj.c:2633
#4  0x0000007ff602fd08 in buffer_sub_data (func=0x7ff70d5638 "glBufferSubData", no_error=false, dsa=false,
    data=0x55558a5e7c, size=15828, offset=1452, buffer=0, target=34962) at ../src/mesa/main/bufferobj.c:2665
#5  _mesa_BufferSubData (target=34962, offset=1452, size=15828, data=0x55558a5e7c) at ../src/mesa/main/bufferobj.c:2682
#6  0x00000055555e83fc in Mesh::update_single_vbo (this=0x55557d4570,
    ranges=std::vector of length 10, capacity 16 = {...}, n=0, nfloats=3) at ../src/mesh.cpp:472
#7  0x00000055555e8508 in Mesh::update_vbo (this=0x55557d4570, ranges=std::vector of length 10, capacity 16 = {...})
    at ../src/mesh.cpp:499
#8  0x000000555557fba4 in WaveMesh::update (this=0x55557d4570, elapsed=0.089772999999922831)
    at ../src/scene-buffer.cpp:163
#9  0x000000555557eef8 in SceneBuffer::update (this=0x5555689400) at ../src/scene-buffer.cpp:434
#10 0x000000555557b308 in MainLoop::draw (this=0x55557e19c0) at ../src/main-loop.cpp:134
#11 0x000000555557b18c in MainLoop::step (this=0x55557e19c0) at ../src/main-loop.cpp:108
#12 0x000000555555f7b8 in do_benchmark (canvas=...) at ../src/main.cpp:123
#13 0x000000555555fc5c in main (argc=1, argv=0x7ffffff688) at ../src/main.cpp:226
Coreforge commented 2 years ago

subdata also just maps the buffer and copies the data with memcpy, so I guess that won't work. What might work, though, is to use LD_PRELOAD to load a library that overrides __memcpy_generic with a version that copies only a single byte at a time (for now, as that shouldn't have any alignment requirements).

pgwipeout commented 2 years ago

Well, this is entirely weird. I set a breakpoint at that memcpy and stepped through the program and it didn't crash.

pgwipeout commented 2 years ago

Triggered with a breakpoint.

Thread 1 "glmark2-drm" hit Breakpoint 5, r600_buffer_subdata (ctx=0x55558af180, buffer=0x5556254c30, usage=10, offset=0, size=7200, data=0x55560fcc70) at ../src/gallium/drivers/r600/r600_buffer_common.c:570
570             memcpy(map, data, size);
(gdb)

$247 = (uint8_t *) 0x7feca58a80 <error: Cannot access memory at address 0x7feca58a80>

Thread 1 "glmark2-drm" hit Breakpoint 5, r600_buffer_subdata (ctx=0x55558af180, buffer=0x5556254c30, usage=10, offset=10092, size=25908, data=0x55560ff3dc) at ../src/gallium/drivers/r600/r600_buffer_common.c:570
570             memcpy(map, data, size);
(gdb)
$248 = (uint8_t *) 0x7feca5a6ec <error: Cannot access memory at address 0x7feca5a6ec>

Thread 1 "glmark2-drm" received signal SIGBUS, Bus error.
pgwipeout commented 2 years ago

I've created an issue on mesa in regards to this bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6142

jcdutton commented 2 years ago

@pgwipeout We need to determine whether the unaligned fault is due to a read or a write. Most memcpy implementations align their writes to the destination to 8 or 16 bytes, but they do not tend to align reads from the source. In this case, map is the destination and data is the source. So, when you hit the bus error while running in gdb, type "disassemble" and post the output here.
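One way to answer the read-or-write question is to compare the faulting address with the two buffers handed to memcpy. A sketch, assuming you have map (dst), data (src) and size from the r600_buffer_subdata frame plus si_addr from the signal; classify_fault() is a hypothetical helper:

```c
#include <stddef.h>
#include <stdint.h>

enum fault_side { FAULT_UNKNOWN, FAULT_READ_SRC, FAULT_WRITE_DST };

/* Nonzero if addr lies inside [base, base + n). */
static int in_range(uintptr_t addr, const void *base, size_t n)
{
    uintptr_t b = (uintptr_t)base;
    return addr >= b && addr - b < n;
}

/* Classify a faulting address against the buffers of memcpy(dst, src, n). */
enum fault_side classify_fault(const void *fault_addr, const void *dst,
                               const void *src, size_t n)
{
    uintptr_t a = (uintptr_t)fault_addr;
    if (in_range(a, dst, n))
        return FAULT_WRITE_DST; /* the store into map faulted */
    if (in_range(a, src, n))
        return FAULT_READ_SRC;  /* the load from data faulted */
    return FAULT_UNKNOWN;
}
```

The same comparison can be done by hand in gdb with the register values, which is what the rest of the thread does.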

pgwipeout commented 2 years ago
Thread 1 "glmark2-drm" received signal SIGBUS, Bus error.
__memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
173     ../sysdeps/aarch64/multiarch/../memcpy.S: No such file or directory.
(gdb) disassemble
Dump of assembler code for function __memcpy_generic:
   0x0000007ff7aaea80 <+0>:     nop
   0x0000007ff7aaea84 <+4>:     add     x4, x1, x2
   0x0000007ff7aaea88 <+8>:     add     x5, x0, x2
   0x0000007ff7aaea8c <+12>:    cmp     x2, #0x80
   0x0000007ff7aaea90 <+16>:    b.hi    0x7ff7aaeb80 <__memcpy_generic+256>  // b.pmore
   0x0000007ff7aaea94 <+20>:    cmp     x2, #0x20
   0x0000007ff7aaea98 <+24>:    b.hi    0x7ff7aaeb10 <__memcpy_generic+144>  // b.pmore
   0x0000007ff7aaea9c <+28>:    cmp     x2, #0x10
   0x0000007ff7aaeaa0 <+32>:    b.cc    0x7ff7aaeab8 <__memcpy_generic+56>  // b.lo, b.ul, b.last
   0x0000007ff7aaeaa4 <+36>:    ldp     x6, x7, [x1]
   0x0000007ff7aaeaa8 <+40>:    ldp     x12, x13, [x4, #-16]
   0x0000007ff7aaeaac <+44>:    stp     x6, x7, [x0]
   0x0000007ff7aaeab0 <+48>:    stp     x12, x13, [x5, #-16]
   0x0000007ff7aaeab4 <+52>:    ret
   0x0000007ff7aaeab8 <+56>:    tbz     w2, #3, 0x7ff7aaead0 <__memcpy_generic+80>
   0x0000007ff7aaeabc <+60>:    ldr     x6, [x1]
   0x0000007ff7aaeac0 <+64>:    ldur    x7, [x4, #-8]
   0x0000007ff7aaeac4 <+68>:    str     x6, [x0]
   0x0000007ff7aaeac8 <+72>:    stur    x7, [x5, #-8]
   0x0000007ff7aaeacc <+76>:    ret
   0x0000007ff7aaead0 <+80>:    tbz     w2, #2, 0x7ff7aaeae8 <__memcpy_generic+104>
   0x0000007ff7aaead4 <+84>:    ldr     w6, [x1]
   0x0000007ff7aaead8 <+88>:    ldur    w8, [x4, #-4]
   0x0000007ff7aaeadc <+92>:    str     w6, [x0]
   0x0000007ff7aaeae0 <+96>:    stur    w8, [x5, #-4]
   0x0000007ff7aaeae4 <+100>:   ret
   0x0000007ff7aaeae8 <+104>:   cbz     x2, 0x7ff7aaeb08 <__memcpy_generic+136>
   0x0000007ff7aaeaec <+108>:   lsr     x14, x2, #1
   0x0000007ff7aaeaf0 <+112>:   ldrb    w6, [x1]
   0x0000007ff7aaeaf4 <+116>:   ldurb   w10, [x4, #-1]
   0x0000007ff7aaeaf8 <+120>:   ldrb    w8, [x1, x14]
   0x0000007ff7aaeafc <+124>:   strb    w6, [x0]
   0x0000007ff7aaeb00 <+128>:   strb    w8, [x0, x14]
   0x0000007ff7aaeb04 <+132>:   sturb   w10, [x5, #-1]
   0x0000007ff7aaeb08 <+136>:   ret
   0x0000007ff7aaeb0c <+140>:   nop
   0x0000007ff7aaeb10 <+144>:   ldp     x6, x7, [x1]
   0x0000007ff7aaeb14 <+148>:   ldp     x8, x9, [x1, #16]
   0x0000007ff7aaeb18 <+152>:   ldp     x10, x11, [x4, #-32]
   0x0000007ff7aaeb1c <+156>:   ldp     x12, x13, [x4, #-16]
   0x0000007ff7aaeb20 <+160>:   cmp     x2, #0x40
   0x0000007ff7aaeb24 <+164>:   b.hi    0x7ff7aaeb40 <__memcpy_generic+192>  // b.pmore
   0x0000007ff7aaeb28 <+168>:   stp     x6, x7, [x0]
   0x0000007ff7aaeb2c <+172>:   stp     x8, x9, [x0, #16]
   0x0000007ff7aaeb30 <+176>:   stp     x10, x11, [x5, #-32]
   0x0000007ff7aaeb34 <+180>:   stp     x12, x13, [x5, #-16]
   0x0000007ff7aaeb38 <+184>:   ret
   0x0000007ff7aaeb3c <+188>:   nop
   0x0000007ff7aaeb40 <+192>:   ldp     x14, x15, [x1, #32]
   0x0000007ff7aaeb44 <+196>:   ldp     x16, x17, [x1, #48]
   0x0000007ff7aaeb48 <+200>:   cmp     x2, #0x60
   0x0000007ff7aaeb4c <+204>:   b.ls    0x7ff7aaeb60 <__memcpy_generic+224>  // b.plast
   0x0000007ff7aaeb50 <+208>:   ldp     x2, x3, [x4, #-64]
   0x0000007ff7aaeb54 <+212>:   ldp     x1, x4, [x4, #-48]
   0x0000007ff7aaeb58 <+216>:   stp     x2, x3, [x5, #-64]
   0x0000007ff7aaeb5c <+220>:   stp     x1, x4, [x5, #-48]
   0x0000007ff7aaeb60 <+224>:   stp     x6, x7, [x0]
   0x0000007ff7aaeb64 <+228>:   stp     x8, x9, [x0, #16]
   0x0000007ff7aaeb68 <+232>:   stp     x14, x15, [x0, #32]
   0x0000007ff7aaeb6c <+236>:   stp     x16, x17, [x0, #48]
   0x0000007ff7aaeb70 <+240>:   stp     x10, x11, [x5, #-32]
   0x0000007ff7aaeb74 <+244>:   stp     x12, x13, [x5, #-16]
   0x0000007ff7aaeb78 <+248>:   ret
   0x0000007ff7aaeb7c <+252>:   nop
   0x0000007ff7aaeb80 <+256>:   ldp     x12, x13, [x1]
   0x0000007ff7aaeb84 <+260>:   and     x14, x0, #0xf
   0x0000007ff7aaeb88 <+264>:   and     x3, x0, #0xfffffffffffffff0
   0x0000007ff7aaeb8c <+268>:   sub     x1, x1, x14
   0x0000007ff7aaeb90 <+272>:   add     x2, x2, x14
   0x0000007ff7aaeb94 <+276>:   ldp     x6, x7, [x1, #16]
=> 0x0000007ff7aaeb98 <+280>:   stp     x12, x13, [x0]
   0x0000007ff7aaeb9c <+284>:   ldp     x8, x9, [x1, #32]
   0x0000007ff7aaeba0 <+288>:   ldp     x10, x11, [x1, #48]
   0x0000007ff7aaeba4 <+292>:   ldp     x12, x13, [x1, #64]!
   0x0000007ff7aaeba8 <+296>:   subs    x2, x2, #0x90
   0x0000007ff7aaebac <+300>:   b.ls    0x7ff7aaebd8 <__memcpy_generic+344>  // b.plast
   0x0000007ff7aaebb0 <+304>:   stp     x6, x7, [x3, #16]
   0x0000007ff7aaebb4 <+308>:   ldp     x6, x7, [x1, #16]
   0x0000007ff7aaebb8 <+312>:   stp     x8, x9, [x3, #32]
   0x0000007ff7aaebbc <+316>:   ldp     x8, x9, [x1, #32]
   0x0000007ff7aaebc0 <+320>:   stp     x10, x11, [x3, #48]
   0x0000007ff7aaebc4 <+324>:   ldp     x10, x11, [x1, #48]
   0x0000007ff7aaebc8 <+328>:   stp     x12, x13, [x3, #64]!
   0x0000007ff7aaebcc <+332>:   ldp     x12, x13, [x1, #64]!
   0x0000007ff7aaebd0 <+336>:   subs    x2, x2, #0x40
   0x0000007ff7aaebd4 <+340>:   b.hi    0x7ff7aaebb0 <__memcpy_generic+304>  // b.pmore
   0x0000007ff7aaebd8 <+344>:   ldp     x14, x15, [x4, #-64]
   0x0000007ff7aaebdc <+348>:   stp     x6, x7, [x3, #16]
   0x0000007ff7aaebe0 <+352>:   ldp     x6, x7, [x4, #-48]
   0x0000007ff7aaebe4 <+356>:   stp     x8, x9, [x3, #32]
   0x0000007ff7aaebe8 <+360>:   ldp     x8, x9, [x4, #-32]
   0x0000007ff7aaebec <+364>:   stp     x10, x11, [x3, #48]
   0x0000007ff7aaebf0 <+368>:   ldp     x10, x11, [x4, #-16]
   0x0000007ff7aaebf4 <+372>:   stp     x12, x13, [x3, #64]
   0x0000007ff7aaebf8 <+376>:   stp     x14, x15, [x5, #-64]
   0x0000007ff7aaebfc <+380>:   stp     x6, x7, [x5, #-48]
   0x0000007ff7aaec00 <+384>:   stp     x8, x9, [x5, #-32]
   0x0000007ff7aaec04 <+388>:   stp     x10, x11, [x5, #-16]
   0x0000007ff7aaec08 <+392>:   ret
End of assembler dump.
jcdutton commented 2 years ago

The output of the disassemble command shows the next instruction it is about to execute:

   0x0000007ff7aaeb8c <+268>:   sub     x1, x1, x14
   0x0000007ff7aaeb90 <+272>:   add     x2, x2, x14
   0x0000007ff7aaeb94 <+276>:   ldp     x6, x7, [x1, #16]
=> 0x0000007ff7aaeb98 <+280>:   stp     x12, x13, [x0]
   0x0000007ff7aaeb9c <+284>:   ldp     x8, x9, [x1, #32]
   0x0000007ff7aaeba0 <+288>:   ldp     x10, x11, [x1, #48]
   0x0000007ff7aaeba4 <+292>:   ldp     x12, x13, [x1, #64]!

So, this means that the "bus error" happened at a load instruction. Pretty much all the instructions in memcpy copy 8 bytes at a time, in pairs: ldp x6, x7, [x1, #16] loads 8 bytes into x6, then 8 bytes into x7, i.e. a pair of 8-byte values. So none of them are 32-bit operations. Please post the values of x1 and x0 at this point:

i r x0
i r x1

The value of x1 should be within 16 (0x10) bytes of the bus error address.

The fix is probably to find out why the source (data) needs to be aligned. One also cannot simply replace memcpy with a byte-at-a-time copy, because the compiler will just optimize it up to 8-byte transfers.

pgwipeout commented 2 years ago
Thread 1 "glmark2-drm" received signal SIGBUS, Bus error.
__memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
173     ../sysdeps/aarch64/multiarch/../memcpy.S: No such file or directory.
(gdb) i r x0
x0             0x7fecda8cac        549434592428
(gdb) i r x1
x1             0x5556268180        366517584256
(gdb)
pgwipeout commented 2 years ago

I just realized the context got blown out because I lost power for a minute. So here's the new context.

(gdb) disassemble
Dump of assembler code for function __memcpy_generic:
   0x0000007ff7aaea80 <+0>:     nop
   0x0000007ff7aaea84 <+4>:     add     x4, x1, x2
   0x0000007ff7aaea88 <+8>:     add     x5, x0, x2
   0x0000007ff7aaea8c <+12>:    cmp     x2, #0x80
   0x0000007ff7aaea90 <+16>:    b.hi    0x7ff7aaeb80 <__memcpy_generic+256>  // b.pmore
   0x0000007ff7aaea94 <+20>:    cmp     x2, #0x20
   0x0000007ff7aaea98 <+24>:    b.hi    0x7ff7aaeb10 <__memcpy_generic+144>  // b.pmore
   0x0000007ff7aaea9c <+28>:    cmp     x2, #0x10
   0x0000007ff7aaeaa0 <+32>:    b.cc    0x7ff7aaeab8 <__memcpy_generic+56>  // b.lo, b.ul, b.last
   0x0000007ff7aaeaa4 <+36>:    ldp     x6, x7, [x1]
   0x0000007ff7aaeaa8 <+40>:    ldp     x12, x13, [x4, #-16]
   0x0000007ff7aaeaac <+44>:    stp     x6, x7, [x0]
   0x0000007ff7aaeab0 <+48>:    stp     x12, x13, [x5, #-16]
   0x0000007ff7aaeab4 <+52>:    ret
   0x0000007ff7aaeab8 <+56>:    tbz     w2, #3, 0x7ff7aaead0 <__memcpy_generic+80>
   0x0000007ff7aaeabc <+60>:    ldr     x6, [x1]
   0x0000007ff7aaeac0 <+64>:    ldur    x7, [x4, #-8]
   0x0000007ff7aaeac4 <+68>:    str     x6, [x0]
   0x0000007ff7aaeac8 <+72>:    stur    x7, [x5, #-8]
   0x0000007ff7aaeacc <+76>:    ret
   0x0000007ff7aaead0 <+80>:    tbz     w2, #2, 0x7ff7aaeae8 <__memcpy_generic+104>
   0x0000007ff7aaead4 <+84>:    ldr     w6, [x1]
   0x0000007ff7aaead8 <+88>:    ldur    w8, [x4, #-4]
   0x0000007ff7aaeadc <+92>:    str     w6, [x0]
   0x0000007ff7aaeae0 <+96>:    stur    w8, [x5, #-4]
   0x0000007ff7aaeae4 <+100>:   ret
   0x0000007ff7aaeae8 <+104>:   cbz     x2, 0x7ff7aaeb08 <__memcpy_generic+136>
   0x0000007ff7aaeaec <+108>:   lsr     x14, x2, #1
   0x0000007ff7aaeaf0 <+112>:   ldrb    w6, [x1]
   0x0000007ff7aaeaf4 <+116>:   ldurb   w10, [x4, #-1]
   0x0000007ff7aaeaf8 <+120>:   ldrb    w8, [x1, x14]
   0x0000007ff7aaeafc <+124>:   strb    w6, [x0]
   0x0000007ff7aaeb00 <+128>:   strb    w8, [x0, x14]
   0x0000007ff7aaeb04 <+132>:   sturb   w10, [x5, #-1]
   0x0000007ff7aaeb08 <+136>:   ret
   0x0000007ff7aaeb0c <+140>:   nop
   0x0000007ff7aaeb10 <+144>:   ldp     x6, x7, [x1]
   0x0000007ff7aaeb14 <+148>:   ldp     x8, x9, [x1, #16]
   0x0000007ff7aaeb18 <+152>:   ldp     x10, x11, [x4, #-32]
   0x0000007ff7aaeb1c <+156>:   ldp     x12, x13, [x4, #-16]
   0x0000007ff7aaeb20 <+160>:   cmp     x2, #0x40
   0x0000007ff7aaeb24 <+164>:   b.hi    0x7ff7aaeb40 <__memcpy_generic+192>  // b.pmore
   0x0000007ff7aaeb28 <+168>:   stp     x6, x7, [x0]
   0x0000007ff7aaeb2c <+172>:   stp     x8, x9, [x0, #16]
   0x0000007ff7aaeb30 <+176>:   stp     x10, x11, [x5, #-32]
   0x0000007ff7aaeb34 <+180>:   stp     x12, x13, [x5, #-16]
   0x0000007ff7aaeb38 <+184>:   ret
   0x0000007ff7aaeb3c <+188>:   nop
   0x0000007ff7aaeb40 <+192>:   ldp     x14, x15, [x1, #32]
   0x0000007ff7aaeb44 <+196>:   ldp     x16, x17, [x1, #48]
   0x0000007ff7aaeb48 <+200>:   cmp     x2, #0x60
   0x0000007ff7aaeb4c <+204>:   b.ls    0x7ff7aaeb60 <__memcpy_generic+224>  // b.plast
   0x0000007ff7aaeb50 <+208>:   ldp     x2, x3, [x4, #-64]
   0x0000007ff7aaeb54 <+212>:   ldp     x1, x4, [x4, #-48]
   0x0000007ff7aaeb58 <+216>:   stp     x2, x3, [x5, #-64]
   0x0000007ff7aaeb5c <+220>:   stp     x1, x4, [x5, #-48]
   0x0000007ff7aaeb60 <+224>:   stp     x6, x7, [x0]
   0x0000007ff7aaeb64 <+228>:   stp     x8, x9, [x0, #16]
   0x0000007ff7aaeb68 <+232>:   stp     x14, x15, [x0, #32]
   0x0000007ff7aaeb6c <+236>:   stp     x16, x17, [x0, #48]
   0x0000007ff7aaeb70 <+240>:   stp     x10, x11, [x5, #-32]
   0x0000007ff7aaeb74 <+244>:   stp     x12, x13, [x5, #-16]
   0x0000007ff7aaeb78 <+248>:   ret
   0x0000007ff7aaeb7c <+252>:   nop
   0x0000007ff7aaeb80 <+256>:   ldp     x12, x13, [x1]
   0x0000007ff7aaeb84 <+260>:   and     x14, x0, #0xf
   0x0000007ff7aaeb88 <+264>:   and     x3, x0, #0xfffffffffffffff0
   0x0000007ff7aaeb8c <+268>:   sub     x1, x1, x14
   0x0000007ff7aaeb90 <+272>:   add     x2, x2, x14
   0x0000007ff7aaeb94 <+276>:   ldp     x6, x7, [x1, #16]
=> 0x0000007ff7aaeb98 <+280>:   stp     x12, x13, [x0]
   0x0000007ff7aaeb9c <+284>:   ldp     x8, x9, [x1, #32]
   0x0000007ff7aaeba0 <+288>:   ldp     x10, x11, [x1, #48]
   0x0000007ff7aaeba4 <+292>:   ldp     x12, x13, [x1, #64]!
   0x0000007ff7aaeba8 <+296>:   subs    x2, x2, #0x90
   0x0000007ff7aaebac <+300>:   b.ls    0x7ff7aaebd8 <__memcpy_generic+344>  // b.plast
   0x0000007ff7aaebb0 <+304>:   stp     x6, x7, [x3, #16]
   0x0000007ff7aaebb4 <+308>:   ldp     x6, x7, [x1, #16]
   0x0000007ff7aaebb8 <+312>:   stp     x8, x9, [x3, #32]
   0x0000007ff7aaebbc <+316>:   ldp     x8, x9, [x1, #32]
   0x0000007ff7aaebc0 <+320>:   stp     x10, x11, [x3, #48]
   0x0000007ff7aaebc4 <+324>:   ldp     x10, x11, [x1, #48]
   0x0000007ff7aaebc8 <+328>:   stp     x12, x13, [x3, #64]!
   0x0000007ff7aaebcc <+332>:   ldp     x12, x13, [x1, #64]!
   0x0000007ff7aaebd0 <+336>:   subs    x2, x2, #0x40
   0x0000007ff7aaebd4 <+340>:   b.hi    0x7ff7aaebb0 <__memcpy_generic+304>  // b.pmore
   0x0000007ff7aaebd8 <+344>:   ldp     x14, x15, [x4, #-64]
   0x0000007ff7aaebdc <+348>:   stp     x6, x7, [x3, #16]
   0x0000007ff7aaebe0 <+352>:   ldp     x6, x7, [x4, #-48]
   0x0000007ff7aaebe4 <+356>:   stp     x8, x9, [x3, #32]
   0x0000007ff7aaebe8 <+360>:   ldp     x8, x9, [x4, #-32]
   0x0000007ff7aaebec <+364>:   stp     x10, x11, [x3, #48]
   0x0000007ff7aaebf0 <+368>:   ldp     x10, x11, [x4, #-16]
   0x0000007ff7aaebf4 <+372>:   stp     x12, x13, [x3, #64]
   0x0000007ff7aaebf8 <+376>:   stp     x14, x15, [x5, #-64]
   0x0000007ff7aaebfc <+380>:   stp     x6, x7, [x5, #-48]
   0x0000007ff7aaec00 <+384>:   stp     x8, x9, [x5, #-32]
   0x0000007ff7aaec04 <+388>:   stp     x10, x11, [x5, #-16]
   0x0000007ff7aaec08 <+392>:   ret
End of assembler dump.
(gdb)
jcdutton commented 2 years ago

Looking at your posts:

   0x0000007ff7aaeb90 <+272>:   add     x2, x2, x14
   0x0000007ff7aaeb94 <+276>:   ldp     x6, x7, [x1, #16]
=> 0x0000007ff7aaeb98 <+280>:   stp     x12, x13, [x0]
   0x0000007ff7aaeb9c <+284>:   ldp     x8, x9, [x1, #32]

Thread 1 "glmark2-drm" received signal SIGBUS, Bus error.
__memcpy_generic () at ../sysdeps/aarch64/multiarch/../memcpy.S:173
173     ../sysdeps/aarch64/multiarch/../memcpy.S: No such file or directory.
(gdb) i r x0
x0             0x7fecda8cac        549434592428
(gdb) i r x1
x1             0x5556268180        366517584256
(gdb)

I now think that I was wrong above: the problem instruction is the store. Here x0 is not correctly aligned, but x1 is, so the problem is around x0. 0x...cac is 12 bytes into a 16-byte alignment, or 4 bytes into an 8-byte alignment. In any case, an 8-byte write there has to spill into the next 8 bytes, and thus we get a bus error.
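The alignment arithmetic above, applied to the x0 and x1 values from the gdb session, comes down to a mask with align - 1 (align must be a power of two). A sketch with hypothetical helper names:

```c
#include <inttypes.h>
#include <stdio.h>

/* Bytes by which addr sits past the previous multiple of align.
   align must be a power of two. */
static unsigned misalign(uint64_t addr, uint64_t align)
{
    return (unsigned)(addr & (align - 1));
}

/* Nonzero if a size-byte access at addr spills past an align-byte boundary. */
static int crosses_boundary(uint64_t addr, uint64_t size, uint64_t align)
{
    return misalign(addr, align) + size > align;
}

void report_alignment(const char *reg, uint64_t addr)
{
    printf("%s=0x%" PRIx64 ": +%u into 16-byte unit, +%u into 8-byte unit\n",
           reg, addr, misalign(addr, 16), misalign(addr, 8));
}
```

report_alignment("x0", 0x7fecda8cac) prints `x0=0x7fecda8cac: +12 into 16-byte unit, +4 into 8-byte unit`, while x1 (0x5556268180) is 16-byte aligned, so the 16-byte stp at x0 is the access that crosses a boundary.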

This points to a bug in the memcpy assembler instructions in memcpy.S, but I think that is very unlikely, because this code runs fine on all sorts of arm64 machines.

Maybe replacing the instruction stp x12, x13, [x0] with 16 strb instructions is all that is needed for a quick fix.

I say very unlikely, but someone else has found other bugs with the aarch64 memcpy recently here: https://gist.github.com/jnettlet/f6f8b49bb7c731255c46f541f875f436

So, maybe we have found a new bug here also.

Coreforge commented 2 years ago

Testing with a little bit of VRAM mmapped, I get the bus error when either reading or writing with memcpy at addresses that aren't 8-byte aligned. It could be different on the SOQuartz, but that's pretty unlikely.

jcdutton commented 2 years ago

Although one might think this is a bug, the comment at the beginning of the memcpy.S code says: / Assumptions:

So, we need a version of memcpy that forces aligned accesses, at least on write, like the x86 code does. We could do a quick fix here, and replace the "stp x12, x13, [x0]" with 16 strb instructions together with some register shifting.
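The quick fix described above, written in C rather than assembly: stp x12, x13, [x0] stores x12 at x0 and x13 at x0 + 8, and the hypothetical store_pair_bytewise() below does the same with sixteen single-byte stores of the shifted registers. It assumes little-endian byte order, as on AArch64 Linux:

```c
#include <stdint.h>

/* Byte-wise stand-in for `stp lo, hi, [dst]`. Each byte is a separate store
   of the register shifted right by 8 * i (the C equivalent of lsr + strb).
   volatile keeps the compiler from merging the stores back into wider ones. */
static void store_pair_bytewise(volatile uint8_t *dst, uint64_t lo, uint64_t hi)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(lo >> (8 * i));
    for (int i = 0; i < 8; i++)
        dst[8 + i] = (uint8_t)(hi >> (8 * i));
}
```

This would only cover the first store. In the disassembly above, the tail stores are addressed relative to x5 (dst + count) and may be unaligned as well.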

Coreforge commented 2 years ago

I made a version of memcpy that just copies a single byte at a time, and that can copy data around to and from VRAM without any bus errors. Trying to put that into a library right now that can just be loaded with LD_PRELOAD.

Coreforge commented 2 years ago

Preloading the library with LD_PRELOAD works, at least with my small test program: https://gist.github.com/Coreforge/91da3d410ec7eb0ef5bc8dee24b91359 I compiled it using: gcc -shared -fPIC -o memcpy.so memcpy_unaligned.c

Coreforge commented 2 years ago

Replacing memcpy doesn't seem to help with glmark2, though. It still crashes with a bus error on the buffer test after rendering one frame.

jcdutton commented 2 years ago

@Coreforge When implementing memcpy, one generally has to provide an associated memmove as well. Copying one byte at a time over PCIe is going to be slow; copying 32 bits at a time should be roughly four times as fast. So, do a few 1-byte copies at the start until the pointer is aligned, then use aligned 32-bit copies for the rest, with a small 1-byte tail at the end if needed.
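The head, aligned words, and tail scheme could be sketched in C as follows. aligned32_memcpy is a hypothetical name. Because source and destination can be misaligned by different amounts, the sketch forces alignment on the destination only. The source is read as whole 32-bit words only when it happens to line up as well. Otherwise, four bytes are read from the source one at a time and assembled into a word before a single aligned 32-bit store.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memcpy: byte head, aligned 32-bit body, byte tail. */
void *aligned32_memcpy(void *dst, const void *src, size_t n)
{
    volatile uint8_t *d = dst;
    const volatile uint8_t *s = src;

    /* Head: single bytes until the destination is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 3) != 0) {
        *d++ = *s++;
        n--;
    }

    if (((uintptr_t)s & 3) == 0) {
        /* Source is aligned too: 32-bit loads and 32-bit stores. */
        for (; n >= 4; n -= 4, d += 4, s += 4)
            *(volatile uint32_t *)d = *(const volatile uint32_t *)s;
    } else {
        /* Source misaligned: gather 4 bytes, then one aligned 32-bit store. */
        for (; n >= 4; n -= 4, d += 4, s += 4) {
            union { uint8_t b[4]; uint32_t w; } u; /* keeps memory byte order */
            for (int i = 0; i < 4; i++)
                u.b[i] = s[i];
            *(volatile uint32_t *)d = u.w;
        }
    }

    /* Tail: the remaining 0 to 3 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The union copy preserves the bytes in memory order, so the sketch does not depend on endianness.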

jcdutton commented 2 years ago

Run objdump on the compiled memcpy; the compiler has probably generated 64-bit accesses when optimizing.

Coreforge commented 2 years ago

Since I cast the src and dst pointers to pointers to volatile char, it should only do 8-bit accesses. I don't know assembly very well, but it looks to me like it's also only doing 8 bits at a time.

memcpy.so:     file format elf64-littleaarch64

Disassembly of section .init:

0000000000000500 <_init>:
 500:   a9bf7bfd    stp x29, x30, [sp, #-16]!
 504:   910003fd    mov x29, sp
 508:   94000022    bl  590 <call_weak_fn>
 50c:   a8c17bfd    ldp x29, x30, [sp], #16
 510:   d65f03c0    ret

Disassembly of section .plt:

0000000000000520 <.plt>:
 520:   a9bf7bf0    stp x16, x30, [sp, #-16]!
 524:   90000090    adrp    x16, 10000 <__FRAME_END__+0xf7e0>
 528:   f947fe11    ldr x17, [x16, #4088]
 52c:   913fe210    add x16, x16, #0xff8
 530:   d61f0220    br  x17
 534:   d503201f    nop
 538:   d503201f    nop
 53c:   d503201f    nop

0000000000000540 <__cxa_finalize@plt>:
 540:   b0000090    adrp    x16, 11000 <__cxa_finalize@GLIBC_2.17>
 544:   f9400211    ldr x17, [x16]
 548:   91000210    add x16, x16, #0x0
 54c:   d61f0220    br  x17

0000000000000550 <malloc@plt>:
 550:   b0000090    adrp    x16, 11000 <__cxa_finalize@GLIBC_2.17>
 554:   f9400611    ldr x17, [x16, #8]
 558:   91002210    add x16, x16, #0x8
 55c:   d61f0220    br  x17

0000000000000560 <memcpy@plt>:
 560:   b0000090    adrp    x16, 11000 <__cxa_finalize@GLIBC_2.17>
 564:   f9400a11    ldr x17, [x16, #16]
 568:   91004210    add x16, x16, #0x10
 56c:   d61f0220    br  x17

0000000000000570 <__gmon_start__@plt>:
 570:   b0000090    adrp    x16, 11000 <__cxa_finalize@GLIBC_2.17>
 574:   f9400e11    ldr x17, [x16, #24]
 578:   91006210    add x16, x16, #0x18
 57c:   d61f0220    br  x17

0000000000000580 <free@plt>:
 580:   b0000090    adrp    x16, 11000 <__cxa_finalize@GLIBC_2.17>
 584:   f9401211    ldr x17, [x16, #32]
 588:   91008210    add x16, x16, #0x20
 58c:   d61f0220    br  x17

Disassembly of section .text:

0000000000000590 <call_weak_fn>:
 590:   90000080    adrp    x0, 10000 <__FRAME_END__+0xf7e0>
 594:   f947ec00    ldr x0, [x0, #4056]
 598:   b4000040    cbz x0, 5a0 <call_weak_fn+0x10>
 59c:   17fffff5    b   570 <__gmon_start__@plt>
 5a0:   d65f03c0    ret
 5a4:   d503201f    nop

00000000000005a8 <deregister_tm_clones>:
 5a8:   b0000080    adrp    x0, 11000 <__cxa_finalize@GLIBC_2.17>
 5ac:   9100c000    add x0, x0, #0x30
 5b0:   b0000081    adrp    x1, 11000 <__cxa_finalize@GLIBC_2.17>
 5b4:   9100c021    add x1, x1, #0x30
 5b8:   eb00003f    cmp x1, x0
 5bc:   540000a0    b.eq    5d0 <deregister_tm_clones+0x28>  // b.none
 5c0:   90000081    adrp    x1, 10000 <__FRAME_END__+0xf7e0>
 5c4:   f947e421    ldr x1, [x1, #4040]
 5c8:   b4000041    cbz x1, 5d0 <deregister_tm_clones+0x28>
 5cc:   d61f0020    br  x1
 5d0:   d65f03c0    ret
 5d4:   d503201f    nop

00000000000005d8 <register_tm_clones>:
 5d8:   b0000080    adrp    x0, 11000 <__cxa_finalize@GLIBC_2.17>
 5dc:   9100c000    add x0, x0, #0x30
 5e0:   b0000081    adrp    x1, 11000 <__cxa_finalize@GLIBC_2.17>
 5e4:   9100c021    add x1, x1, #0x30
 5e8:   cb000021    sub x1, x1, x0
 5ec:   9343fc21    asr x1, x1, #3
 5f0:   8b41fc21    add x1, x1, x1, lsr #63
 5f4:   9341fc21    asr x1, x1, #1
 5f8:   b40000a1    cbz x1, 60c <register_tm_clones+0x34>
 5fc:   90000082    adrp    x2, 10000 <__FRAME_END__+0xf7e0>
 600:   f947f042    ldr x2, [x2, #4064]
 604:   b4000042    cbz x2, 60c <register_tm_clones+0x34>
 608:   d61f0040    br  x2
 60c:   d65f03c0    ret

0000000000000610 <__do_global_dtors_aux>:
 610:   a9be7bfd    stp x29, x30, [sp, #-32]!
 614:   910003fd    mov x29, sp
 618:   f9000bf3    str x19, [sp, #16]
 61c:   b0000093    adrp    x19, 11000 <__cxa_finalize@GLIBC_2.17>
 620:   3940c260    ldrb    w0, [x19, #48]
 624:   35000140    cbnz    w0, 64c <__do_global_dtors_aux+0x3c>
 628:   90000080    adrp    x0, 10000 <__FRAME_END__+0xf7e0>
 62c:   f947e800    ldr x0, [x0, #4048]
 630:   b4000080    cbz x0, 640 <__do_global_dtors_aux+0x30>
 634:   b0000080    adrp    x0, 11000 <__cxa_finalize@GLIBC_2.17>
 638:   f9401400    ldr x0, [x0, #40]
 63c:   97ffffc1    bl  540 <__cxa_finalize@plt>
 640:   97ffffda    bl  5a8 <deregister_tm_clones>
 644:   52800020    mov w0, #0x1                    // #1
 648:   3900c260    strb    w0, [x19, #48]
 64c:   f9400bf3    ldr x19, [sp, #16]
 650:   a8c27bfd    ldp x29, x30, [sp], #32
 654:   d65f03c0    ret

0000000000000658 <frame_dummy>:
 658:   17ffffe0    b   5d8 <register_tm_clones>

000000000000065c <memcpy>:
 65c:   d10103ff    sub sp, sp, #0x40
 660:   f9000fe0    str x0, [sp, #24]
 664:   f9000be1    str x1, [sp, #16]
 668:   f90007e2    str x2, [sp, #8]
 66c:   f9400be0    ldr x0, [sp, #16]
 670:   f9001be0    str x0, [sp, #48]
 674:   f9400fe0    ldr x0, [sp, #24]
 678:   f90017e0    str x0, [sp, #40]
 67c:   f9001fff    str xzr, [sp, #56]
 680:   1400000d    b   6b4 <memcpy+0x58>
 684:   f9401be1    ldr x1, [sp, #48]
 688:   f9401fe0    ldr x0, [sp, #56]
 68c:   8b000021    add x1, x1, x0
 690:   f94017e2    ldr x2, [sp, #40]
 694:   f9401fe0    ldr x0, [sp, #56]
 698:   8b000040    add x0, x2, x0
 69c:   39400021    ldrb    w1, [x1]
 6a0:   12001c21    and w1, w1, #0xff
 6a4:   39000001    strb    w1, [x0]
 6a8:   f9401fe0    ldr x0, [sp, #56]
 6ac:   91000400    add x0, x0, #0x1
 6b0:   f9001fe0    str x0, [sp, #56]
 6b4:   f9401fe1    ldr x1, [sp, #56]
 6b8:   f94007e0    ldr x0, [sp, #8]
 6bc:   eb00003f    cmp x1, x0
 6c0:   54fffe23    b.cc    684 <memcpy+0x28>  // b.lo, b.ul, b.last
 6c4:   f9400fe0    ldr x0, [sp, #24]
 6c8:   910103ff    add sp, sp, #0x40
 6cc:   d65f03c0    ret

00000000000006d0 <memmove>:
 6d0:   a9bc7bfd    stp x29, x30, [sp, #-64]!
 6d4:   910003fd    mov x29, sp
 6d8:   f90017e0    str x0, [sp, #40]
 6dc:   f90013e1    str x1, [sp, #32]
 6e0:   f9000fe2    str x2, [sp, #24]
 6e4:   f9400fe0    ldr x0, [sp, #24]
 6e8:   97ffff9a    bl  550 <malloc@plt>
 6ec:   f9001fe0    str x0, [sp, #56]
 6f0:   f9400fe2    ldr x2, [sp, #24]
 6f4:   f94013e1    ldr x1, [sp, #32]
 6f8:   f9401fe0    ldr x0, [sp, #56]
 6fc:   97ffff99    bl  560 <memcpy@plt>
 700:   f9400fe2    ldr x2, [sp, #24]
 704:   f9401fe1    ldr x1, [sp, #56]
 708:   f94017e0    ldr x0, [sp, #40]
 70c:   97ffff95    bl  560 <memcpy@plt>
 710:   f9401fe0    ldr x0, [sp, #56]
 714:   97ffff9b    bl  580 <free@plt>
 718:   d503201f    nop
 71c:   a8c47bfd    ldp x29, x30, [sp], #64
 720:   d65f03c0    ret

Disassembly of section .fini:

0000000000000724 <_fini>:
 724:   a9bf7bfd    stp x29, x30, [sp, #-16]!
 728:   910003fd    mov x29, sp
 72c:   a8c17bfd    ldp x29, x30, [sp], #16
 730:   d65f03c0    ret

jcdutton commented 2 years ago

@Coreforge Correct. The code looks OK for memcpy. The memmove looks a bit odd, though. It is supposed to copy in a different direction depending on whether dest is above or below src, so that overlapping regions are handled correctly.
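A memmove can handle overlap without a temporary buffer by choosing its copy direction from the relative position of the two pointers. The following is a minimal byte-at-a-time sketch. bytewise_memmove is a hypothetical name. If dst is below src, copying forwards never overwrites a source byte before it has been read. If dst is above src, copying backwards has the same property.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overlap-safe memmove with single-byte accesses. */
void *bytewise_memmove(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination below source: copy forwards. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else if ((uintptr_t)d > (uintptr_t)s) {
        /* Destination above source: copy backwards. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dst; /* memmove must return its destination */
}
```

The addresses are compared as uintptr_t values because C leaves relational comparison of pointers into different objects undefined.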

Coreforge commented 2 years ago

The memmove function just allocates a temporary array, copies the data into that array and then copies it to the destination. This isn't ideal and would fail if there isn't enough memory available, but it works for now. I initially forgot to return the dst pointer from memmove, which caused a bad_alloc somewhere in std::vector code during the cell shading test. With that fixed, I was able to do a full run of glmark2 without any issues. Xorg still doesn't want to work, though.