gianni-rosato / svt-av1-psy

The Scalable Video Technology for AV1 (SVT-AV1 Encoder and Decoder) with perceptual enhancements for psychovisually optimal AV1 encoding
BSD 3-Clause Clear License

[BUG] Crash when using `--preset 2` or lower. #82

Closed vi closed 1 day ago

vi commented 2 days ago

Overview

I am encoding a video whose stream begins with `YUV4MPEG2 W1920 H1080 F30:1 Ip A1:1 C420mpeg2 XYSCSS=420MPEG2 XCOLORRANGE=LIMITED`, using a command line like `SvtAv1EncApp --passes 1 --rc 1 --crf 55 -b y.ivf -i - --keyint 2000 --preset 2 --tune 3`, and the encoder crashes. Some output is produced, but it is unplayable (possibly a truncated file).

Actually, most of the options are irrelevant: it crashes even with just `-b y.ivf -i - --preset 2`.

Branch In which branch does the issue appear to be occurring?

When built from v2.2.1-A, it does not crash.

`git bisect` points at commit f14607b838218174a00c414d6d8b4e35de7ed2f3 as the first one that crashes. Reverting this commit resolves the crash.

Terminal Output If applicable, add terminal output to help explain your problem.

Platform (please complete the following information):

Linux hostname 6.1.0-23-amd64 x86_64 unknown GNU/Linux gcc (Debian 12.2.0-14) 12.2.0

Version String (please complete the following information):

SVT-AV1-PSY v2.2.1-B (release)
PSY Release: B

Additional context / Relevant Files

Backtrace from core dump
```
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f24df872758 in _mm256_store_si256 (__A=..., __P=0x55fef64251d0) at /usr/lib/gcc/x86_64-linux-gnu/12/include/avxintrin.h:923
923 *__P = __A;
[Current thread is 1 (Thread 0x7f24831b76c0 (LWP 23369))]
#0 0x00007f24df872758 in _mm256_store_si256 (__A=..., __P=0x55fef64251d0) at /usr/lib/gcc/x86_64-linux-gnu/12/include/avxintrin.h:923
No locals.
#1 subtract32_avx2 (diff_ptr=0x55fef64251d0, src_ptr=0x7f249a1e5708 "\352\352\352\352\352\351", '\352' , "\353\352\352\352\353\353\352\352\352\352\353", '\352' , "\351\347\341\340\347\351\351\335ö\267\270\266\262\257\255\263\274\274\273\273\274\307ȫ\261\326\325\315ï\256\307\337ʨ\230\213\232\250\247\242\236\244\263\270\264\262\265\265\263\263\260\261\263\263\262\262\261\255\257\261\262\264\265\264\260\250\237\232\234\223\225\246\244\230\226\235\223\217\232\242\237\265\340\345ͨ\252\261\323\351\351\352\352\352\352\351\352\352\352\352\352\343\340\345ȼ\336伖\225\220\220\222\221\220\224\235\246\244\231\216\233\273\256\220\216\221\224\224\227\235\251\260\254\244\245\247\244\250\305̰\237\232\226\232\243\257\307\335\350\352\352", ..., pred_ptr=0x55fef641e160 '\352' , "\351", '\352' , "\353\353\353\353", '\352' , "\353\353\353\353\350", '\352' , "\353\353\353\353\353\317\327\345", '\352' , "\353\353\353\353\353몽\336\352\351\352\352", ...) at /home/vi/src/svt-av1-psy/Source/Lib/ASM_AVX2/convolve_avx2.c:1509
        s = {-1519144729111237910, -1519143629599610134, -1518861055111271446, -1519143629582832918}
        p = {-1519143629599610134, -1519143629599610134, -1519143629599610134, -1519143629599610134}
        set_one_minusone = {-71495735022846207, -71495735022846207, -71495735022846207, -71495735022846207}
        diff0 = {0, 4294901760, 65536, 4295032832}
        diff1 = {0, 0, 281474976710656, 0}
#2 0x00007f24df87291d in aom_subtract_block_32xn_avx2 (rows=32, diff_ptr=0x55fef64251d0, diff_stride=32, src_ptr=0x7f249a1e5708 "\352\352\352\352\352\351", '\352' , "\353\352\352\352\353\353\352\352\352\352\353", '\352' , "\351\347\341\340\347\351\351\335ö\267\270\266\262\257\255\263\274\274\273\273\274\307ȫ\261\326\325\315ï\256\307\337ʨ\230\213\232\250\247\242\236\244\263\270\264\262\265\265\263\263\260\261\263\263\262\262\261\255\257\261\262\264\265\264\260\250\237\232\234\223\225\246\244\230\226\235\223\217\232\242\237\265\340\345ͨ\252\261\323\351\351\352\352\352\352\351\352\352\352\352\352\343\340\345ȼ\336伖\225\220\220\222\221\220\224\235\246\244\231\216\233\273\256\220\216\221\224\224\227\235\251\260\254\244\245\247\244\250\305̰\237\232\226\232\243\257\307\335\350\352\352", ..., src_stride=2064, pred_ptr=0x55fef641e160 '\352' , "\351", '\352' , "\353\353\353\353", '\352' , "\353\353\353\353\350", '\352' , "\353\353\353\353\353\317\327\345", '\352' , "\353\353\353\353\353몽\336\352\351\352\352", ..., pred_stride=32) at /home/vi/src/svt-av1-psy/Source/Lib/ASM_AVX2/convolve_avx2.c:1535
        j = 0
#3 0x00007f24df872b6d in svt_aom_subtract_block_avx2 (rows=32, cols=32, diff_ptr=0x55fef64251d0, diff_stride=32, src_ptr=0x7f249a1e5708 "\352\352\352\352\352\351", '\352' , "\353\352\352\352\353\353\352\352\352\352\353", '\352' , "\351\347\341\340\347\351\351\335ö\267\270\266\262\257\255\263\274\274\273\273\274\307ȫ\261\326\325\315ï\256\307\337ʨ\230\213\232\250\247\242\236\244\263\270\264\262\265\265\263\263\260\261\263\263\262\262\261\255\257\261\262\264\265\264\260\250\237\232\234\223\225\246\244\230\226\235\223\217\232\242\237\265\340\345ͨ\252\261\323\351\351\352\352\352\352\351\352\352\352\352\352\343\340\345ȼ\336伖\225\220\220\222\221\220\224\235\246\244\231\216\233\273\256\220\216\221\224\224\227\235\251\260\254\244\245\247\244\250\305̰\237\232\226\232\243\257\307\335\350\352\352", ..., src_stride=2064, pred_ptr=0x55fef641e160 '\352' , "\351", '\352' , "\353\353\353\353", '\352' , "\353\353\353\353\350", '\352' , "\353\353\353\353\353\317\327\345", '\352' , "\353\353\353\353\353몽\336\352\351\352\352", ..., pred_stride=32) at /home/vi/src/svt-av1-psy/Source/Lib/ASM_AVX2/convolve_avx2.c:1572
No locals.
#4 0x00007f24df276507 in svt_aom_calc_pred_masked_compound (pcs=0x7f24d4197010, ctx=0x55fef641a410, cand=0x7f24881907c0) at /home/vi/src/svt-av1-psy/Source/Lib/Codec/enc_inter_prediction.c:4715
        src_buf = 0x7f249a1e5708 "\352\352\352\352\352\351", '\352' , "\353\352\352\352\353\353\352\352\352\352\353", '\352' , "\351\347\341\340\347\351\351\335ö\267\270\266\262\257\255\263\274\274\273\273\274\307ȫ\261\326\325\315ï\256\307\337ʨ\230\213\232\250\247\242\236\244\263\270\264\262\265\265\263\263\260\261\263\263\262\262\261\255\257\261\262\264\265\264\260\250\237\232\234\223\225\246\244\230\226\235\223\217\232\242\237\265\340\345ͨ\252\261\323\351\351\352\352\352\352\351\352\352\352\352\352\343\340\345ȼ\336伖\225\220\220\222\221\220\224\235\246\244\231\216\233\273\256\220\216\221\224\224\227\235\251\260\254\244\245\247\244\250\305̰\237\232\226\232\243\257\307\335\350\352\352", ...
        scs = 0x7f24decff010
        hbd_md = 0 '\000'
        src_pic = 0x55fef23bddf0
        bwidth = 32
        bheight = 32
        pred_desc = {dctor = 0x55fef2090530, buffer_y = 0x55fef641e160 '\352' , "\351", '\352' , "\353\353\353\353", '\352' , "\353\353\353\353\350", '\352' , "\353\353\353\353\353\317\327\345", '\352' , "\353\353\353\353\353몽\336\352\351\352\352", ..., buffer_cb = 0x5501be6bfa00 , buffer_cr = 0x7f24831b6abc "Q", buffer_bit_inc_y = 0x55fef641a410 "Ŷ2\337$\177", buffer_bit_inc_cb = 0x7f24d4197010 "\337\0306\337$\177", buffer_bit_inc_cr = 0x101080008a110 , stride_y = 32, stride_cb = 33563, stride_cr = 32548, stride_bit_inc_y = 0, stride_bit_inc_cb = 27232, stride_bit_inc_cr = 33563, org_x = 0, org_y = 0, origin_bot_y = 9131, width = 57137, height = 32548, max_width = 0, max_height = 1632, bit_depth = 32512, color_format = 4131496961, luma_size = 21760, chroma_size = 65280, packed_flag = 11 '\v', film_grain_flag = 255 '\377', buffer_enable_mask = 4131496976, is_16bit_pipeline = 254 '\376'}
        ref_pic_list0 = 0x55fef2090530
        ref_pic_list1 = 0x55fef2090530
        mv_0 = {{x = -2, y = -2}, as_int = 4294901758}
        mv_1 = {{x = -2, y = 0}, as_int = 65534}
        mv_unit = {mv = {{{x = -2, y = -2}, as_int = 4294901758}, {{x = -2, y = 0}, as_int = 65534}}, pred_direction = 1 '\001'}
        rf = "\001\005"
        ref_idx_l0 = 0 '\000'
        ref_idx_l1 = 0 '\000'
        list_idx0 = 0 '\000'
        list_idx1 = 1 '\001'
        __PRETTY_FUNCTION__ = "svt_aom_calc_pred_masked_compound"
        found_l0 = false
        found_l1 = false
        exit_compound_prep = 0 '\000'
        pred0_to_pred1_dist = 2084
#5 0x00007f24df3174c2 in inject_mvp_candidates_ii (scs=0x7f24decff010, pcs=0x7f24d4197010, ctx=0x55fef641a410, candTotCnt=0x7f24831b6b84) at /home/vi/src/svt-av1-psy/Source/Lib/Codec/mode_decision.c:1768
        cur_type = 2 '\002'
        is_skip_mode = false
        mask_done = 0 '\000'
        to_inject_mv_x_l0 = -2
        to_inject_mv_y_l1 = 0
        to_inj_mv1 = {{x = -2, y = 0}, as_int = 65534}
        cap_max_drl_index = 0 '\000'
        tot_comp_types = 4 '\004'
        to_inject_mv_y_l0 = -2
        to_inject_mv_x_l1 = -2
        to_inj_mv0 = {{x = -2, y = -2}, as_int = 4294901758}
        ref_idx_1 = 0 '\000'
        list_idx_1 = 1 '\001'
        ref_idx_0 = 0 '\000'
        list_idx_0 = 0 '\000'
        is_low_cmplxity = 1
        ref_pair = 8 '\b'
        rf = "\001\005"
        ref_it = 2
        blk_ptr = 0x7f24881e9760
        frm_hdr = 0x55febe69e3c0
        allow_compound = 1 '\001'
        inj_mv = 1 '\001'
        cand_idx = 81
        cand_array = 0x7f248818d010
        xd = 0x7f24881795d8
        drli = 1 '\001'
        max_drl_index = 1 '\001'
        nearestmv = {{as_int = 4294836224, as_mv = {row = 0, col = -2}}, {as_int = 4131496976, as_mv = {row = -23536, col = -2495}}}
        nearmv = {{as_int = 0, as_mv = {row = 0, col = 0}}, {as_int = 3558436880, as_mv = {row = 28688, col = -11239}}}
        ref_mv = {{as_int = 4294836224, as_mv = {row = 0, col = -2}}, {as_int = 4131496976, as_mv = {row = -23536, col = -2495}}}
        inside_tile = 1
        umv0tile = 0
        mi_row = 8
        mi_col = 0
        bsize = BLOCK_32X32
        is_blk_flat = 0
        __PRETTY_FUNCTION__ = "inject_mvp_candidates_ii"
#6 0x00007f24df31f9f5 in svt_aom_inject_inter_candidates (pcs=0x7f24d4197010, ctx=0x55fef641a410, scs=0x7f24decff010, candidate_total_cnt=0x7f24831b6be0) at /home/vi/src/svt-av1-psy/Source/Lib/Codec/mode_decision.c:3542
        frm_hdr = 0x55febe69e3c0
        cand_total_cnt = 66
        is_compound_enabled = 1 '\001'
        allow_bipred = 1 '\001'
        mi_row = 8
        mi_col = 0
        is_obmc_allowed = 1 '\001'
#7 0x00007f24df322bd4 in generate_md_stage_0_cand (pcs=0x7f24d4197010, ctx=0x55fef641a410, candidate_total_count_ptr=0x7f24831b6c40) at /home/vi/src/svt-av1-psy/Source/Lib/Codec/mode_decision.c:4237
        scs = 0x7f24decff010
        slice_type = B_SLICE
        cand_total_cnt = 66
        dc_cand_only_flag = 0 '\000'
        merge_inter_cands = 206
#8 0x00007f24df3acc2a in md_encode_block (pcs=0x7f24d4197010, ctx=0x55fef641a410, sb_addr=0, input_pic=0x55fef23bddf0) at /home/vi/src/svt-av1-psy/Source/Lib/Codec/product_coding_loop.c:9085
        cand_bf_ptr_array_base = 0x55fef659eed0
        cand_bf_ptr_array = 0x55fef659eed0
        blk_geom = 0x7f24e84ae8c0
        loc = {input_origin_index = 214728, input_cb_origin_in_index = 53700, blk_origin_index = 2048, blk_chroma_origin_index = 512}
        blk_ptr = 0x7f24881e9760
        fast_candidate_total_count = 1292
        cand_class_it = CAND_CLASS_0
        buffer_start_idx = 3897113024
        buffer_count_for_curr_class = 32548
        buffer_total_count = 32548
        best_md_stage_cost = 139794879438864
        best_md_stage_dist = 94553541556004
        __PRETTY_FUNCTION__ = "md_encode_block"
        org_hbd = 0 '\000'
        perform_md_recon = 0 '\000'
        candidate_index = 3744238856
        cand_bf = 0x7f24831b6d00
#9 0x00007f24df3b151d in process_block (pcs=0x7f24d4197010, ctx=0x55fef641a410, leaf_data_ptr=0x55fef64b14e0, in_pic=0x55fef23bddf0) at /home/vi/src/svt-av1-psy/Source/Lib/Codec/product_coding_loop.c:10205
        blk_geom = 0x7f24e84ae8c0
        blk_ptr = 0x7f24881e9760
        skip_processing_block = false
#10 0x00007f24df3b2bb4 in svt_aom_mode_decision_sb (scs=0x7f24decff010, pcs=0x7f24d4197010, ctx=0x55fef641a410, mdc_sb_data=0x55fef641a478) at /home/vi/src/svt-av1-psy/Source/Lib/Codec/product_coding_loop.c:10582
        md_early_exit_nsq = 1 '\001'
        nsi = 0
        shape = PART_N
        shape_block_cnt = 1 '\001'
        blk_idx_mds = 431
        shape_idx = 0
        base_blk_idx_mds = 431
        leaf_data_ptr = 0x55fef64b14e0
        blk_split_flag = 0 '\000'
        copy_neigh_arrays = true
        blk_idx = 3
        input_pic = 0x55fef23bddf0
        leaf_count = 5
        leaf_data_array = 0x55fef64b14b0
        md_early_exit_sq = 0 '\000'
        next_non_skip_blk_idx_mds = 0
        __PRETTY_FUNCTION__ = "svt_aom_mode_decision_sb"
#11 0x00007f24df2638f7 in svt_aom_mode_decision_kernel (input_ptr=0x55fed95975a0) at /home/vi/src/svt-av1-psy/Source/Lib/Codec/enc_dec_process.c:3548
        tile_group_y_sb_start = 0
        tile_group_x_sb_start = 0
        skip_pd_pass_0 = 0 '\000'
        rtc_tune = false
        last_sb_flag = 1 '\001'
        scs = 0x7f24decff010
        ppcs = 0x55febe698e20
        sb_size = 64 '@'
        sb_size_log2 = 6 '\006'
        pic_width_in_sb = 30
        enc_dec_tasks = 0x55fed5d72020
        pcs = 0x7f24d4197010
        md_ctx = 0x55fef641a410
        tile_group_width_in_sb = 30
        thread_ctx = 0x55fed95975a0
        ed_ctx = 0x55fef63ddf10
        enc_dec_tasks_wrapper = 0x55fef2537950
        enc_dec_results_wrapper = 0x55fef253f850
        enc_dec_results = 0x55fed5fcf520
        sb_ptr = 0x55fedff06da0
        sb_index = 0
        x_sb_index = 0
        y_sb_index = 0
        sb_origin_x = 0
        sb_origin_y = 0
        mdc_ptr = 0x55fef641a478
        segment_index = 0
        x_sb_start_index = 0
        y_sb_start_index = 0
        sb_start_index = 0
        sb_segment_count = 1
        sb_segment_index = 0
        segment_row_index = 0
        segment_band_index = 0
        segment_band_size = 1
        segments_ptr = 0x55fee10d7170
        __PRETTY_FUNCTION__ = "svt_aom_mode_decision_kernel"
#12 0x00007f24dedc9134 in start_thread (arg=) at ./nptl/pthread_create.c:442
        ret =
        pd =
        out =
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139794795165376, 821340099522880575, -128, 29, 140725377159152, 139794794115072, -779973418868513729, -780176337825799105}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call =
#13 0x00007f24dee497dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.
```
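The faulting instruction in frame #0 is `_mm256_store_si256`, which requires its destination to be 32-byte aligned. The destination in the backtrace, `0x55fef64251d0`, sits on a 16-byte boundary but not a 32-byte one, which is consistent with an alignment fault. A minimal sketch of that arithmetic follows; the helper names are hypothetical and not taken from the codebase.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: number of bytes `addr` lies past the previous
   `alignment` boundary (0 means aligned). `alignment` must be a power of two. */
unsigned misalignment(uint64_t addr, uint64_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (unsigned)(addr & (alignment - 1));
}

/* Prints the offsets for the diff_ptr value reported in frames #0 to #3. */
void report_diff_ptr(void) {
    const uint64_t diff_ptr = 0x55fef64251d0ULL;
    printf("offset from 16-byte boundary: %u\n", misalignment(diff_ptr, 16));
    printf("offset from 32-byte boundary: %u\n", misalignment(diff_ptr, 32));
}
```

A 16-byte-aligned buffer is enough for 128-bit aligned stores, so an address like this only becomes a problem once 256-bit aligned stores are used on it.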
gianni-rosato commented 2 days ago

Hey! Thanks for the detailed issue report. We'll be looking into this ASAP.

BlueSwordM commented 2 days ago

Hello. We'd like to know what CPU you're running svt-av1-psy on because I am not able to reproduce this with Clang 18 and GCC 14 from Presets -1 to 12.

I have a 5900X (Zen 3) chip on the latest CachyOS kernel for reference.

If we can find what it is exactly, we'll revert the patches and try to fix them if it is down to a specific CPU architecture.

gianni-rosato commented 2 days ago

Additionally, build arguments (which are requested by the issue template) and an example video to reproduce the crash would be great to have - thanks

vi commented 2 days ago

There are no funny build arguments. It reproduces both in release and debug - as expected, as SIMD is involved.

Here is a random compile command from ninja -v:

[10/256] /usr/bin/cc -DARCH_X86_64=1 -DEN_AVX512_SUPPORT=0 -DEXCLUDE_HASH=0 -DHAVE_BUILTIN_EXPECT=1 -DHAVE_VALGRIND_H=1 -DREPRODUCIBLE_BUILDS=0 -DSAFECLIB_STR_NULL_SLACK=1 -I/home/vi/src/svt-av1-psy/. -I/home/vi/src/svt-av1-psy/Source/API -I/home/vi/src/svt-av1-psy/Source/Lib/Codec -I/home/vi/src/svt-av1-psy/Source/Lib/C_DEFAULT -I/home/vi/src/svt-av1-psy/third_party/fastfeat -I/home/vi/src/svt-av1-psy/Source/Lib/Globals -I/home/vi/src/svt-av1-psy/Source/Lib/ASM_SSE2 -I/home/vi/src/svt-av1-psy/Source/Lib/ASM_SSSE3 -I/home/vi/src/svt-av1-psy/Source/Lib/ASM_SSE4_1 -I/home/vi/src/svt-av1-psy/Source/Lib/ASM_AVX2 -I/home/vi/src/svt-av1-psy/Source/Lib/ASM_AVX512 -I/home/vi/src/svt-av1-psy/third_party/cpuinfo/include -fno-stack-clash-protection -Wall -Wextra -Wformat -Wformat-security -ggdb -fstack-protector-strong -mno-avx -O3 -DNDEBUG -fPIC -fvisibility=hidden -std=gnu99 -MD -MT Source/Lib/Codec/CMakeFiles/CODEC.dir/block_structures.c.o -MF Source/Lib/Codec/CMakeFiles/CODEC.dir/block_structures.c.o.d -o Source/Lib/Codec/CMakeFiles/CODEC.dir/block_structures.c.o -c /home/vi/src/svt-av1-psy/Source/Lib/Codec/block_structures.c
/proc/cpuinfo snippet
```
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 158
model name : Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
stepping : 13
microcode : 0xfc
cpu MHz : 800.000
cache size : 12288 KB
physical id : 0
siblings : 12
core id : 5
cpu cores : 6
apicid : 11
initial apicid : 11
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest ple shadow_vmcs pml ept_mode_based_exec
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 5199.98
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
```

> requested by the issue template

Maybe the template should explain how to obtain them. After multiple `cmake-gui .` reconfigurations it can be tricky to remember the actual set of build options. Should reporters post the entire CMakeCache.txt?

gianni-rosato commented 2 days ago

Thanks for providing the build arguments. Do you have a sample video?

vi commented 2 days ago

Attached a minified sample input that crashes (when the commit is included). Actual content is removed in hex editor, everything is replaced by plain colours. If all the frames are the same colour though it does not crash, so one of the frames uses another byte value to make it different.

crashy_sample.zip
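An input of the kind described above (flat-colour frames, with one frame using a different byte value) could be regenerated along the following lines. This is only a sketch: the function name, dimensions, frame count, and byte values are assumptions rather than details taken from crashy_sample.zip, and it is untested whether the resulting file triggers the crash.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical generator: writes `frames` flat 8-bit 4:2:0 frames as a
   YUV4MPEG2 stream. Frame number `odd_frame` gets a different luma value so
   that not all frames are identical. Returns 0 on success, -1 on failure. */
int write_flat_y4m(FILE *out, int width, int height, int frames, int odd_frame) {
    assert(out != NULL);
    if (width <= 0 || height <= 0 || (width & 1) || (height & 1)) return -1;

    size_t luma = (size_t)width * (size_t)height;
    size_t chroma = luma / 4; /* size of each of Cb and Cr in 4:2:0 */
    size_t frame_size = luma + 2 * chroma;
    unsigned char *buf = malloc(frame_size);
    if (!buf) return -1;

    /* Stream header, using the same field layout as the header in the report. */
    int ok = fprintf(out, "YUV4MPEG2 W%d H%d F30:1 Ip A1:1 C420mpeg2\n",
                     width, height) > 0;
    for (int f = 0; ok && f < frames; ++f) {
        memset(buf, f == odd_frame ? 0x80 : 0xEA, luma); /* assumed luma values */
        memset(buf + luma, 0x80, 2 * chroma);            /* neutral chroma */
        ok = fputs("FRAME\n", out) >= 0 &&
             fwrite(buf, 1, frame_size, out) == frame_size;
    }
    free(buf);
    return ok ? 0 : -1;
}
```

Writing to `stdout` and piping the result into the encoder's `-i -` input would match the invocation in the original report.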

If needed, I can include a full core dump with debug symbols.

thoj commented 2 days ago

Just adding to this: I have the same issue on both Zen 2 and Zen 4 CPUs, and reverting https://github.com/gianni-rosato/svt-av1-psy/commit/f14607b838218174a00c414d6d8b4e35de7ed2f3 fixes it on both. The crash is a segmentation fault.

Edit: It crashes with the stefan_sif.y4m test video: SvtAv1EncApp -i stefan_sif.y4m -b stefan_sif.ivf --preset 2 --tune 0

Edit 2: Crashes on Zen 3 also, see last cpuinfo. Reverting f14607b fixes the crash on this CPU.

cpuinfo (3600)
> processor : 11
> vendor_id : AuthenticAMD
> cpu family : 23
> model : 113
> model name : AMD Ryzen 5 3600 6-Core Processor
> stepping : 0
> microcode : 0x8701033
> cpu MHz : 3273.758
> cache size : 512 KB
> physical id : 0
> siblings : 12
> core id : 6
> cpu cores : 6
> apicid : 13
> initial apicid : 13
> fpu : yes
> fpu_exception : yes
> cpuid level : 16
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
> bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso
> bogomips : 7186.33
> TLB size : 3072 4K pages
> clflush size : 64
> cache_alignment : 64
> address sizes : 43 bits physical, 48 bits virtual
> power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

cpuinfo (7950X3D)
> processor : 31
> vendor_id : AuthenticAMD
> cpu family : 25
> model : 97
> model name : AMD Ryzen 9 7950X3D 16-Core Processor
> stepping : 2
> microcode : 0xa601206
> cpu MHz : 400.000
> cache size : 1024 KB
> physical id : 0
> siblings : 32
> core id : 15
> cpu cores : 16
> apicid : 31
> initial apicid : 31
> fpu : yes
> fpu_exception : yes
> cpuid level : 16
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
> bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso
> bogomips : 8384.51
> TLB size : 3584 4K pages
> clflush size : 64
> cache_alignment : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

5950X
> processor : 31
> vendor_id : AuthenticAMD
> cpu family : 25
> model : 33
> model name : AMD Ryzen 9 5950X 16-Core Processor
> stepping : 0
> microcode : 0xa201016
> cpu MHz : 2866.800
> cache size : 512 KB
> physical id : 0
> siblings : 32
> core id : 15
> cpu cores : 16
> apicid : 31
> initial apicid : 31
> fpu : yes
> fpu_exception : yes
> cpuid level : 16
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
> bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso
> bogomips : 6787.40
> TLB size : 2560 4K pages
> clflush size : 64
> cache_alignment : 64
> address sizes : 48 bits physical, 48 bits virtual
> power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

gianni-rosato commented 2 days ago

Thanks for your patience. This has been reverted with ee7c9d2a7b67853d982e3f06cf0253def4b6b754 in testing for now - we're going to continue to investigate the issue, and if an appropriate resolution cannot be determined, we'll revert on master as well.

BlueSwordM commented 2 days ago

I can reproduce the bug if I compile with the default settings, which are just `-O3` and `-DNDEBUG` in a release build. I was using aggressive optimizations in my earlier testing, which means the aligned stores were likely being converted to unaligned stores, bypassing the bug.
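The difference between the two kinds of store can be sketched as follows. `subtract_row32` is a hypothetical stand-in, not the project's `subtract32_avx2`: it computes `diff[i] = src[i] - pred[i]` for one row of 32 pixels and writes the result with `_mm256_storeu_si256`, which places no alignment requirement on the destination, whereas `_mm256_store_si256` requires a 32-byte-aligned address. Whether this is the fix the maintainers would choose is not stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical sketch: one 32-pixel row of an 8-bit residual computation. */
void subtract_row32(int16_t *diff, const uint8_t *src, const uint8_t *pred) {
    assert(diff != NULL && src != NULL && pred != NULL);
#ifdef __AVX2__
    for (int half = 0; half < 2; ++half) {
        /* Load 16 pixels from each input and widen them from 8 to 16 bits. */
        __m256i s = _mm256_cvtepu8_epi16(
            _mm_loadu_si128((const __m128i *)(src + 16 * half)));
        __m256i p = _mm256_cvtepu8_epi16(
            _mm_loadu_si128((const __m128i *)(pred + 16 * half)));
        /* Unaligned store: valid for any destination address. Using
           _mm256_store_si256 here would require diff + 16 * half to be
           32-byte aligned. */
        _mm256_storeu_si256((__m256i *)(diff + 16 * half),
                            _mm256_sub_epi16(s, p));
    }
#else
    /* Scalar fallback so the sketch also builds without -mavx2. */
    for (int i = 0; i < 32; ++i) diff[i] = (int16_t)(src[i] - pred[i]);
#endif
}
```

The alternative design is to keep the aligned store and guarantee that every destination buffer is allocated on a 32-byte boundary.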

gianni-rosato commented 1 day ago

Reverted with bb886b2936a4020c9ff2e05fd5ceb6f0ba8c8f39 - let us know if you find anything else. Thanks for the detailed issue report, we appreciate it!

gitoss commented 1 day ago

> I was using aggressive optimizations in my testing before, which means the aligned stores were likely getting converted to unaligned stores, bypassing the bug.

Which optimization did you use to circumvent the bug? Thanks.

vi commented 1 day ago

Should v2.2.1-C or v2.2.1-B.1 be tagged, so that the bug does not affect the latest named release?

Note: the bug also affects the pre-built https://github.com/gianni-rosato/svt-av1-psy/releases/download/v2.2.1-B/SvtAv1EncApp-Linux-x86_64.tzst .