Frogging-Family / linux-tkg

linux-tkg custom kernels
GNU General Public License v2.0
1.34k stars 169 forks source link

AMDGPU broken after e5fd39c #744

Closed Pheidologeton closed 1 year ago

Pheidologeton commented 1 year ago

After this commit, amdgpu is fully broken. Resolution is 640x480 and tons of log spam in dmesg about amdgpu. My config file

# linux-TkG config file

# Linux distribution you are using, options are "Arch", "Ubuntu", "Debian", "Fedora", "Suse", "Gentoo", "Generic".
# It is automatically set to "Arch" when using PKGBUILD.
# If left empty, the script will prompt
_distro=""

# Kernel Version - Options are "5.4", and from "5.7" to "5.19"
# you can also set a specific kernel version, e.g. "6.0-rc4" or "5.10.51",
#  -> note however that a "z" too small on a "x.y.z" version may make patches fail
#     as they got adapted for newer "z" values.
_version=""

#### MISC OPTIONS ####

# External config file to use - If the given file exists in path, it will override default config (customization.cfg) - Default is ~/.config/frogminer/linux-tkg.cfg
_EXT_CONFIG_PATH=~/.config/frogminer/linux-tkg.cfg

# [Arch specific] Set to anything else than "true" to limit cleanup operations and keep source and files generated during compilation.
# Default is "true".
_NUKR="true"

# Git mirror to use to get the kernel sources, possible values are "kernel.org", "googlesource.com", "github.com" and "torvalds"
_git_mirror="kernel.org"

# Root folder where to checkout the kernel sources (linux-src-git subdir) and build
# Note: - Leave empty to use PKGBUILD's dir
#       - Start with a '/' for an absolute path in which `linux-tkg/linux-src-git/` will be created
#       - This setting can be used to set the work/build folder to a tmpfs folder
#         - Requires >= 32GB ram when building a full kernel, should work with less ram with modprobed-db
_kernel_work_folder=""

# Permanent root folder where to keep the git clone (linux-kernel.git subdir) and fetch new blobs
# Note: - Leave empty to use PKGBUILD's dir
#       - Start with a '/' for an absolute path in which `linux-tkg/linux-kernel.git/` will be created
#       - If your internet is faster than your storage, it may be wise to put this folder
#         in a tmpfs location (although it will reclone after each restart / tmpfs folder cleanup)
_kernel_source_folder=""

# Custom compiler root dirs - Leave empty to use system compilers
# Example: CUSTOM_GCC_PATH="/home/frog/PKGBUILDS/mostlyportable-gcc/gcc-mostlyportable-9.2.0"
CUSTOM_GCC_PATH=""

# Custom LLVM compiler root dirs - Leave empty to use system llvm compiler
# Example: CUSTOM_LLVM_PATH="/home/frog/PKGBUILDS/mostlyportable-llvm/llvm-mostlyportable-11.0.0"
CUSTOM_LLVM_PATH=""

# Set to true to bypass makepkg.conf and use all available threads for compilation. False will respect your makepkg.conf options.
_force_all_threads="true"

# Set to true to prevent ccache from being used and set CONFIG_GCC_PLUGINS=y (which needs to be disabled for ccache to work properly)
_noccache="true"

# Set to true to use modprobed db to clean config from unneeded modules. Speeds up compilation considerably. Requires root - https://wiki.archlinux.org/index.php/Modprobed-db
# Using this option can trigger user prompts if the config doesn't go smoothly.
# !!!! Make sure to have a well populated db !!!! - Leave empty to be asked about it at build time
_modprobeddb="false"

# modprobed-db database file location
_modprobeddb_db_path=~/.config/modprobed.db

# Set to "1" to call make menuconfig, "2" to call make nconfig, "3" to call make xconfig, before building the kernel. Set to false to disable and skip the prompt.
_menunconfig=""

# Set to true to generate a kernel config fragment from your changes in menuconfig/nconfig. Set to false to disable and skip the prompt.
_diffconfig=""

# Set to the file name where the generated config fragment should be written to. Only used if _diffconfig is active.
_diffconfig_name=""

# [install.sh: Generic and Gentoo specific] Dracut options when generating initramfs
_dracut_options="--lz4"

#### KERNEL OPTIONS ####

# Name of the default config file to use for the kernel
# Default (empty)  : "config.x86_64" from the linux-tkg-config/5.y folder.
# "running-kernel" : Picks the .config file from the currently running kernel.
#                    It is recommended to be running an official kernel before running this script, to pick off a correct .config file
# "config_hardened.x86_64" : config file for a hardened kernel, available for kernel version "5.13", "5.10" and "5.4" .
#                            To get a complete hardened setup, you have to use "cfs" as _cpusched.
# User provided value : custom user provided file, the given path should be relative to the PKGBUILD file. This enables for example to use a user stripped down .config file.
#         If the .config file isn't up to date with the chosen kernel version, any extra CONFIG_XXXX is set to its default value.
# Note: the script copies the resulting .config file as "kernelconfig.new" next to the PKGBUILD as a convenience for an eventual re-use. It gets overwritten at each run.
#       One can use "kernelconfig.new" here to always use the latest edited .config file. modprobed-db needs to be used only once for its changes to be picked up.
_configfile="running-kernel"

# Determine whether to call "olddefconfig" (default) or "oldconfig" for manual config updating interaction.
_config_updating="olddefconfig"

# Disable some non-module debugging - See PKGBUILD for the list
_debugdisable="false"

# Strip the vmlinux file after build is done. Set to anything other than "true" if you require debug headers. Default is "true"
_STRIP="true"

# LEAVE AN EMPTY VALUE TO BE PROMPTED ABOUT FOLLOWING OPTIONS AT BUILD TIME

# CPU scheduler - Options are "upds" (TkG's Undead PDS), "pds", "bmq", "muqss", "cacule", "tt", "bore" or "cfs" (kernel's default)
_cpusched=""

# Compiler to use - Options are "gcc" or "llvm".
# For advanced users.
_compiler="llvm"

# Force the use of the LLVM Integrated Assembler whether using LLVM, LTO or not.
# Set to "1" to enable.
_llvm_ias="1"

# Clang LTO mode, only available with the "llvm" compiler - options are "no", "full" or "thin".
# ! This is currently experimental and might result in an unbootable kernel - Not recommended !
# "no: do not enable LTO"
# "full: uses 1 thread for Linking, slow and uses more memory, theoretically with the highest performance gains."
# "thin: uses multiple threads, faster and uses less memory, may have a lower runtime performance than Full."
_lto_mode="full"

# Apply PREEMPT_RT patchset to the kernel.
# ! Only CFS CPU scheduler is compatible with this patchset !
# Set to "1" to enable.
_preempt_rt=""

# Forcibly apply the PREEMPT_RT patchset to the kernel, even when upstream does not officially support the kernel subversion.
# ! This will still not apply when the patch itself or linux-tkg (see _version) do not support the kernel major version - Not recommended !
# Set to "1" to enable.
_preempt_rt_force=""

# CPU sched_yield_type - Choose what sort of yield sched_yield will perform
# For PDS and MuQSS: 0: No yield. (Recommended option for gaming on PDS and MuQSS)
#                    1: Yield only to better priority/deadline tasks. (Default - can be unstable with PDS on some platforms)
#                    2: Expire timeslice and recalculate deadline. (Usually the slowest option for PDS and MuQSS, not recommended)
# For BMQ:           0: No yield.
#                    1: Deboost and requeue task. (Default)
#                    2: Set rq skip task.
_sched_yield_type=""

# Round Robin interval is the longest duration two tasks with the same nice level will be delayed for. When CPU time is requested by a task, it receives a time slice equal
# to the rr_interval in addition to a virtual deadline. When using yield_type 2, a low value can help offset the disadvantages of rescheduling a process that has yielded.
# MuQSS default: 6ms"
# PDS default: 4ms"
# BMQ default: 2ms"
# Set to "1" for 2ms, "2" for 4ms, "3" for 6ms, "4" for 8ms, or "default" to keep the chosen scheduler defaults.
_rr_interval=""

# Set to "true" to disable FUNCTION_TRACER/GRAPH_TRACER, lowering overhead but limiting debugging and analyzing of kernel functions - Kernel default is "false"
_ftracedisable="false"

# Set to "true" to disable NUMA, lowering overhead, but breaking CUDA/NvEnc on Nvidia equipped systems - Kernel default is "false"
_numadisable="false"

# Set to "true" to enable misc additions - May contain temporary fixes pending upstream or changes that can break on non-Arch - Kernel default is "true"
_misc_adds="true"

# Set to "0" for periodic ticks, "1" to use CattaRappa mode (enabling full tickless) and "2" for tickless idle only.
# Full tickless can give higher performances in case you use isolation of CPUs for tasks
# and it works only when using the nohz_full kernel parameter, otherwise behaves like idle.
# Just tickless idle perform better for most platforms.
_tickless="0"

# Set to "true" to use ACS override patch - https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Bypassing_the_IOMMU_groups_.28ACS_override_patch.29 - Kernel default is "false"
_acs_override=""

# Set to "true" to add Bcache filesystem support. You'll have to install bcachefs-tools-git from AUR for utilities - https://bcachefs.org/ - If in doubt, set to "false"
# This can be buggy and isn't recommended on a production machine, also enabling this option will not allow you to enable MGLRU.
_bcachefs="false"

# Set to "true" to enable support for winesync, an experimental replacement for esync - requires patched wine - https://repo.or.cz/linux/zf.git/shortlog/refs/heads/winesync4
# ! Can't be used on multiple kernels installed side-by-side, which will require https://aur.archlinux.org/packages/winesync-dkms/ instead of this option !
_winesync="false"

# Set to "true" to enable Binder and Ashmem, the kernel modules required to use the android emulator Anbox. ! This doesn't apply to 5.4.y !
_anbox=""

# A selection of patches from Zen/Liquorix kernel and additional tweaks for a better gaming experience (ZENIFY) - Default is "true"
_zenify="false"

# compiler optimization level - 1. Optimize for performance (-O2); 2. Optimize harder (-O3); 3. Optimize for size (-Os) - Kernel default is "1"
_compileroptlevel="1"

# CPU compiler optimizations - Defaults to prompt at kernel config if left empty
# AMD CPUs : "k8" "k8sse3" "k10" "barcelona" "bobcat" "jaguar" "bulldozer" "piledriver" "steamroller" "excavator" "zen" "zen2" "zen3" "zen4" (zen3 opt support depends on GCC11) (zen4 opt support depends on GCC13)
# Intel CPUs : "mpsc"(P4 & older Netburst based Xeon) "atom" "core2" "nehalem" "westmere" "silvermont" "sandybridge" "ivybridge" "haswell" "broadwell" "skylake" "skylakex" "cannonlake" "icelake" "goldmont" "goldmontplus" "cascadelake" "cooperlake" "tigerlake" "sapphirerapids" "rocketlake" "alderlake" "raptorlake" "meteorlake" (raptorlake and meteorlake opt support require GCC13)
# Other options :
# - "native_amd" (use compiler autodetection - Selecting your arch manually in the list above is recommended instead of this option)
# - "native_intel" (use compiler autodetection and will prompt for P6_NOPS - Selecting your arch manually in the list above is recommended instead of this option)
# - "generic" (kernel's default - to share the package between machines with different CPU µarch as long as they are x86-64)
#
# https://en.wikipedia.org/wiki/X86-64#Microarchitecture_Levels)
# - "generic_v2" (depends on GCC11 - to share the package between machines with different CPU µarch supporting at least x86-64-v2
# - "generic_v3" (depends on GCC11 - to share the package between machines with different CPU µarch supporting at least x86-64-v3
# - "generic_v4" (depends on GCC11 - to share the package between machines with different CPU µarch supporting at least x86-64-v4
_processor_opt="ivybridge"

# CacULE only - Enable Response Driven Balancer, an experimental load balancer for CacULE
_cacule_rdb="false"

# CacULE only - Load balance time period - Default is 19
# https://github.com/hamadmarri/cacule-cpu-scheduler/blob/master/patches/CacULE/RDB/rdb.patch#L56
_cacule_rdb_interval="19"

# TT only - Enable High HZ patch (available for 5.15 only) - Default is "false"
_tt_high_hz="false"

# MuQSS and PDS only - SMT (Hyperthreading) aware nice priority and policy support (SMT_NICE) - Kernel default is "true" - You can disable this on non-SMT/HT CPUs for lower overhead
_smt_nice=""

# Trust the CPU manufacturer to initialize Linux's CRNG (RANDOM_TRUST_CPU) - Kernel default is "false"
_random_trust_cpu="false"

# Timer frequency - "100" "250" "300" "500" "750" "1000" ("2000" is available for cacule cpusched only) - More options available in kernel config prompt when left empty depending on selected cpusched with the default option pointed with a ">" (2000 for cacule, 100 for muqss and 1000 for other cpu schedulers)
_timer_freq="100"

# Default CPU governor - "performance", "ondemand", "schedutil" or leave empty for default (schedutil)
_default_cpu_gov="performance"

# Use an aggressive ondemand governor instead of default ondemand to improve performance on low loads/high core count CPUs while keeping some power efficiency from frequency scaling.
# It still requires you to either set ondemand as default governor or to select it in some way at runtime.
_aggressive_ondemand="true"

# [Advanced] Default TCP IPv4 algorithm to use. Options are: "yeah", "bbr", "cubic", "reno", "vegas" and "westwood". Leave empty if unsure.
# This config option will not be prompted
# Can be changed at runtime with the command line `# echo "$name" > /proc/sys/net/ipv4/tcp_congestion_control` where $name is one of the options above.
# Default (empty) and fallback : cubic
_tcp_cong_alg=""

# You can pass a default set of kernel command line options here - example: "intel_pstate=passive nowatchdog amdgpu.ppfeaturemask=0xfffd7fff mitigations=off"
_custom_commandline="intel_pstate=passive"

# Selection of Clearlinux patches
_clear_patches="true"

#### SPESHUL OPTION ####

# If you want to bypass the stock naming scheme and enforce something else (example : "linux") - Useful for some bootloaders requiring manual entry editing on each release.
# !!! It will also change pkgname - If you don't explicitely need this, don't use it !!!
_custom_pkgbase=""

# [non-Arch specific] Kernel localversion. Putting it to "Mario" will make for example the kernel version be 5.7.0-tkg-Mario (given by uname -r)
# If left empty, it will use "-tkg-${_cpusched}${_compiler}" where "${_cpusched}" will be replaced by the user chosen scheduler, ${_compiler} will be replaced by "-llvm" if clang is used (nothing for GCC).
_kernel_localversion=""

# Set to your maximum number of CPUs (physical + logical cores) - Lower means less overhead - You can set it to "$(nproc)" to use the current host's CPU(s) core count, or leave empty to use default
# If you set this to a lower value than you have cores, some cores will be disabled
# Default Arch kernel value is 320
_NR_CPUS_value=""

#### LEGACY OPTIONS ####

# Fsync legacy, known as FUTEX_WAIT_MULTIPLE (opcode 31) - previous version of fsync required for Valve Proton 4.11, 5.0 and 5.13 - https://steamcommunity.com/games/221410/announcements/detail/2957094910196249305
_fsync_legacy="false"

# Set to "true" to enable backported patches to add support for the futex_waitv() syscall, a new interface for fsync. Upstream as of 5.16 and requires a wine/proton with builtin support for it - https://github.com/ValveSoftware/wine/pull/128
# !! Disables futex2 interfaces support !!
# https://github.com/andrealmeid/futex_waitv_patches
_futex_waitv="true"

# Set to "true" to enable support for futex2, an experimental interface that can be used by proton-tkg and proton 5.13 experimental through Fsync - Can be enabled alongside fsync to use it as a fallback
# https://gitlab.collabora.com/tonyk/linux/-/tree/futex2-dev
_futex2="true"

# Set to "true" to add back missing symbol for AES-NI/AVX support on ZFS - This is a legacy option that can be ignored on 5.10+ kernels - https://github.com/NixOS/nixpkgs/blob/master/pkgs/os-specific/linux/kernel/export_kernel_fpu_functions.patch
_zfsfix="true"

# MuQSS only - CPU scheduler runqueue sharing - No sharing (RQ_NONE), SMT (hyperthread) siblings (RQ_SMT), Multicore siblings (RQ_MC), Symmetric Multi-Processing (RQ_SMP), NUMA (RQ_ALL)
# Valid values are "none", "smt", "mc", "mc-llc"(for zen), "smp", "all" - Kernel default is "smt"
_runqueue_sharing=""

# MuQSS only - Make IRQ threading compulsory (FORCE_IRQ_THREADING) - Default is "false"
_irq_threading="false"

# Set to "true" to add multi-generational LRU framework support on kernel 5.18+ - Improves memory pressure handling - https://lore.kernel.org/lkml/20220706220022.968789-1-yuzhao@google.com/
# Older kernel versions might have a patch available in the community-patches repo
# Upstream as of 6.1
# ! This option will be disabled when bcachefs is enabled !
_mglru="true"

#### USER PATCHES ####

# community patches - add patches (separated by a space) of your choice by name from the community-patches dir
# example: _community_patches="clear_nack_in_tend_isr.myrevert ffb_regression_fix.mypatch 0008-drm-amd-powerplay-force-the-trim-of-the-mclk-dpm-levels-if-OD-is-enabled.mypatch"
_community_patches=""

# You can use your own patches by putting them in a subfolder called linux<version>-tkg-userpatches (e.g. linux510-tkg-userpatches) next to the PKGBUILD and giving them the .mypatch extension.
# You can also revert patches by putting them in that same folder and giving them the .myrevert extension.

# Also, userpatches variable below must be set to true for the above to work.
_user_patches="true"

# Apply all user patches without confirmation - !!! NOT RECOMMENDED !!!
_user_patches_no_confirm="false"

#### CONFIG FRAGMENTS ####

# You can use your own kernel config fragments by putting them in the same folder as the PKGBUILD and giving them the .myfrag extension.

# Also, the config fragments variable below must be set to true for the above to work.
_config_fragments="true"

# Apply all config fragments without confirmation - !!! NOT RECOMMENDED !!!
_config_fragments_no_confirm="false"

In menuconfig i change only dynamic preempt to none, all other cfg is arch default

Tk-Glitch commented 1 year ago

On which GPU?

Pheidologeton commented 1 year ago

RX6900XT

Tk-Glitch commented 1 year ago

Reverted with https://github.com/Frogging-Family/linux-tkg/commit/5c02fb44c4ce400c8d2bfcd3c7b58135f36a8de8

Pheidologeton commented 1 year ago

If it is important, I can describe the system configuration (hardware)

Muhownage commented 1 year ago

Hi, im not TKG but can you try the following (with a kernel which has the patch applied):

shutdown the pc, disconnect it from power for ~10 seconds and power it up again.

i had the problem you described before, unrelated to the patch. for me it usually was when i dual booted from windows to linux several times without completly power cycling my gpu (also a 6900xt). it seems like some former initialized states of the gpu remain as long as the card is powered which causes some kind of corruption: 640x480, only one monitor working, etc. but doing the power cycling as described always fixed it for me.

Pheidologeton commented 1 year ago

I pressed the "reset" button on the case, which completely turns off the power and starts it after a few seconds. Same problem. I don't have windows. CPU dual xeon e5-2696v2, RAM 16x8gb ddr3 ecc reg 1866 mhz samsung, motherboard supermicro x9dri-ln4f+, GPU rx 6900xt xfc merc.

CakZemprongzz commented 1 year ago

Are you sure this is caused the problem? What kind of dmesg you getting?

I use 6700XT and compiling linux-tkg 9 hours after the commit and everything seems working fine.. I am not sure that commit is the issue..

I am assuming 6700XT and 6900XT is the same considering both share same architecture..

Astrobald commented 1 year ago

I still use this patched version of linux-tkg without having noticed any regression on my system. I have a 5500xt.

Tk-Glitch commented 1 year ago

I'm also using it just fine with my 5700XT, but we'd need to be certain the issue isn't actually caused by the patch. Such a weird hardware combo could trigger an edge case of some sort I guess.

Pheidologeton commented 1 year ago

Even more interesting. If CONFIG_HZ_100=y this patch breaks the amdgpu. If CONFIG_HZ_250=y, all works fine