golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

x/exp/shiny/driver/internal: swizzle still needs to detect instruction on amd64 #12714

Closed: kardianos closed this issue 8 years ago

kardianos commented 9 years ago

CPU: AMD Phenom II X6 (Thuban PH-E0) 1055T

$ cpuid -1 | grep SSE4
  SSE4.1 extensions = false
  SSE4.2 extensions = false
  SSE4A support = true

SIGILL: illegal instruction PC=0x552cd5 m=6

goroutine 1 [running]:
golang.org/x/exp/shiny/driver/internal/swizzle.bgra16(0x7f982d522000, 0x40000, 0x40000)
    /home/daniel/src/golang.org/x/exp/shiny/driver/internal/swizzle/swizzle_amd64.s:36 +0x45 fp=0xc8200517f0 sp=0xc8200517e8
golang.org/x/exp/shiny/driver/internal/swizzle.BGRA(0x7f982d522000, 0x40000, 0x40000)
    /home/daniel/src/golang.org/x/exp/shiny/driver/internal/swizzle/swizzle_common.go:21 +0xcf fp=0xc820051850 sp=0xc8200517f0
golang.org/x/exp/shiny/driver/x11driver.(*bufferImpl).preUpload(0xc820106000)
    /home/daniel/src/golang.org/x/exp/shiny/driver/x11driver/buffer.go:54 +0x11d fp=0xc820051898 sp=0xc820051850
...

rax    0xf0c0d0e0b08090a
rbx    0x0
rcx    0xc820000180
rdx    0x40000
rdi    0x7f982d562000
rsi    0x7f982d522000
rbp    0x40000
rsp    0xc8200517e8
r8     0x7f982d522000
r9     0x676050
r10    0x2
r11    0x246
r12    0x5
r13    0x6c24cc
r14    0x8
r15    0x0
rip    0x552cd5
rflags 0x10287
cs     0x33
fs     0x0
gs     0x0
exit status 2

rakyll commented 8 years ago

/cc @nigeltao

nigeltao commented 8 years ago

What does "cpuid -1" without the grep say? I think that PSHUFB was introduced in SSSE3.

kardianos commented 8 years ago
  SSE extensions                         = true
  SSE2 extensions                        = true
  PNI/SSE3: Prescott New Instructions     = true
  SSSE3 extensions                        = false
  SSE4.1 extensions                       = false
  SSE4.2 extensions                       = false
  SSE extensions                        = true
  SSE4A support                          = true
  misaligned SSE mode                    = true
  SSSE3/SSE5 opcode set disable = false
  128-bit SSE executed full-width = true

...

CPU:
  vendor_id = "AuthenticAMD"
  version information (1/eax):
    processor type = primary processor (0)
    family = Intel Pentium 4/Pentium D/Pentium Extreme Edition/Celeron/Xeon/Xeon MP/Itanium2, AMD Athlon 64/Athlon XP-M/Opteron/Sempron/Turion (15)
    model = 0xa (10)
    stepping id = 0x0 (0)
    extended family = 0x1 (1)
    extended model = 0x0 (0)
    (simple synth) = AMD Phenom II X4 / X6 (Zosma / Thuban PH-E0), 45nm
  miscellaneous (1/ebx):
    process local APIC physical ID = 0x0 (0)
    cpu count = 0x6 (6)
    CLFLUSH line size = 0x8 (8)
    brand index = 0x0 (0)
    brand id = 0x00 (0): unknown
  feature information (1/edx):
    x87 FPU on chip = true
    virtual-8086 mode enhancement = true
    debugging extensions = true
    page size extensions = true
    time stamp counter = true
    RDMSR and WRMSR support = true
    physical address extensions = true
    machine check exception = true
    CMPXCHG8B inst. = true
    APIC on chip = true
    SYSENTER and SYSEXIT = true
    memory type range registers = true
    PTE global bit = true
    machine check architecture = true
    conditional move/compare instruction = true
    page attribute table = true
    page size extension = true
    processor serial number = false
    CLFLUSH instruction = true
    debug store = false
    thermal monitor and clock ctrl = false
    MMX Technology = true
    FXSAVE/FXRSTOR = true
    SSE extensions = true
    SSE2 extensions = true
    self snoop = false
    hyper-threading / multi-core supported = true
    therm. monitor = false
    IA64 = false
    pending break event = false
  feature information (1/ecx):
    PNI/SSE3: Prescott New Instructions = true
    PCLMULDQ instruction = false
    64-bit debug store = false
    MONITOR/MWAIT = true
    CPL-qualified debug store = false
    VMX: virtual machine extensions = false
    SMX: safer mode extensions = false
    Enhanced Intel SpeedStep Technology = false
    thermal monitor 2 = false
    SSSE3 extensions = false
    context ID: adaptive or shared L1 data = false
    FMA instruction = false
    CMPXCHG16B instruction = true
    xTPR disable = false
    perfmon and debug = false
    process context identifiers = false
    direct cache access = false
    SSE4.1 extensions = false
    SSE4.2 extensions = false
    extended xAPIC support = false
    MOVBE instruction = false
    POPCNT instruction = true
    time stamp counter deadline = false
    AES instruction = false
    XSAVE/XSTOR states = false
    OS-enabled XSAVE/XSTOR = false
    AVX: advanced vector extensions = false
    F16C half-precision convert instruction = false
    RDRAND instruction = false
    hypervisor guest status = false
  cache and TLB information (2):
  processor serial number: 0010-0FA0-0000-0000-0000-0000
  MONITOR/MWAIT (5):
    smallest monitor-line size (bytes) = 0x40 (64)
    largest monitor-line size (bytes) = 0x40 (64)
    enum of Monitor-MWAIT exts supported = true
    supports intrs as break-event for MWAIT = true
    number of C0 sub C-states using MWAIT = 0x0 (0)
    number of C1 sub C-states using MWAIT = 0x0 (0)
    number of C2 sub C-states using MWAIT = 0x0 (0)
    number of C3 sub C-states using MWAIT = 0x0 (0)
    number of C4 sub C-states using MWAIT = 0x0 (0)
    number of C5 sub C-states using MWAIT = 0x0 (0)
    number of C6 sub C-states using MWAIT = 0x0 (0)
    number of C7 sub C-states using MWAIT = 0x0 (0)
  Thermal and Power Management Features (6):
    digital thermometer = false
    Intel Turbo Boost Technology = false
    ARAT always running APIC timer = false
    PLN power limit notification = false
    ECMD extended clock modulation duty = false
    PTM package thermal management = false
    digital thermometer thresholds = 0x0 (0)
    ACNT/MCNT supported performance measure = true
    ACNT2 available = false
    performance-energy bias capability = false
  extended processor signature (0x80000001/eax):
    family/generation = AMD Athlon 64/Opteron/Sempron/Turion (15)
    model = 0xa (10)
    stepping id = 0x0 (0)
    extended family = 0x1 (1)
    extended model = 0x0 (0)
    (simple synth) = AMD Phenom II X4 / X6 (Zosma / Thuban PH-E0), 45nm
  extended feature flags (0x80000001/edx):
    x87 FPU on chip = true
    virtual-8086 mode enhancement = true
    debugging extensions = true
    page size extensions = true
    time stamp counter = true
    RDMSR and WRMSR support = true
    physical address extensions = true
    machine check exception = true
    CMPXCHG8B inst. = true
    APIC on chip = true
    SYSCALL and SYSRET instructions = true
    memory type range registers = true
    global paging extension = true
    machine check architecture = true
    conditional move/compare instruction = true
    page attribute table = true
    page size extension = true
    multiprocessing capable = false
    no-execute page protection = true
    AMD multimedia instruction extensions = true
    MMX Technology = true
    FXSAVE/FXRSTOR = true
    SSE extensions = true
    1-GB large page support = true
    RDTSCP = true
    long mode (AA-64) = true
    3DNow! instruction extensions = true
    3DNow! instructions = true
  extended brand id (0x80000001/ebx):
    raw = 0x10000050 (268435536)
    BrandId = 0x50 (80)
    str1 = 0x0 (0)
    str2 = 0x0 (0)
    PartialModel = 0x5 (5)
    PG = 0x0 (0)
    PkgType = 0x1 (1)
  AMD feature flags (0x80000001/ecx):
    LAHF/SAHF supported in 64-bit mode = true
    CMP Legacy = true
    SVM: secure virtual machine = true
    extended APIC space = true
    AltMovCr8 = true
    LZCNT advanced bit manipulation = true
    SSE4A support = true
    misaligned SSE mode = true
    3DNow! PREFETCH/PREFETCHW instructions = true
    OS visible workaround = true
    instruction based sampling = true
    XOP support = false
    SKINIT/STGI support = true
    watchdog timer support = true
    lightweight profiling support = false
    4-operand FMA instruction = false
    NodeId MSR C001100C = false
    TBM support = false
    topology extensions = false
  brand = "AMD Phenom(tm) II X6 1055T Processor"
  L1 TLB/cache information: 2M/4M pages & L1 TLB (0x80000005/eax):
    instruction # entries = 0x10 (16)
    instruction associativity = 0xff (255)
    data # entries = 0x30 (48)
    data associativity = 0xff (255)
  L1 TLB/cache information: 4K pages & L1 TLB (0x80000005/ebx):
    instruction # entries = 0x20 (32)
    instruction associativity = 0xff (255)
    data # entries = 0x30 (48)
    data associativity = 0xff (255)
  L1 data cache information (0x80000005/ecx):
    line size (bytes) = 0x40 (64)
    lines per tag = 0x1 (1)
    associativity = 0x2 (2)
    size (Kb) = 0x40 (64)
  L1 instruction cache information (0x80000005/edx):
    line size (bytes) = 0x40 (64)
    lines per tag = 0x1 (1)
    associativity = 0x2 (2)
    size (Kb) = 0x40 (64)
  L2 TLB/cache information: 2M/4M pages & L2 TLB (0x80000006/eax):
    instruction # entries = 0x0 (0)
    instruction associativity = L2 off (0)
    data # entries = 0x80 (128)
    data associativity = 2-way (2)
  L2 TLB/cache information: 4K pages & L2 TLB (0x80000006/ebx):
    instruction # entries = 0x200 (512)
    instruction associativity = 4-way (4)
    data # entries = 0x200 (512)
    data associativity = 4-way (4)
  L2 unified cache information (0x80000006/ecx):
    line size (bytes) = 0x40 (64)
    lines per tag = 0x1 (1)
    associativity = 16-way (8)
    size (Kb) = 0x200 (512)
  L3 cache information (0x80000006/edx):
    line size (bytes) = 0x40 (64)
    lines per tag = 0x1 (1)
    associativity = 48-way (11)
    size (in 512Kb units) = 0xc (12)
  Advanced Power Management Features (0x80000007/edx):
    temperature sensing diode = true
    frequency ID (FID) control = false
    voltage ID (VID) control = false
    thermal trip (TTP) = true
    thermal monitor (TM) = true
    software thermal control (STC) = true
    100 MHz multiplier control = true
    hardware P-State control = true
    TscInvariant = true
  Physical Address and Linear Address Size (0x80000008/eax):
    maximum physical address bits = 0x30 (48)
    maximum linear (virtual) address bits = 0x30 (48)
    maximum guest physical address bits = 0x0 (0)
  Logical CPU cores (0x80000008/ecx):
    number of CPU cores - 1 = 0x5 (5)
    ApicIdCoreIdSize = 0x3 (3)
  SVM Secure Virtual Machine (0x8000000a/eax):
    SvmRev: SVM revision = 0x1 (1)
  SVM Secure Virtual Machine (0x8000000a/edx):
    nested paging = true
    LBR virtualization = true
    SVM lock = true
    NRIP save = true
    MSR based TSC rate control = false
    VMCB clean bits support = false
    flush by ASID = false
    decode assists = false
    SSSE3/SSE5 opcode set disable = false
    pause intercept filter = true
    pause filter threshold = false
  NASID: number of address space identifiers = 0x40 (64):
  L1 TLB information: 1G pages (0x80000019/eax):
    instruction # entries = 0x0 (0)
    instruction associativity = L2 off (0)
    data # entries = 0x30 (48)
    data associativity = full (15)
  L2 TLB information: 1G pages (0x80000019/ebx):
    instruction # entries = 0x0 (0)
    instruction associativity = L2 off (0)
    data # entries = 0x10 (16)
    data associativity = 8-way (6)
  SVM Secure Virtual Machine (0x8000001a/eax):
    128-bit SSE executed full-width = true
    MOVU* better than MOVL/MOVH = true
  Instruction Based Sampling Identifiers (0x8000001b/eax):
    IBS feature flags valid = true
    IBS fetch sampling = true
    IBS execution sampling = true
    read write of op counter = true
    op counting mode = true
    branch target address reporting = false
    IbsOpCurCnt and IbsOpMaxCnt extend 7 = false
    invalid RIP indication supported = false
  (instruction supported synth):
    CMPXCHG8B = true
    conditional move/compare = true
    PREFETCH/PREFETCHW = true
  (multi-processing synth): multi-core (c=6)
  (multi-processing method): AMD
  (APIC widths synth): CORE_width=3 SMT_width=0
  (APIC synth): PKG_ID=0 CORE_ID=0 SMT_ID=0
  (synth) = AMD Phenom II X6 (Thuban PH-E0), 45nm 1055T Processor
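
For readers following along: the "feature information (1/ecx)" entries in the dump correspond to bits of the ECX register returned by CPUID leaf 1, where SSE3 is bit 0, SSSE3 is bit 9, SSE4.1 is bit 19 and SSE4.2 is bit 20. The sketch below decodes such a value in Go. The constant phenomECX is an assumption: it is rebuilt by hand from the four flags the dump reports as true in that section (SSE3, MONITOR/MWAIT, CMPXCHG16B, POPCNT), not read from the hardware, and the names hasBit and phenomECX are made up for illustration.

```go
package main

import "fmt"

// Bit positions in ECX as returned by CPUID leaf 1.
const (
	bitSSE3  = 0
	bitSSSE3 = 9 // PSHUFB needs this one
	bitSSE41 = 19
	bitSSE42 = 20
)

// phenomECX is an assumed value, reconstructed from the flags listed as
// true in the dump: SSE3 (bit 0), MONITOR/MWAIT (bit 3),
// CMPXCHG16B (bit 13) and POPCNT (bit 23).
const phenomECX uint32 = 0x00802009

// hasBit reports whether the given feature bit is set in ecx.
func hasBit(ecx uint32, bit uint) bool {
	return ecx&(1<<bit) != 0
}

func main() {
	fmt.Println("SSE3:", hasBit(phenomECX, bitSSE3))    // SSE3: true
	fmt.Println("SSSE3:", hasBit(phenomECX, bitSSSE3))  // SSSE3: false
	fmt.Println("SSE4.1:", hasBit(phenomECX, bitSSE41)) // SSE4.1: false
	fmt.Println("SSE4.2:", hasBit(phenomECX, bitSSE42)) // SSE4.2: false
}
```

With SSSE3 reported as false, a PSHUFB in the amd64 assembly would be expected to raise SIGILL on this CPU, which matches the crash in the original report.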

nigeltao commented 8 years ago

I believe that this is fixed, but I obviously didn't try it on your CPU. Please re-open if you're still seeing problems.

kardianos commented 8 years ago

I tried it. That did fix it on my CPU. Thanks!

On Wed, Sep 23, 2015 at 9:06 PM Nigel Tao notifications@github.com wrote:

I believe that this is fixed, but I obviously didn't try it on your CPU. Please re-open if you're still seeing problems.
