dexcz / libyuv

Automatically exported from code.google.com/p/libyuv
BSD 3-Clause "New" or "Revised" License

NaCL AVX2 support #380

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
AVX2 assembly is currently disabled for NaCL; enable it.

1. Document how to build with NaCL
2. Enable AVX2 assembly for NaCL.  Ensure compiler produces binaries.
3. Ensure validator passes.

Original issue reported on code.google.com by fbarch...@google.com on 12 Nov 2014 at 12:02

GoogleCodeExporter commented 9 years ago
The attached source/makefile build and validate with nacl.  It's assumed you have 
the tools.

Original comment by fbarch...@google.com on 12 Nov 2014 at 10:37

GoogleCodeExporter commented 9 years ago
In r1180 there are 68 occurrences of 'unrecognized instruction'.
The 64 bit build now completes, with the same list of errors.

naclsdk update pepper_canary --force
Updating bundle pepper_canary to version 41, revision 304415
didn't reduce the count.

Original comment by fbarch...@google.com on 25 Nov 2014 at 12:16

GoogleCodeExporter commented 9 years ago
Updated pepper_canary to version 41, revision 308297
This is what ncval currently shows on x86 32 bit:
d:\src\libyuv\trunk>d:\src\nacl_sdk\pepper_canary\tools\ncval.exe 
newlib/Release/nacltest_x86_32.nexe
   2b073: unrecognized instruction
   2b080: unrecognized instruction
   2b0a0: unrecognized instruction
   2b133: unrecognized instruction
   2b140: unrecognized instruction
   2b160: unrecognized instruction
   2b2f3: unrecognized instruction
   2b300: unrecognized instruction
   2b32a: unrecognized instruction
   2b340: unrecognized instruction
   2b360: unrecognized instruction
   2c60c: unrecognized instruction
   2c620: unrecognized instruction
   2c640: unrecognized instruction
   2c663: unrecognized instruction
   2c680: unrecognized instruction
   2c6a0: unrecognized instruction
   2c6c0: unrecognized instruction
   2c6e0: unrecognized instruction
   2c705: unrecognized instruction
   2c720: unrecognized instruction
   2c740: unrecognized instruction
   2c7aa: unrecognized instruction
   2c7c0: unrecognized instruction
   2c7e0: unrecognized instruction
   2c807: unrecognized instruction
   2c820: unrecognized instruction
   2c840: unrecognized instruction
   2c860: unrecognized instruction
   2c8ca: unrecognized instruction
   2c8e0: unrecognized instruction
   2c900: unrecognized instruction
   2c927: unrecognized instruction
   2c940: unrecognized instruction
   2c960: unrecognized instruction
   2c980: unrecognized instruction
   2c9ea: unrecognized instruction
   2ca00: unrecognized instruction
   2ca20: unrecognized instruction
   2ca47: unrecognized instruction
   2ca60: unrecognized instruction
   2ca80: unrecognized instruction
   2caa0: unrecognized instruction
   2cbe6: unrecognized instruction
   2cd40: unrecognized instruction
   2cd71: unrecognized instruction
   2cd8c: unrecognized instruction
   2cda0: unrecognized instruction
   2ce8c: unrecognized instruction
   2d0cc: unrecognized instruction
   2d0ec: unrecognized instruction
   2d1ac: unrecognized instruction
   2d1c0: unrecognized instruction
   2d1e0: unrecognized instruction
   2d5ec: unrecognized instruction
   2d60c: unrecognized instruction
   2d660: unrecognized instruction
   2d689: unrecognized instruction
   2d6a0: unrecognized instruction
   2d6c0: unrecognized instruction
   2d711: unrecognized instruction
   2d72c: unrecognized instruction
   2d740: unrecognized instruction
   2d7ac: unrecognized instruction
   2d800: unrecognized instruction
   2d829: unrecognized instruction
   2d840: unrecognized instruction
   2d860: unrecognized instruction
   2d8b1: unrecognized instruction
   2d8cc: unrecognized instruction
   2d8e0: unrecognized instruction
   2ddb5: unrecognized instruction
   2ddc4: unrecognized instruction
   2dde0: unrecognized instruction
   2df04: unrecognized instruction
   2df24: unrecognized instruction
   2df48: unrecognized instruction
   2df64: unrecognized instruction
   2df80: unrecognized instruction
   2e411: unrecognized instruction
   2e42e: unrecognized instruction
   2e440: unrecognized instruction
   2e507: unrecognized instruction
   2e5c7: unrecognized instruction
   2ef62: unrecognized instruction
   2ef80: unrecognized instruction
   2efa9: unrecognized instruction
   2efc0: unrecognized instruction
   2efe9: unrecognized instruction
   2f024: unrecognized instruction
   2f049: unrecognized instruction
   2f3ac: unrecognized instruction
   2f7a0: unrecognized instruction
   2f7c0: unrecognized instruction
Invalid.

Original comment by fbarch...@google.com on 16 Dec 2014 at 6:21

GoogleCodeExporter commented 9 years ago
Updating bundle pepper_canary to version 41, revision 309416
91 fails... down from 94.

Original comment by fbarch...@google.com on 23 Dec 2014 at 9:52

GoogleCodeExporter commented 9 years ago
Updating bundle pepper_canary to version 41, revision 309689
still at 91 fails.

Original comment by fbarch...@google.com on 29 Dec 2014 at 8:24

GoogleCodeExporter commented 9 years ago
Updating bundle pepper_canary to version 42, revision 311009
69 fails for 32 bit.

Original comment by fbarch...@google.com on 12 Jan 2015 at 10:21

GoogleCodeExporter commented 9 years ago
d:\src\libyuv\trunk>d:\src\nacl_sdk\pepper_canary\tools\ncval.exe 
newlib/Release/nacltest_x86_32.nexe
   2f6b8: unrecognized instruction
Invalid.

d:\src\libyuv\trunk>d:\src\nacl_sdk\pepper_canary\tools\ncval.exe 
newlib/Release/nacltest_x86_64.nexe
   2c240: improper memory address - bad index
   2c246: improper memory address - bad index
   2c24d: improper memory address - bad index
   2c254: improper memory address - bad index
   2c297: improper memory address - bad index
   2c2e0: improper memory address - bad index
   2c2e6: improper memory address - bad index
   2c2ed: improper memory address - bad index
   2c2f4: improper memory address - bad index
   2c340: improper memory address - bad index
   2c4c0: improper memory address - bad index
   2c4c6: improper memory address - bad index
   2c4cd: improper memory address - bad index
   2c4d4: improper memory address - bad index
   2c573: improper memory address - bad index
   2da20: improper memory address - bad index
   2da44: improper memory address - bad index
   2da4b: improper memory address - bad index
   2da52: improper memory address - bad index
   2da60: improper memory address - bad index
   2da6a: improper memory address - bad index
   2da74: improper memory address - bad index
   2da80: improper memory address - bad index
   2da93: improper memory address - bad index
   2dae4: improper memory address - bad index
   2daea: improper memory address - bad index
   2db40: improper memory address - bad index
   2db64: improper memory address - bad index
   2db6b: improper memory address - bad index
   2db72: improper memory address - bad index
   2db80: improper memory address - bad index
   2db8a: improper memory address - bad index
   2db94: improper memory address - bad index
   2dba0: improper memory address - bad index
   2dbb3: improper memory address - bad index
   2dc04: improper memory address - bad index
   2dc0a: improper memory address - bad index
   2dc60: improper memory address - bad index
   2dc84: improper memory address - bad index
   2dc8b: improper memory address - bad index
   2dc92: improper memory address - bad index
   2dca0: improper memory address - bad index
   2dcaa: improper memory address - bad index
   2dcb4: improper memory address - bad index
   2dcc0: improper memory address - bad index
   2dcd3: improper memory address - bad index
   2dd24: improper memory address - bad index
   2dd2a: improper memory address - bad index
   2dd80: improper memory address - bad index
   2dda4: improper memory address - bad index
   2ddab: improper memory address - bad index
   2ddb2: improper memory address - bad index
   2ddc0: improper memory address - bad index
   2ddca: improper memory address - bad index
   2ddd4: improper memory address - bad index
   2dde0: improper memory address - bad index
   2ddf3: improper memory address - bad index
   2de44: improper memory address - bad index
   2de4a: improper memory address - bad index
   2dfb6: improper memory address - bad index
   2e14b: improper memory address - bad index
   2e1a0: improper memory address - bad index
   2e1a6: improper memory address - bad index
   2e1d8: improper memory address - bad index
   2e2c0: improper memory address - bad index
   2e2e0: improper memory address - bad index
   2e2e7: improper memory address - bad index
   2e2ef: improper memory address - bad index
   2e2f7: improper memory address - bad index
   2e3e0: improper memory address - bad index
   2e3e6: improper memory address - bad index
   2e3f0: improper memory address - bad index
   2e3f6: improper memory address - bad index
   2e520: improper memory address - bad index
   2e526: improper memory address - bad index
   2e530: improper memory address - bad index
   2e537: improper memory address - bad index
   2e540: improper memory address - bad index
   2e546: improper memory address - bad index
   2e640: improper memory address - bad index
   2e646: improper memory address - bad index
   2e660: improper memory address - bad index
   2e667: improper memory address - bad index
   2e66f: improper memory address - bad index
   2e675: improper memory address - bad index
   2ea80: improper memory address - bad index
   2ea86: improper memory address - bad index
   2eaa6: improper memory address - bad index
   2eb00: improper memory address - bad index
   2eb06: improper memory address - bad index
   2eb66: improper memory address - bad index
   2ebc0: improper memory address - bad index
   2ebc6: improper memory address - bad index
   2ec06: improper memory address - bad index
   2ec40: improper memory address - bad index
   2ec46: improper memory address - bad index
   2ec66: improper memory address - bad index
   2ecc0: improper memory address - bad index
   2ecc6: improper memory address - bad index
   2ed20: improper memory address - bad index
   2ed80: improper memory address - bad index
   2ed86: improper memory address - bad index
   2edc6: improper memory address - bad index
   2f300: improper memory address - bad index
   2f520: improper memory address - bad index
   2fa80: improper memory address - bad index
   2fa89: improper memory address - bad index
   2fab0: improper memory address - bad index
   2fb20: improper memory address - bad index
   2fb29: improper memory address - bad index
   2fb32: improper memory address - bad index
   2fba0: improper memory address - bad index
   2fba9: improper memory address - bad index
   2fbb2: improper memory address - bad index
   30720: improper memory address - bad index
   30780: improper memory address - bad index
   307c0: improper memory address - bad index
   30800: improper memory address - bad index
   30bc0: improper memory address - bad index
   30be0: improper memory address - bad index
   30be6: improper memory address - bad index
   30bfa: improper memory address - bad index
   30c00: improper memory address - bad index
   31080: improper memory address - bad index
   31086: improper memory address - bad index
   3108d: improper memory address - bad index
   31094: improper memory address - bad index
   310a0: improper memory address - bad index
   310d8: unrecognized instruction
Invalid.

Original comment by fbarch...@google.com on 6 Feb 2015 at 8:58

GoogleCodeExporter commented 9 years ago
This is the missing instruction:
2f6b8:  c5 f9 d6 02             vmovq  %xmm0,(%edx)

Original comment by fbarch...@google.com on 6 Feb 2015 at 9:01
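
For readers unfamiliar with the encoding, the four bytes `c5 f9 d6 02` reported above can be taken apart by hand. The following is a minimal sketch, not code from libyuv or the NaCl validator; the `Vex2` struct and the `decode_vex2` / `decode_modrm` helpers are hypothetical names introduced only for illustration. It decodes the payload byte of a 2-byte VEX prefix (the byte after `0xC5`) and a ModRM byte.

```c
#include <stdint.h>

/* Fields of a 2-byte VEX prefix: 0xC5 followed by one payload byte.
 * Hypothetical helper type, for illustration only. */
typedef struct {
    int r;    /* REX.R equivalent (stored inverted in the encoding) */
    int vvvv; /* extra source register (stored inverted); 0 when unused */
    int l;    /* vector length: 0 = 128 bit, 1 = 256 bit */
    int pp;   /* implied legacy prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
} Vex2;

/* Decode the payload byte that follows 0xC5. */
static Vex2 decode_vex2(uint8_t payload) {
    Vex2 v;
    v.r    = ((payload >> 7) & 1) ^ 1;  /* undo the inversion of R */
    v.vvvv = (~(payload >> 3)) & 0xF;   /* undo the inversion of vvvv */
    v.l    = (payload >> 2) & 1;
    v.pp   = payload & 3;
    return v;
}

/* Split a ModRM byte into its mod, reg and rm fields. */
static void decode_modrm(uint8_t modrm, int* mod, int* reg, int* rm) {
    *mod = (modrm >> 6) & 3;
    *reg = (modrm >> 3) & 7;
    *rm  = modrm & 7;
}
```

Applied to the bytes above, `decode_vex2(0xf9)` gives L = 0 (128 bit) and pp = 1 (an implied `66` prefix), so the opcode `d6` is the store form of `movq`. `decode_modrm(0x02, ...)` gives mod = 0, reg = 0 (`%xmm0`) and rm = 2 (`%edx`), which matches the disassembly `vmovq %xmm0,(%edx)`.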

GoogleCodeExporter commented 9 years ago
32 bit code lacks 1 instruction: vmovq

Original comment by fbarch...@google.com on 9 Feb 2015 at 8:55

GoogleCodeExporter commented 9 years ago
// Convert 32 ARGB pixels (128 bytes) to 32 Y values.
void ARGBToYRow_AVX2(const uint8* src_argb, uint8* dst_y, int pix) {
  asm volatile (
    "vbroadcastf128 %3,%%ymm4                  \n"
    "vbroadcastf128 %4,%%ymm5                  \n"
    "vmovdqu    %5,%%ymm6                      \n"
    LABELALIGN
  "1:                                          \n"
    "vmovdqu    " MEMACCESS(0) ",%%ymm0        \n"
    "vmovdqu    " MEMACCESS2(0x20,0) ",%%ymm1  \n"
    "vmovdqu    " MEMACCESS2(0x40,0) ",%%ymm2  \n"
    "vmovdqu    " MEMACCESS2(0x60,0) ",%%ymm3  \n"

000000000002c220 <ARGBToYRow_AVX2>:
   2c220:   c4 e2 7d 1a 25 77 47    vbroadcastf128 0x10004777(%rip),%ymm4        # 100309a0 <_ZN6libyuvL8kARGBToYE>
   2c227:   00 10 
   2c229:   c4 e2 7d 1a 2d ce 46    vbroadcastf128 0x100046ce(%rip),%ymm5        # 10030900 <_ZN6libyuvL7kAddY16E>
   2c230:   00 10 
   2c232:   c5 fe 6f 35 46 41 ff    vmovdqu 0xfff4146(%rip),%ymm6        # 10020380 <_ZN6libyuvL17kPermdARGBToY_AVXE>
   2c239:   0f 
   2c23a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

   2c240:   c4 c1 7e 6f 04 3f       vmovdqu (%r15,%rdi,1),%ymm0

   2c246:   c4 c1 7e 6f 4c 3f 20    vmovdqu 0x20(%r15,%rdi,1),%ymm1
   2c24d:   c4 c1 7e 6f 54 3f 40    vmovdqu 0x40(%r15,%rdi,1),%ymm2
   2c254:   c4 c1 7e 6f 5c 3f 60    vmovdqu 0x60(%r15,%rdi,1),%ymm3

Original comment by fbarch...@google.com on 12 Feb 2015 at 12:09

GoogleCodeExporter commented 9 years ago
In row.h, the 64 bit nacl version is:
#define MEMACCESS(base) "%%nacl:(%%r15,%q" #base ")"
#define MEMACCESS2(offset, base) "%%nacl:" #offset "(%%r15,%q" #base ")"
or this, for 32 bit and non-nacl:
#define MEMACCESS(base) "(%" #base ")"
#define MEMACCESS2(offset, base) #offset "(%" #base ")"

The MEMACCESS macros are used for both SSE and AVX, but don't produce the cleanse 
on AVX.

Original comment by fbarch...@google.com on 13 Feb 2015 at 9:34
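
To make the difference between the two macro sets concrete, here is a small sketch of what they expand to by string concatenation. The macro bodies are copied from the comment above; the `_NACL64` and `_PLAIN` suffixes and the `kLoadNacl64` / `kLoadPlain` names are assumptions added here so both variants can coexist in one file (in row.h they share the same names and are selected by the build target).

```c
#include <string.h>

/* 64 bit nacl forms, bodies as quoted in the comment above. */
#define MEMACCESS_NACL64(base) "%%nacl:(%%r15,%q" #base ")"
#define MEMACCESS2_NACL64(offset, base) "%%nacl:" #offset "(%%r15,%q" #base ")"

/* 32 bit and non-nacl forms, bodies as quoted in the comment above. */
#define MEMACCESS_PLAIN(base) "(%" #base ")"
#define MEMACCESS2_PLAIN(offset, base) #offset "(%" #base ")"

/* Adjacent string literals are concatenated by the compiler, so each
 * line of the asm template becomes one string. */
static const char kLoadNacl64[] =
    "vmovdqu    " MEMACCESS2_NACL64(0x20,0) ",%%ymm1";
static const char kLoadPlain[] =
    "vmovdqu    " MEMACCESS2_PLAIN(0x20,0) ",%%ymm1";
```

With these definitions, `kLoadPlain` holds `vmovdqu    0x20(%0),%%ymm1` and `kLoadNacl64` holds `vmovdqu    %%nacl:0x20(%%r15,%q0),%%ymm1`. The `%q0` operand modifier asks GCC for the 64 bit name of operand 0, which is why the disassembly earlier in the thread shows `0x20(%r15,%rdi,1)`.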

GoogleCodeExporter commented 9 years ago
32 bit passes

d:\src\libyuv\trunk>d:\src\nacl_sdk\pepper_canary\tools\ncval.exe 
newlib/Release/nacltest_x86_32.nex

d:\src\libyuv\trunk>d:\src\nacl_sdk\pepper_canary\tools\ncval.exe 
newlib/Release/nacltest_x86_64.nex
   2c240: improper memory address - bad index
   2c246: improper memory address - bad index
   2c24d: improper memory address - bad index
   2c254: improper memory address - bad index
   2c297: improper memory address - bad index
   2c2e0: improper memory address - bad index
   2c2e6: improper memory address - bad index
   2c2ed: improper memory address - bad index
   2c2f4: improper memory address - bad index
   2c340: improper memory address - bad index
   2c4c0: improper memory address - bad index
   2c4c6: improper memory address - bad index
   2c4cd: improper memory address - bad index
   2c4d4: improper memory address - bad index
   2c573: improper memory address - bad index
   2dc00: improper memory address - bad index
   2dc40: improper memory address - bad index
   2dc46: improper memory address - bad index
   2dd16: improper memory address - bad index
   2deab: improper memory address - bad index
   2df00: improper memory address - bad index
   2df06: improper memory address - bad index
   2df38: improper memory address - bad index
   2e020: improper memory address - bad index
   2e040: improper memory address - bad index
   2e047: improper memory address - bad index
   2e04f: improper memory address - bad index
   2e057: improper memory address - bad index
   2e140: improper memory address - bad index
   2e146: improper memory address - bad index
   2e150: improper memory address - bad index
   2e156: improper memory address - bad index
   2e280: improper memory address - bad index
   2e286: improper memory address - bad index
   2e290: improper memory address - bad index
   2e297: improper memory address - bad index
   2e2a0: improper memory address - bad index
   2e2a6: improper memory address - bad index
   2e3a0: improper memory address - bad index
   2e3a6: improper memory address - bad index
   2e3c0: improper memory address - bad index
   2e3c7: improper memory address - bad index
   2e3cf: improper memory address - bad index
   2e3d5: improper memory address - bad index
   2e7e0: improper memory address - bad index
   2e7e6: improper memory address - bad index
   2e806: improper memory address - bad index
   2e860: improper memory address - bad index
   2e866: improper memory address - bad index
   2e8c6: improper memory address - bad index
   2e920: improper memory address - bad index
   2e926: improper memory address - bad index
   2e966: improper memory address - bad index
   2e9a0: improper memory address - bad index
   2e9a6: improper memory address - bad index
   2e9c6: improper memory address - bad index
   2ea20: improper memory address - bad index
   2ea26: improper memory address - bad index
   2ea80: improper memory address - bad index
   2eae0: improper memory address - bad index
   2eae6: improper memory address - bad index
   2eb26: improper memory address - bad index
   2f060: improper memory address - bad index
   2f280: improper memory address - bad index
   2f7e0: improper memory address - bad index
   2f7e9: improper memory address - bad index
   2f810: improper memory address - bad index
   2f880: improper memory address - bad index
   2f889: improper memory address - bad index
   2f892: improper memory address - bad index
   2f900: improper memory address - bad index
   2f909: improper memory address - bad index
   2f912: improper memory address - bad index
   30480: improper memory address - bad index
   304e0: improper memory address - bad index
   30520: improper memory address - bad index
   30560: improper memory address - bad index
   308c0: improper memory address - bad index
   308e0: improper memory address - bad index
   308e6: improper memory address - bad index
   308fa: improper memory address - bad index
   30900: improper memory address - bad index
   30d80: improper memory address - bad index
   30d86: improper memory address - bad index
   30d8d: improper memory address - bad index
   30d94: improper memory address - bad index
   30da0: improper memory address - bad index
   30dd8: improper memory address - bad index
Invalid.

Original comment by fbarch...@google.com on 17 Feb 2015 at 6:42

GoogleCodeExporter commented 9 years ago
I've filed this nacl issue to track adding this to the assembler:
https://code.google.com/p/nativeclient/issues/detail?id=4116

At a glance, the plumbing that adds it just doesn't understand how to decode 
the operands to AVX.

It might take a bit of time to add. I think if you are blocked we should seek a 
workaround.

How difficult in the meantime would it be to modify your macros to do the 
sandboxing directly?
You can use .bundle_lock + .bundle_unlock to avoid crossing a bundle boundary, 
and add an explicit movl with the destination index to itself.
This would require your macro to encompass the entire instruction instead of 
just the destination / source though.

Original comment by bradnel...@chromium.org on 3 Mar 2015 at 1:55
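
The workaround suggested above could look roughly like the following. This is only a sketch under stated assumptions: `NACL_LOAD` and `kSandboxedLoad` are hypothetical names, not macros from libyuv, and the output has not been run through ncval. It shows a macro that wraps the whole instruction, truncates the index register with an explicit `movl` to itself, and keeps both instructions inside one bundle.

```c
#include <string.h>

/* Hypothetical macro: emits a complete sandboxed load rather than just
 * the memory operand.
 *   op   - instruction mnemonic, e.g. vmovdqu
 *   base - asm operand number of the index register
 *   reg  - destination vector register, e.g. ymm0
 * %k<n> is the GCC modifier for the 32 bit register name and
 * %q<n> for the 64 bit name. */
#define NACL_LOAD(op, base, reg)                      \
    ".bundle_lock\n"                                  \
    "movl %k" #base ",%k" #base "\n"                  \
    #op " (%%r15,%q" #base "),%%" #reg "\n"           \
    ".bundle_unlock\n"

/* Example expansion for the first load in ARGBToYRow_AVX2. */
static const char kSandboxedLoad[] = NACL_LOAD(vmovdqu, 0, ymm0);
```

The cost of this design is the one noted in the suggestion: the macro has to know the mnemonic and the register operand as well as the address, so each operand pattern (load, store, offset, scaled index) would need its own variant.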

GoogleCodeExporter commented 9 years ago
The macros look like this:
#define MEMACCESS(base) "%%nacl:(%%r15,%q" #base ")"
"movdqu    " MEMACCESS(1) ",%%xmm0         \n"

I don't think this form of macro can be changed to cleanse.
It's used extensively
d:\src\libyuv\trunk\source>findstr MEMACCESS * | wc -l
   1520    5531   70520
but only some instructions don't work.

I think the task would involve replacing MEMACCESS with forms of MEMOP.
MEMOP is more difficult to port/maintain from the Windows version of the code.  It 
requires a separate variant for every combination of operands.

Original comment by fbarch...@google.com on 3 Mar 2015 at 7:29

GoogleCodeExporter commented 9 years ago
pepper_canary to version 43, revision 320912

NaCL arm broken atm:

d:\src\nacl_sdk\pepper_canary\tools\ncval.exe newlib/Release/nacltest_arm.nexe
   28f34: Load/store base r12 is not properly masked.
   28f38: Load/store base r4 is not properly masked.
   28f3c: Load/store base r5 is not properly masked.
   28f40: Load/store base r5 is not properly masked.
   28f44: Load/store base r5 is not properly masked.
   28f48: Load/store base r6 is not properly masked.
   29034: Load/store base r12 is not properly masked.
   29038: Load/store base r4 is not properly masked.
   2a020: Load/store base r4 is not properly masked.
Invalid.

32 bit x86 ok
d:\src\libyuv\trunk>d:\src\nacl_sdk\pepper_canary\tools\ncval.exe 
newlib/Release/nacltest_x86_32.nexe

64 bit x86 fails
d:\src\libyuv\trunk>d:\src\nacl_sdk\pepper_canary\tools\ncval.exe 
newlib/Release/nacltest_x86_64.nexe
   2c240: improper memory address - bad index
   2c246: improper memory address - bad index
   2c24d: improper memory address - bad index
   2c254: improper memory address - bad index
   30dd8: improper memory address - bad index
Invalid.

Original comment by fbarch...@chromium.org on 17 Mar 2015 at 11:36

GoogleCodeExporter commented 9 years ago
xgetbv not supported yet - needed to detect if avx2 is supported by the OS

d:\src\libyuv\trunk>d:\src\nacl_sdk\pepper_canary\tools\ncval.exe 
newlib/Release/nacltest_x86_32.nexe
   43fe5: unrecognized instruction
   44122: unrecognized instruction
Invalid.

00043fe0 <TestOsSaveYmm>:
   43fe0:   55                      push   %ebp
   43fe1:   31 c9                   xor    %ecx,%ecx
   43fe3:   89 e5                   mov    %esp,%ebp
   43fe5:   0f 01 d0                xgetbv 
   43fe8:   5d                      pop    %ebp
   43fe9:   83 e0 06                and    $0x6,%eax
   43fec:   59                      pop    %ecx
   43fed:   83 f8 06                cmp    $0x6,%eax
   43ff0:   0f 94 c0                sete   %al
   43ff3:   0f b6 c0                movzbl %al,%eax
   43ff6:   83 e1 e0                and    $0xffffffe0,%ecx
   43ff9:   ff e1                   jmp    *%ecx

00044000 <InitCpuFlags>:
   44120:   31 c9                   xor    %ecx,%ecx
   44122:   0f 01 d0                xgetbv 
   44125:   83 e0 06                and    $0x6,%eax
   44128:   83 f8 06                cmp    $0x6,%eax
   4412b:   75 d3                   jne    44100 <InitCpuFlags+0x100>

Original comment by fbarch...@chromium.org on 17 Mar 2015 at 11:39
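
The `and $0x6` / `cmp $0x6` pair in the disassembly above is the usual check on the low 32 bits of XCR0, which `xgetbv` returns in `%eax` when `%ecx` is 0. A minimal sketch of just that test follows; `os_saves_ymm` and the two constants are hypothetical names, not the libyuv source, and the function takes the XCR0 value as an argument so it can be exercised without executing `xgetbv`.

```c
#include <stdint.h>

/* XCR0 bit 1: the OS saves and restores XMM (SSE) state.
 * XCR0 bit 2: the OS saves and restores the upper YMM (AVX) state. */
#define XCR0_SSE_STATE 0x2u
#define XCR0_AVX_STATE 0x4u

/* Returns 1 when both bits are set, i.e. (xcr0 & 6) == 6, which is the
 * test performed by the and/cmp pair in the disassembly above. */
static int os_saves_ymm(uint32_t xcr0_low) {
    const uint32_t need = XCR0_SSE_STATE | XCR0_AVX_STATE; /* 0x6 */
    return (xcr0_low & need) == need;
}
```

For example, `os_saves_ymm(0x7)` returns 1, while `os_saves_ymm(0x3)` returns 0 because the YMM bit is clear. On real hardware the argument would come from `xgetbv`, which should only be executed after CPUID reports OSXSAVE.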

GoogleCodeExporter commented 9 years ago
32 bit xgetbv passes with pepper_canary to version 43, revision 321716

Original comment by fbarch...@chromium.org on 24 Mar 2015 at 8:56