Closed GoogleCodeExporter closed 8 years ago
Good point. Thanks for pointing that out.
I incorrectly understood "restrict" keyword as being a declaration of intention
regarding "write operation" within a defined memory zone.
In this case, 'op' is the only pointer authorized to write into output buffer,
so it would have been right.
But reading again the definition, i find that the declaration of intention is
much broader, and applies to any access to the memory zone. In this case, it's
wrong since 'ref' is also authorized to "read" the output buffer.
It's unfortunate i can not reproduce the issue in order to witness and study it.
Original comment by yann.col...@gmail.com
on 24 Sep 2012 at 4:20
I'll look into it.
Original comment by yann.col...@gmail.com
on 24 Sep 2012 at 4:20
I use LZ4 embedded and slightly modified, so I cannot give you instructions for
reproducing this right away. But I'll try to find a case for you.
Anyway, I didn't checked the assembly, but with both op and ref marked as
restrict this:
*op++ = *ref++;
*op++ = *ref++;
*op++ = *ref++;
*op++ = *ref++;
is probably optimized to something like:
*(int*)op = *(int*)ref;
op += 4;
ref += 4;
Original comment by ala.lusz...@gmail.com
on 24 Sep 2012 at 5:00
I managed to get a reproducible case. However, note that there's an #ifdef
which incorrectly identifies icc as not knowing the restrict keyword, which
overshadows the issue.
[ala@intruder lz4]$ cat /proc/cpuinfo | grep "model name" | head -n 1
model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
[ala@intruder lz4]$ icc --version
icc (ICC) 11.0 20081105
Copyright (C) 1985-2008 Intel Corporation. All rights reserved.
[ala@intruder lz4]$ svn diff
Index: Makefile
===================================================================
--- Makefile (revision 68)
+++ Makefile (working copy)
@@ -11,10 +11,10 @@
all: lz4demo64 lz4demo32
lz4demo64: lz4.c lz4.h lz4hc.c lz4hc.h bench.c lz4demo.c
- gcc -O3 -I. -std=c99 -Wall -W -Wundef -Wno-implicit-function-declaration
lz4hc.c lz4.c bench.c lz4demo.c -o $(OUTPUT64)
+ icc -O3 -restrict lz4hc.c lz4.c bench.c lz4demo.c -o $(OUTPUT64)
lz4demo32: lz4.c lz4.h lz4hc.c lz4hc.h bench.c lz4demo.c
- gcc -m32 -Os -march=native -I. -std=c99 -Wall -W -Wundef
-Wno-implicit-function-declaration lz4hc.c lz4.c bench.c lz4demo.c -o
$(OUTPUT32)
+ icc -O3 -restrict lz4hc.c lz4.c bench.c lz4demo.c -o $(OUTPUT32)
clean:
rm -f core *.o $(OUTPUT32) $(OUTPUT64)
Index: lz4.c
===================================================================
--- lz4.c (revision 68)
+++ lz4.c (working copy)
@@ -98,12 +98,6 @@
//**************************************
// Compiler Options
//**************************************
-#if __STDC_VERSION__ >= 199901L // C99
-/* "restrict" is a known keyword */
-#else
-# define restrict // Disable restrict
-#endif
-
#define GCC_VERSION (__GNUC__ * 100 + __GNUC_MINOR__)
#ifdef _MSC_VER // Visual Studio
[ala@intruder lz4]$ cat a
aAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaA
aAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaA
aAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaAaA
[ala@intruder lz4]$ ./lz4demo64 -c0 a a-compressed
*** Compression CLI using LZ4 algorithm , by Yann Collet (Sep 25 2012) ***
Compressed 203 bytes into 20 bytes ==> 9.85%
Done in 0.00 s ==> inf MB/s
[ala@intruder lz4]$ ./lz4demo64 -d a-compressed a-uncompressed
*** Compression CLI using LZ4 algorithm , by Yann Collet (Sep 25 2012) ***
Successfully decoded 203 bytes
Done in 0.00 s ==> inf MB/s
[ala@intruder lz4]$ cat a-uncompressed
aAaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAa
aAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaaAaA
Original comment by ala.lusz...@gmail.com
on 25 Sep 2012 at 9:02
Hi
Many thanks for detailed instructions on how to re-create the issue.
I currently do not have icc available around, and therefore cannot use your
information yet. That's probably something i will have to improve in the future.
Nonethless, I've been investigating this long overdue issue lately.
I've followed your advise of removing the "restrict" keyword.
Initial benchmarks show that performance difference is very small, for Intel
CPU.
The resulting source file is in attached file (release candidate 80)
Regards
Original comment by yann.col...@gmail.com
on 24 Oct 2012 at 3:52
Attachments:
Corrected into r80.
Original comment by yann.col...@gmail.com
on 26 Oct 2012 at 12:37
Original issue reported on code.google.com by
ala.lusz...@gmail.com
on 24 Sep 2012 at 2:20