animetosho / ParPar

High performance PAR2 create client for NodeJS

"MD5_SIMD_NUM == 2" may fail #28

Closed by Yutaka-Sawada 3 years ago

Yutaka-Sawada commented 3 years ago

I referred to ParPar's MD5 code for optimization and was able to improve my own MD5 function. Thank you.

However, I noticed something odd in your code. It isn't actually used at the moment, but it could cause a problem someday. The file is "md5-sse2.c", in the macro below line 99.

X(a) = _mm_loadl_epi64(data0++); X(b) = _mm_loadl_epi64(data1++);

These lines each read 8 bytes from data0 and data1, but the post-increment advances each pointer by 16 bytes. I'm not sure about the data alignment, but 8 bytes may be skipped between successive invocations of the macro. Adding one more variable as an offset might be a good fix.
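To illustrate the concern, here is a minimal sketch rather than the actual ParPar code. The helper names `read_with_vector_stride` and `read_with_byte_offset` are hypothetical, and the sketch assumes the data pointers are typed as `__m128i*`, which is the pointer type `_mm_loadl_epi64` takes. The first helper reproduces the pattern above; the second tracks a separate byte offset, along the lines suggested.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadl_epi64, _mm_storel_epi64 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the pattern in question: each load reads
 * 8 bytes, but post-incrementing a __m128i pointer steps it by 16 bytes,
 * so the upper 8 bytes of every 16-byte chunk are never read. */
static void read_with_vector_stride(const uint8_t *src, size_t loads, uint8_t *dst)
{
    const __m128i *p = (const __m128i *)src;
    for (size_t i = 0; i < loads; i++) {
        __m128i v = _mm_loadl_epi64(p++);               /* reads 8, advances 16 */
        _mm_storel_epi64((__m128i *)(dst + 8 * i), v);  /* store the low 8 bytes */
    }
}

/* Hypothetical fix: keep a separate byte offset and advance it by exactly
 * the 8 bytes that each load consumes. */
static void read_with_byte_offset(const uint8_t *src, size_t loads, uint8_t *dst)
{
    size_t offset = 0;
    for (size_t i = 0; i < loads; i++) {
        __m128i v = _mm_loadl_epi64((const __m128i *)(src + offset));
        _mm_storel_epi64((__m128i *)(dst + 8 * i), v);
        offset += 8;                                    /* reads 8, advances 8 */
    }
}
```

With a source buffer holding the bytes 0, 1, 2, ... and two loads, the first helper picks up bytes 0..7 and then 16..23, while the second picks up 0..7 and then 8..15.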

animetosho commented 3 years ago

Thanks for pointing that out!

I think I originally considered using MMX for width=2, but dropped the idea as there's little point these days (and Intel is slowing down MMX on their processors anyway).

But the code is crap and I'm going to abandon it. It's being replaced with the stuff in the gf16/md5x2-* files.