Open Jony-2018 opened 4 months ago
You can solve this by pasting it straight into GPT.
Jony wrote on Wed, Jun 19, 2024, at 6:37 PM:
default.png (view on web) https://github.com/DennisLiu1993/Fastest_Image_Pattern_Matching/assets/51848340/dec6f37c-690b-41e6-8c17-ece590b7ec0a
Thank you for taking the time to reply. I asked GPT yesterday, and it told me to change the code block to:
int16x8_t SrcK_L = (int16x8_t)vmovl_u8(vget_low_u8(SrcK));
int16x8_t SrcK_H = (int16x8_t)vmovl_u8(vget_high_u8(SrcK));
int16x8_t SrcC_L = (int16x8_t)vmovl_u8(vget_low_u8(SrcC));
int16x8_t SrcC_H = (int16x8_t)vmovl_u8(vget_high_u8(SrcC));
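With that change it compiles, but the results at run time are wrong, and I don't yet know which values need to change. After running, the coarse-match iMatchSize is the same as on Windows, but the final output is empty, whereas on Windows the output matches 3 targets. Do you have any ideas?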
Try this snippet?

    inline int32_t neon_hsum_epi32(int32x4_t V)
    {
        int32x2_t SumV = vadd_s32(vget_low_s32(V), vget_high_s32(V));
        SumV = vpadd_s32(SumV, SumV);
        return vget_lane_s32(SumV, 0);
    }

    inline int32_t neon_haddw_s32(int16x8_t V)
    {
        int32x4_t SumV = vpaddlq_s16(V);
        SumV = vaddq_s32(SumV, vextq_s32(SumV, SumV, 1));
        // Optional: Enable for summing all 4 lanes
        return neon_hsum_epi32(SumV);
    }

    inline int IM_Conv_SIMD(unsigned char* pCharKernel, unsigned char* pCharConv, int iLength)
    {
        const int iBlockSize = 16, Block = iLength / iBlockSize;
        int32x4_t SumV = vdupq_n_s32(0);
        uint8x16_t Zero = vdupq_n_u8(0);

        for (int Y = 0; Y < Block * iBlockSize; Y += iBlockSize)
        {
            // Load 16 bytes from each buffer and widen them to 16-bit lanes.
            uint8x16_t SrcK = vld1q_u8(pCharKernel + Y);
            uint8x16_t SrcC = vld1q_u8(pCharConv + Y);
            int16x8_t SrcK_L = vmovl_u8(vget_low_u8(SrcK));
            int16x8_t SrcK_H = vmovl_u8(vget_high_u8(SrcK));
            int16x8_t SrcC_L = vmovl_u8(vget_low_u8(SrcC));
            int16x8_t SrcC_H = vmovl_u8(vget_high_u8(SrcC));

            // Multiply the low 8 elements pairwise into 32-bit lanes.
            int32x4_t MulLow = vmull_s16(vget_low_s16(SrcK_L), vget_low_s16(SrcC_L));
            int32x4_t MulHigh = vmull_s16(vget_high_s16(SrcK_L), vget_high_s16(SrcC_L));
            int32x4_t SumT = vaddq_s32(MulLow, MulHigh);

            // Same for the high 8 elements.
            MulLow = vmull_s16(vget_low_s16(SrcK_H), vget_low_s16(SrcC_H));
            MulHigh = vmull_s16(vget_high_s16(SrcK_H), vget_high_s16(SrcC_H));
            SumT = vaddq_s32(SumT, vaddq_s32(MulLow, MulHigh));

            SumV = vaddq_s32(SumV, SumT);
        }

        int32_t Sum = neon_hsum_epi32(SumV);

        // Scalar tail for the elements that do not fill a whole 16-byte block.
        for (int Y = Block * iBlockSize; Y < iLength; Y++)
        {
            Sum += pCharKernel[Y] * pCharConv[Y];
        }

        return Sum;
    }
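When a SIMD port gives the same coarse-match count but different final results, one way to narrow it down is to compare the SIMD routine against a plain scalar dot product on identical buffers. The sketch below is only an illustration: `conv_scalar_reference` and `make_pattern` are hypothetical helpers, not part of the repository, and the pseudo-random fill is an assumed way of getting the same bytes on every platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the repository): scalar dot product of two
// byte buffers, intended as ground truth for a SIMD implementation.
inline int32_t conv_scalar_reference(const unsigned char* kernel,
                                     const unsigned char* conv,
                                     int length)
{
    assert(length >= 0);
    int32_t sum = 0;
    for (int i = 0; i < length; ++i)
    {
        sum += static_cast<int32_t>(kernel[i]) * static_cast<int32_t>(conv[i]);
    }
    return sum;
}

// Hypothetical helper: fills a buffer deterministically so that runs on
// Windows and on ARM operate on exactly the same bytes.
inline std::vector<unsigned char> make_pattern(int length, unsigned seed)
{
    assert(length >= 0);
    std::vector<unsigned char> buf(static_cast<std::size_t>(length));
    for (int i = 0; i < length; ++i)
    {
        seed = seed * 1664525u + 1013904223u;  // linear congruential step
        buf[static_cast<std::size_t>(i)] = static_cast<unsigned char>(seed >> 24);
    }
    return buf;
}
```

Calling both `IM_Conv_SIMD` and `conv_scalar_reference` on buffers from `make_pattern`, with lengths that are and are not multiples of 16, should show whether the block loop or the scalar tail disagrees.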
Starting from your code, I changed these lines to:
int16x8_t SrcK_L = (int16x8_t)vmovl_u8(vget_low_u8(SrcK));
int16x8_t SrcK_H = (int16x8_t)vmovl_u8(vget_high_u8(SrcK));
int16x8_t SrcC_L = (int16x8_t)vmovl_u8(vget_low_u8(SrcC));
int16x8_t SrcC_H = (int16x8_t)vmovl_u8(vget_high_u8(SrcC));
It now compiles and the results are correct. Thank you for your reply and your help!
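For anyone hitting the same error: `vmovl_u8` returns `uint16x8_t`, and some compilers (GCC without `-flax-vector-conversions`, for example) will not convert that implicitly to `int16x8_t`, which is presumably why the cast was needed. A C-style cast works on GCC and Clang; `vreinterpretq_s16_u16` is the intrinsic meant for this reinterpretation. The sketch below is an assumption-laden variant, not the repository's code: `conv_u8_dot` is a hypothetical name, it uses `vmlal_s16` to multiply and accumulate in one step, and it falls back to the scalar loop when NEON is unavailable.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical name; computes the same dot product as IM_Conv_SIMD above.
inline int32_t conv_u8_dot(const unsigned char* kernel,
                           const unsigned char* conv,
                           int length)
{
    assert(length >= 0);
    int32_t sum = 0;
    int i = 0;
#if defined(__ARM_NEON)
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 16 <= length; i += 16)
    {
        uint8x16_t k = vld1q_u8(kernel + i);
        uint8x16_t c = vld1q_u8(conv + i);
        // vmovl_u8 yields uint16x8_t; reinterpret explicitly as signed lanes.
        // Values are at most 255, so they stay non-negative as int16.
        int16x8_t k_lo = vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(k)));
        int16x8_t k_hi = vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(k)));
        int16x8_t c_lo = vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(c)));
        int16x8_t c_hi = vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(c)));
        // Widening multiply-accumulate into four 32-bit lanes.
        acc = vmlal_s16(acc, vget_low_s16(k_lo), vget_low_s16(c_lo));
        acc = vmlal_s16(acc, vget_high_s16(k_lo), vget_high_s16(c_lo));
        acc = vmlal_s16(acc, vget_low_s16(k_hi), vget_low_s16(c_hi));
        acc = vmlal_s16(acc, vget_high_s16(k_hi), vget_high_s16(c_hi));
    }
    // Horizontal sum of the four accumulator lanes.
    int32x2_t half = vadd_s32(vget_low_s32(acc), vget_high_s32(acc));
    half = vpadd_s32(half, half);
    sum = vget_lane_s32(half, 0);
#endif
    // Scalar tail (or the whole buffer when NEON is not available).
    for (; i < length; ++i)
    {
        sum += static_cast<int32_t>(kernel[i]) * static_cast<int32_t>(conv[i]);
    }
    return sum;
}
```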