Open ronanbel opened 2 years ago
Thank you - this is great. I normally shy away from merging code that I can't easily maintain, but let me see what I can do. How much does this help encoding perf.?
Hi Richard. I did a second submit today (there was an error in the first one) with additional optimization (uastc_unpack) on my test device (arm a53), it lasts 200ms from 340ms (base code) to transcode the uastc to astc (the same on an A57) it's also faster on PC I'm using a big texture for the tests, do a CRC32 at the end to detect if the decoding gives the same result. If you need some other optims or modifications, don't hesitate to let me know. And if you have a unit-test asset (a test.basis) which has all modes & subsets & anchor patterns, it would help me to track my modifications are valid
have a nice day
Le jeu. 12 mai 2022 à 09:00, Rich Geldreich @.***> a écrit :
Thank you - this is great. I normally shy away from merging code that I can't easily maintain, but let me see what I can do. How much does this help encoding perf.?
— Reply to this email directly, view it on GitHub https://github.com/BinomialLLC/basis_universal/pull/298#issuecomment-1124606814, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACOL7CCQAODQFKKYWU5QX2LVJSUBBANCNFSM5VIMSKDQ . You are receiving this because you authored the thread.Message ID: @.***>
re-hi (from home after a long day at work)
BASISD_USE_UNALIGNED_WORD_READS)
here are the switches to enable most of my modifications (some like a hint for the compiler to do a fast div3/5 are not between #if) I'm using this version of the code on the project I'm working on, at least on this dataset, all is fine. I may do some other modifications (if needed, if I have some spare time because I did most of this during the week-end) I understand what you mean by "I can't easily maintain", I would do the same. All I can say is that I will continue to work in the game industry for at least 15/20 years (probably not at Ubisoft), so I think you can ask me for maintenance for the next decade :) (anyway, you may leave this between #if defined and leave the user the ability to use the simd version if he needs some speedup) (there is not so much simd code, most added lines are here to setup tables for further simd processing)
Le jeu. 12 mai 2022 à 20:06, ronan bel @.***> a écrit :
Hi Richard. I did a second submit today (there was an error in the first one) with additional optimization (uastc_unpack) on my test device (arm a53), it lasts 200ms from 340ms (base code) to transcode the uastc to astc (the same on an A57) it's also faster on PC I'm using a big texture for the tests, do a CRC32 at the end to detect if the decoding gives the same result. If you need some other optims or modifications, don't hesitate to let me know. And if you have a unit-test asset (a test.basis) which has all modes & subsets & anchor patterns, it would help me to track my modifications are valid
have a nice day
Le jeu. 12 mai 2022 à 09:00, Rich Geldreich @.***> a écrit :
Thank you - this is great. I normally shy away from merging code that I can't easily maintain, but let me see what I can do. How much does this help encoding perf.?
— Reply to this email directly, view it on GitHub https://github.com/BinomialLLC/basis_universal/pull/298#issuecomment-1124606814, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACOL7CCQAODQFKKYWU5QX2LVJSUBBANCNFSM5VIMSKDQ . You are receiving this because you authored the thread.Message ID: @.***>
How much does this help encoding perf sorry I didn't answer your questions (12h day at work today, I'm tired) I don't think it helps the encoder it's only in the transcoder one part in the uastc_unpack one part in the transcoding to astc (the one I needed) all is for the runtime client code (parsing uastc, then converting the astc)
Le jeu. 12 mai 2022 à 20:51, ronan bel @.***> a écrit :
re-hi (from home after a long day at work)
define BASISD_ASTC_SIMD // 287 => 240
define BASISD_UASTC_SIMD // 240 => 222 (231 without
BASISD_USE_UNALIGNED_WORD_READS)
define BASISD_ASTC_24WRITE
here are the switches to enable most of my modifications (some like a hint for the compiler to do a fast div3/5 are not between #if) I'm using this version of the code on the project I'm working on, at least on this dataset, all is fine. I may do some other modifications (if needed, if I have some spare time because I did most of this during the week-end) I understand what you mean by "I can't easily maintain", I would do the same. All I can say is that I will continue to work in the game industry for at least 15/20 years (probably not at Ubisoft), so I think you can ask me for maintenance for the next decade :) (anyway, you may leave this between #if defined and leave the user the ability to use the simd version if he needs some speedup) (there is not so much simd code, most added lines are here to setup tables for further simd processing)
Le jeu. 12 mai 2022 à 20:06, ronan bel @.***> a écrit :
Hi Richard. I did a second submit today (there was an error in the first one) with additional optimization (uastc_unpack) on my test device (arm a53), it lasts 200ms from 340ms (base code) to transcode the uastc to astc (the same on an A57) it's also faster on PC I'm using a big texture for the tests, do a CRC32 at the end to detect if the decoding gives the same result. If you need some other optims or modifications, don't hesitate to let me know. And if you have a unit-test asset (a test.basis) which has all modes & subsets & anchor patterns, it would help me to track my modifications are valid
have a nice day
Le jeu. 12 mai 2022 à 09:00, Rich Geldreich @.***> a écrit :
Thank you - this is great. I normally shy away from merging code that I can't easily maintain, but let me see what I can do. How much does this help encoding perf.?
— Reply to this email directly, view it on GitHub https://github.com/BinomialLLC/basis_universal/pull/298#issuecomment-1124606814, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACOL7CCQAODQFKKYWU5QX2LVJSUBBANCNFSM5VIMSKDQ . You are receiving this because you authored the thread.Message ID: @.***>
Hi, yes you can do a cast on the shift(always left) to satisfy GCC and you can replace the vdup_n_u8(0) by a vdup_16(0), logically it's the same (I'm only using clang & msvc) (sorry I don't have git on this PC)
Le lun. 10 avr. 2023 à 05:26, Andrei Alexeyev @.***> a écrit :
@.**** commented on this pull request.
In transcoder/basisu_transcoder.cpp https://github.com/BinomialLLC/basis_universal/pull/298#discussion_r1161405354 :
- uint16x8_t bitMask0 = vshlq_u16( vdupq_n_u16(1), bitNum0 ); // bitMask = (1U << n) - 1U
- uint16x8_t bitMask1 = vshlq_u16( vdupq_n_u16(1), bitNum1 ); // bitMask = (1U << n) - 1U
This fails on GCC without -flax-vector-conversions:
../subprojects/basis_universal/transcoder/basisu_transcoder.cpp: In function 'void basist::pack_astc_block_weights(uint8_t, const uint8_t, int, int)': ../subprojects/basis_universal/transcoder/basisu_transcoder.cpp:12010:80: note: use '-flax-vector-conversions' to permit conversions between vectors with differing element types or numbers of subparts 12010 uint8x8_t rev8 = vqmovn_u16( vcombine_u16( rev8lohi, vdup_n_u8(0) ) ); // 8bits in 4 u8 (clear lower 32) ~~^~~~~~~~ ../subprojects/basis_universal/transcoder/basisu_transcoder.cpp:12010:101: error: cannot convert 'uint8x8_t' to 'uint16x4_t' 12010uint8x8_t rev8 = vqmovn_u16( vcombine_u16( rev8lohi, vdup_n_u8(0) ) ); // 8bits in 4 u8 (clear lower 32) ~~~^uint8x8_t An explicit cast fixes it: ⬇️ Suggested change
- uint16x8_t bitMask0 = vshlq_u16( vdupq_n_u16(1), bitNum0 ); // bitMask = (1U << n) - 1U
- uint16x8_t bitMask1 = vshlq_u16( vdupq_n_u16(1), bitNum1 ); // bitMask = (1U << n) - 1U
- uint16x8_t bitMask0 = vshlq_u16( vdupq_n_u16(1), (int16x8_t)bitNum0 ); // bitMask = (1U << n) - 1U
- uint16x8_t bitMask1 = vshlq_u16( vdupq_n_u16(1), (int16x8_t)bitNum1 ); // bitMask = (1U << n) - 1U
— Reply to this email directly, view it on GitHub https://github.com/BinomialLLC/basis_universal/pull/298#pullrequestreview-1377041364, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACOL7CARDZELDN4VTHPMGJDXAN4YBANCNFSM5VIMSKDQ . You are receiving this because you authored the thread.Message ID: @.***>
ssse3 (I5 6300) : 163 => 136 ms arm (A53) : 340 => 282 ms I moved the block weight transform code in a single function : pack_astc_block_weights you can enable/disable the SIMD code with a define BASISD_ASTC_SIMD All the simd code is annotated
tested x86_64 on windows, compiled with VS2019 and clang 11 tested arm & arm64 on android, compiled with latest NDK (clang11)
if needed, you can get in touch at : ronan.bel@gmail.com ronan.bel@ubisoft.com