Closed mikeversteeg closed 3 years ago
Hi!
In this case I would recommend calling ReduceGray2x2 repeatedly to quickly reduce the Y, U and V planes. At the final stage use ResizeArea, and only then use Yuv444pToBgr for the conversion. Although this method uses an additional buffer, it achieves maximal performance.
But the additional memory access is a limiting factor in this case. Resizing while converting would be much faster as there would only be a single memory read and write.
Yes, of course memory access is a limiting factor. But each call to ReduceGray2x2 reduces itone by a factor of 4. We use this method in our video analytics pipeline to resize Full HD Yuv420p to a smaller RGB image (480x270, for example). Mixing resizing and conversion in one algorithm is too complicated and does not give a sufficient performance gain.
It may be useful to create a class which implements this behaviour (named something like YuvToThumbnail).
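As a rough illustration of the factor-of-4 reduction described above, here is a minimal scalar sketch of a 2x2 gray reduction in plain C++. It is not the Simd implementation: `reduceGray2x2` is a hypothetical helper, and the rounded average, even dimensions and tightly packed rows are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar sketch of a 2x2 gray reduction: every output pixel is the rounded
// average of one 2x2 block of input pixels, so the output holds 4x fewer bytes.
// Hypothetical helper, NOT the Simd implementation.
// Assumes even width/height and tightly packed rows (stride == width).
std::vector<uint8_t> reduceGray2x2(const std::vector<uint8_t>& src,
                                   size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    assert(src.size() == width * height);
    const size_t dstWidth = width / 2, dstHeight = height / 2;
    std::vector<uint8_t> dst(dstWidth * dstHeight);
    for (size_t y = 0; y < dstHeight; ++y)
    {
        const uint8_t* row0 = src.data() + (2 * y) * width; // upper source row
        const uint8_t* row1 = row0 + width;                 // lower source row
        for (size_t x = 0; x < dstWidth; ++x)
        {
            const unsigned sum = row0[2 * x] + row0[2 * x + 1] +
                                 row1[2 * x] + row1[2 * x + 1];
            dst[y * dstWidth + x] = static_cast<uint8_t>((sum + 2) / 4);
        }
    }
    return dst;
}
```

Applied to each plane of a Full HD frame, one such pass turns the 1920x1080 Y plane into 960x540, and every later stage then touches a quarter of the data.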
Avoiding the additional memory access will improve performance; I have already established this. Note that the display can still be full size, although typically it will be around 25%. For a dozen streams that's around 0.5 GB/s for each single memory write. It is easy to exceed memory bandwidth.
If the temporary buffer is smaller than the L3 cache size divided by the thread count, then this is not important. See matrix multiplication algorithms, which widely use data reordering into a temporary buffer to achieve maximum performance.
As I see it, the main bottleneck is loading the source YUV420P image (80% of memory access). The subsequent loading accounts for about 20%, and less if the buffer fits into the L3 cache.
A 1920x1080 YUV420P frame is about 3 MB; the temporary buffer is less than 1 MB, which is less than the L3 cache per thread.
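The sizes quoted above can be checked with a few lines of arithmetic. The sketch below assumes planar 4:2:0 data without row padding and one 2x2 reduction step per plane for the temporary buffer; both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes in a planar YUV 4:2:0 frame: a full-size Y plane plus
// quarter-size U and V planes. Assumes even dimensions and no row padding.
inline size_t yuv420pBytes(size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height + 2 * (width / 2) * (height / 2);
}

// Size of the same frame after `steps` successive 2x2 reductions of each plane.
inline size_t reducedYuv420pBytes(size_t width, size_t height, unsigned steps)
{
    return yuv420pBytes(width >> steps, height >> steps);
}
```

For a 1920x1080 frame this gives 3,110,400 bytes for the source and 777,600 bytes after one reduction step, which matches the "about 3 MB" and "less than 1 MB" figures.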
Barely, at 50% size it already exceeds the L3 cache size on e.g. an 8-core Xeon 2286m. But you're also assuming your thread isn't interrupted, which you cannot guarantee AFAIK? My app runs hundreds of threads, so there is a lot of context switching going on.
Anyway, I can only ask and appreciate the work you do.
I do not fully understand the documentation, but I understand you say that
is faster (and presumably better) than
correct? I would not expect that.
Would have loved to try it out but unfortunately ResizeArea is not in SimdLib.h which I have been using. I updated not too long ago.
How does SimdResizeGray2x2 compare? I don't know how it works, the documentation is minimal.
The method described above is used for resizing from a big YUV420P image to a small BGR one for the purpose of object detection or recognition. It gives the most correct reduced image. If the quality of the reduced image is not so important, you may of course use your method and it will be faster. First note: I would recommend resizing the Y, U and V planes to the same size and then using the function SimdYuv444pToBgr. Second note: if the reduction coefficient is too large, then ResizeBilinear gives a result close to ResizeNearest (which is not implemented). Itone can have poor quality.
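To make the second note concrete, here is a small sketch in plain C++ comparing point sampling with area averaging at an integer reduction factor. `shrinkNearest` and `shrinkArea` are hypothetical helpers written for this comparison, not Simd functions, and they assume tightly packed 8-bit planes whose dimensions are divisible by the factor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Point sampling: keeps only the top-left pixel of every factor x factor block.
// Hypothetical helper, not a Simd function.
std::vector<uint8_t> shrinkNearest(const std::vector<uint8_t>& src,
                                   size_t width, size_t height, size_t factor)
{
    assert(factor > 0 && width % factor == 0 && height % factor == 0);
    assert(src.size() == width * height);
    const size_t dw = width / factor, dh = height / factor;
    std::vector<uint8_t> dst(dw * dh);
    for (size_t y = 0; y < dh; ++y)
        for (size_t x = 0; x < dw; ++x)
            dst[y * dw + x] = src[(y * factor) * width + x * factor];
    return dst;
}

// Area averaging: each output pixel is the rounded mean of its whole block,
// so every source pixel contributes to the result.
std::vector<uint8_t> shrinkArea(const std::vector<uint8_t>& src,
                                size_t width, size_t height, size_t factor)
{
    assert(factor > 0 && width % factor == 0 && height % factor == 0);
    assert(src.size() == width * height);
    const size_t dw = width / factor, dh = height / factor;
    const size_t area = factor * factor;
    std::vector<uint8_t> dst(dw * dh);
    for (size_t y = 0; y < dh; ++y)
        for (size_t x = 0; x < dw; ++x)
        {
            size_t sum = 0;
            for (size_t by = 0; by < factor; ++by)
                for (size_t bx = 0; bx < factor; ++bx)
                    sum += src[(y * factor + by) * width + x * factor + bx];
            dst[y * dw + x] = static_cast<uint8_t>((sum + area / 2) / area);
        }
    return dst;
}
```

On an image of one-pixel-wide black and white vertical stripes reduced by a factor of 4, point sampling returns solid black while area averaging returns mid-gray. Bilinear interpolation only looks at a 2x2 neighbourhood, so at large reduction factors it behaves much like the point-sampling case.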
Thank you. Indeed ResizeBilinear gives distorted pixels so I cannot use it, I need something else (that is available in SimdLib and fast). Any ideas? SimdStretchGray2x2?
I am not familiar with the word "ltone", what does it mean?
You should add a Donate button, thanks for the quick replies and assistance!
Simd::Resize with parameter SimdResizeMethodArea gives the best result.
"itone" is my mistake; I meant "this one".
I had a donate "button" in my past project (AntiDupl) and it brought in about 37 dollars over 13 years. I am scared to add this button - I can't carry the weight of such a big amount of cash :)
Can you give me an email address for a PayPal payment? I always donate for free software, helps me sleep.
Thanks. I really appreciate that you want to donate to my project. It does not mean that I don't appreciate money, but I think that the best donation for an open source project is a public mention of it. Unless, of course, this makes it difficult for you.
Your support goes above and beyond what is required for an open source project. Nonetheless I thankfully added it to my Credits section (http://help.vidblasterx6.com/CreditsDisclaimer.html).
Thanks!
Regarding SimdResizerRun, I am not sure what these channels are. Are the last 3 parameters correct?
pResizeContext = SimdResizerInit(widthin, heightin, widthout, heightout, 1, SimdResizeChannelByte, SimdResizeMethodArea);
Yes, these are the correct parameters for resizing the Y, U and V planes.
Hi!
Still loving this library, it is really impressive coding..
I need to display YUV420P images as thumbnails on screen, which means they need to be converted to RGB and resized. Because this is HD video, speed is essential. Currently I first resize each of the YUV planes, and then convert to RGB. However, this means an additional memory write of the (smaller) YUV image. This can be avoided by dropping the requirement that both images must have the same size for SimdYuv420pToBgr. Can this be added? As speed is important, a simple pixel drop can be used (although if it can be added efficiently, a basic (bi)linear interpolation would be great).
Thanks for considering.
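For reference, the single-pass "pixel drop" conversion requested here could look roughly like the scalar sketch below. `yuv420pToBgrPixelDrop` is a hypothetical helper and not part of Simd, and the integer BT.601 limited-range coefficients are an assumption that may differ from what the library uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

static uint8_t clampToByte(int v)
{
    return static_cast<uint8_t>(std::min(255, std::max(0, v)));
}

// Converts a YUV420P frame to a smaller packed BGR image in one pass by
// dropping pixels (nearest-neighbour sampling): one read of the sampled
// source pixels, one write of the destination, no intermediate YUV buffer.
// Hypothetical helper, NOT part of Simd. Strides are in bytes.
// Colour maths: integer BT.601, limited range (an assumption).
void yuv420pToBgrPixelDrop(const uint8_t* y, size_t yStride,
                           const uint8_t* u, size_t uStride,
                           const uint8_t* v, size_t vStride,
                           size_t srcWidth, size_t srcHeight,
                           uint8_t* bgr, size_t bgrStride,
                           size_t dstWidth, size_t dstHeight)
{
    assert(dstWidth > 0 && dstHeight > 0);
    assert(dstWidth <= srcWidth && dstHeight <= srcHeight);
    for (size_t dy = 0; dy < dstHeight; ++dy)
    {
        const size_t sy = dy * srcHeight / dstHeight;  // source row to sample
        const uint8_t* yRow = y + sy * yStride;
        const uint8_t* uRow = u + (sy / 2) * uStride;  // chroma is half height
        const uint8_t* vRow = v + (sy / 2) * vStride;
        uint8_t* out = bgr + dy * bgrStride;
        for (size_t dx = 0; dx < dstWidth; ++dx)
        {
            const size_t sx = dx * srcWidth / dstWidth; // source column to sample
            const int c = 298 * (yRow[sx] - 16);
            const int d = uRow[sx / 2] - 128;           // chroma is half width
            const int e = vRow[sx / 2] - 128;
            out[3 * dx + 0] = clampToByte((c + 516 * d + 128) / 256);           // B
            out[3 * dx + 1] = clampToByte((c - 100 * d - 208 * e + 128) / 256); // G
            out[3 * dx + 2] = clampToByte((c + 409 * e + 128) / 256);           // R
        }
    }
}
```

This avoids the intermediate YUV write, but it has the aliasing behaviour of plain point sampling, which is the quality trade-off discussed elsewhere in the thread.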