Closed GoogleCodeExporter closed 8 years ago
Yes, it should be dst[0] = (s[0] + t[0] + 1) >> 1;
Original comment by fbarch...@google.com
on 4 Jan 2016 at 9:06
ah the scale row functions are based on destination width, not source. here is
that function:
void ScaleRowDown2Box_C(const uint8* src_ptr, ptrdiff_t src_stride,
uint8* dst, int dst_width) {
const uint8* s = src_ptr;
const uint8* t = src_ptr + src_stride;
int x;
for (x = 0; x < dst_width - 1; x += 2) {
dst[0] = (s[0] + s[1] + t[0] + t[1] + 2) >> 2;
dst[1] = (s[2] + s[3] + t[2] + t[3] + 2) >> 2;
dst += 2;
s += 4;
t += 4;
}
if (dst_width & 1) {
dst[0] = (s[0] + s[1] + t[0] + t[1] + 2) >> 2;
}
}
so source width is not known and dealt with a level up, if at all.
If scaling an odd width image, it wont be a scale factor of 2.
e.g. 11 pixels will scale to 5 or 6... up to the caller. neither is a
multiple of 2 scale factor.
Does this issue come up in practice for I420Blend?
Alpha channel is not scaled, but the U and V channels are.
Original comment by fbarch...@google.com
on 4 Jan 2016 at 9:12
Yes, in practice any odd-width images I supply to I420Blend overread 1 column
and produce garbage in the last column of the blend.
The alpha channel is subsampled at planar_functions.cc:727 of I420 Blend:
ScaleRowDown2(alpha, alpha_stride, halfalpha, halfwidth);
If luma/alpha is width 5, halfwidth will be 3, which causes ScaleRowDown2 to
read an extra alpha column. I think instead the last UV column needs to be
blended using the last alpha column by itself:
alpha column 0 + 1 -> uv column 0
alpha column 2 + 3 -> uv column 1
alpha column 4 -> uv column 3
Original comment by dhr...@google.com
on 4 Jan 2016 at 9:19
s/uv column 3/uv column 2/
Original comment by dhr...@google.com
on 4 Jan 2016 at 9:24
For the I420 part, the UV width of an odd width image should be (width + 1) / 2
so that the last column has an alpha.
The problem is scaling alpha of 5 to 3.
When subsampling, I444ToI420, ScalePlane is used
ScalePlane(src_u, src_stride_u, src_uv_width, src_uv_height,
dst_u, dst_stride_u, dst_uv_width, dst_uv_height,
kFilterBilinear);
and it'll scale 5 to 3 with general purpose bilinear scaling.
Its not ideal, as the Y channel and UV are misaligned, but it avoids an
overflow, and is typical of how GPU's do the conversion.
Theres a few solutions, none of them trivial
1. change all scaling functions to use source dimensions instead of destination.
2. ScalePlaneRow() could be written that provides the ScalePlane logic, but row
at a time.
3. branch the function and use source width for ScaleRowDown2Box_AVX2 etc.
4. I420Blend could provide a ScaleRowDown2Box_Odd function or wrapper that
calls ScaleRowDown2Box_SSSE3 then copies the last byte. Or copys the source to
a buffer, duplicates the last byte, then calls ScaleRowDown2Row_SSSE3 on width
+ 1. (e.g. extrudes 5 alpha pixels to 6).
Original comment by fbarch...@google.com
on 4 Jan 2016 at 11:06
Dr Memory catches an odd width failure:
set GYP_DEFINES=build_for_tool=drmemory target_arch=ia32
python build\gyp_chromium -fninja -G msvs_version=2013 --depth=. libyuv_test.gyp
ninja -C out\Debug
set LIBYUV_WIDTH=123
drmemory out\debug\libyuv_unittest.exe --gtest_catch_exceptions=0
--gtest_filter=*I420Blend*
Error #1: UNINITIALIZED READ: reading register ecx
# 0 testing::internal::CmpHelperEQ<>
[d:\src\libyuv\libyuv\testing\gtest\include\gtest\gtest.h:1392]
# 1 testing::internal::EqHelper<>::Compare<>
[d:\src\libyuv\libyuv\testing\gtest\include\gtest\gtest.h:1423]
# 2 libyuv::TestI420Blend
[d:\src\libyuv\libyuv\unit_test\planar_test.cc:1411]
# 3 libyuv::LibYUVPlanarTest_I420Blend_Opt_Test::TestBody
[d:\src\libyuv\libyuv\unit_test\planar_test.cc:1432]
# 4 testing::internal::HandleExceptionsInMethodIfSupported<>
[d:\src\libyuv\libyuv\testing\gtest\src\gtest.cc:2458]
# 5 testing::Test::Run
[d:\src\libyuv\libyuv\testing\gtest\src\gtest.cc:2475]
# 6 testing::TestInfo::Run
[d:\src\libyuv\libyuv\testing\gtest\src\gtest.cc:2656]
# 7 testing::TestCase::Run
[d:\src\libyuv\libyuv\testing\gtest\src\gtest.cc:2774]
# 8 testing::internal::UnitTestImpl::RunAllTests
[d:\src\libyuv\libyuv\testing\gtest\src\gtest.cc:4647]
# 9 testing::internal::HandleExceptionsInMethodIfSupported<>
[d:\src\libyuv\libyuv\testing\gtest\src\gtest.cc:2458]
#10 testing::UnitTest::Run
[d:\src\libyuv\libyuv\testing\gtest\src\gtest.cc:4258]
#11 RUN_ALL_TESTS
[d:\src\libyuv\libyuv\testing\gtest\include\gtest\gtest.h:2237]
#12 main
[d:\src\libyuv\libyuv\unit_test\unit_test.cc:354]
Note: @0:00:03.558 in thread 4012
Note: instruction: cmp %ecx %eax
Original comment by fbarch...@google.com
on 5 Jan 2016 at 7:35
The following revision refers to this bug:
https://chromium.googlesource.com/libyuv/libyuv.git/+/fc52d8ded269e9cd40c7a763e36758a08f177da0
commit fc52d8ded269e9cd40c7a763e36758a08f177da0
Author: Frank Barchard <fbarchard@google.com>
Date: Wed Jan 06 23:12:17 2016
Odd width variation of scale down by 2 for subsampling
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:538
Review URL: https://codereview.chromium.org/1558093003 .
[modify]
http://crrev.com/fc52d8ded269e9cd40c7a763e36758a08f177da0/README.chromium
[modify]
http://crrev.com/fc52d8ded269e9cd40c7a763e36758a08f177da0/include/libyuv/scale_r
ow.h
[modify]
http://crrev.com/fc52d8ded269e9cd40c7a763e36758a08f177da0/include/libyuv/version
.h
[modify]
http://crrev.com/fc52d8ded269e9cd40c7a763e36758a08f177da0/source/planar_function
s.cc
[modify]
http://crrev.com/fc52d8ded269e9cd40c7a763e36758a08f177da0/source/scale_any.cc
[modify]
http://crrev.com/fc52d8ded269e9cd40c7a763e36758a08f177da0/source/scale_common.cc
Original comment by bugdroid1@chromium.org
on 6 Jan 2016 at 11:13
Odd width profile shows performance as expected. C code is minor. Scaling is
AVX2 and a little slow but mostly due to memory fetching of 2 rows.
Samples: 2K of event 'cycles', Event count (approx.): 1991504782
76.46% libyuv_unittest libyuv_unittest [.] BlendPlaneRow_AVX2
15.69% libyuv_unittest libyuv_unittest [.] ScaleRowDown2Box_AVX2
3.39% libyuv_unittest libyuv_unittest [.] libyuv::TestI420Blend(int, int, int, int, int, int, int)
2.19% libyuv_unittest libyuv_unittest [.] ScaleRowDown2Box_Odd_C
1.12% libyuv_unittest [kernel.kallsyms] [k] 0xffffffff81158a40
0.51% libyuv_unittest libyuv_unittest [.] I420Blend
0.19% libyuv_unittest libyuv_unittest [.] testing::AssertionSuccess()
0.13% libyuv_unittest libyuv_unittest [.] ScaleRowDown2Box_Odd_AVX2
0.09% libyuv_unittest libyuv_unittest [.] BlendPlaneRow_C
Original comment by fbarch...@google.com
on 7 Jan 2016 at 12:03
Original comment by fbarch...@google.com
on 7 Jan 2016 at 6:30
Original issue reported on code.google.com by
dhr...@google.com
on 4 Jan 2016 at 3:33