VQEG / siti-tools

SI TI calculation tools
MIT License

Error that depends on the video resolution #17

Open pmico opened 1 year ago

pmico commented 1 year ago

First of all, I would like to thank you very much for making this tool available to us. Until recently, I have been using the previous siti tool which has become obsolete due to the changes in ITU-T Rec. P.910.

I have tried using siti-tools as I did with the previous siti package, and I have encountered some errors that did not appear before.

Below, I will list all the steps I have followed in case you want to reproduce them:

1) I download a 4K video from the following address (from an open database): wget https://s3-us-west-2.amazonaws.com/cl-4kvideos-public/Eldorado.mov

This is a very large 3840x2160 video (aspect ratio 16:9). Of course, any other video should work too.

2) From that video, I create 2-second segments at different resolutions (to simplify things, I provide the command to create a single segment for each resolution):

    ffmpeg -t 2 -i Eldorado.mov -vf scale=1920:1080 -sws_flags lanczos -an -c:v libx264 -crf 0 -pix_fmt yuv420p Eldorado2s_1080.mp4
    ffmpeg -t 2 -i Eldorado.mov -vf scale=1280:720 -sws_flags lanczos -an -c:v libx264 -crf 0 -pix_fmt yuv420p Eldorado2s_720.mp4
    ffmpeg -t 2 -i Eldorado.mov -vf scale=854:480 -sws_flags lanczos -an -c:v libx264 -crf 0 -pix_fmt yuv420p Eldorado2s_480.mp4
    ffmpeg -t 2 -i Eldorado.mov -vf scale=640:360 -sws_flags lanczos -an -c:v libx264 -crf 0 -pix_fmt yuv420p Eldorado2s_360.mp4
    ffmpeg -t 2 -i Eldorado.mov -vf scale=426:240 -sws_flags lanczos -an -c:v libx264 -crf 0 -pix_fmt yuv420p Eldorado2s_240.mp4
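These five commands differ only in the scale filter and output name, so they can also be generated with a small loop. This is just a sketch of that idea (a dry run that only prints the commands; remove the leading `echo` to actually run ffmpeg):

```shell
# Print the five segment-generation commands, naming the output
# "Eldorado2s_<height>.mp4" as in the commands above.
for res in 1920:1080 1280:720 854:480 640:360 426:240; do
  echo ffmpeg -t 2 -i Eldorado.mov -vf "scale=$res" -sws_flags lanczos \
    -an -c:v libx264 -crf 0 -pix_fmt yuv420p "Eldorado2s_${res#*:}.mp4"
done
```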

3) I use siti-tools to obtain the SI and TI for each segment with the following command (the only required option is -r for full range; the output is initially redirected to a file). I write only the command used for the 240 resolution:

    siti-tools -r full Eldorado2s_240.mp4

Everything works fine for resolutions 1080, 720, and 360, but it throws an error for 480 and 240.

For example, in the case of 240, the error is:

siti-tools --legacy Eldorado2s_240.mp4 
0it [00:00, ?it/s]Traceback (most recent call last):
  File "/Users/xxx/miniforge3/envs/mlpexp/lib/python3.10/site-packages/siti_tools/file.py", line 75, in read_container
    .reshape(frame.height, frame.width).astype("int")
ValueError: cannot reshape array of size 107520 into shape (240,426)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/xxx/miniforge3/envs/mlpexp/bin/siti-tools", line 8, in <module>
    sys.exit(main())
  File "/Users/xxx/miniforge3/envs/mlpexp/lib/python3.10/site-packages/siti_tools/__main__.py", line 276, in main
    si_ti_calculator.calculate(
  File "/Users/xxx/miniforge3/envs/mlpexp/lib/python3.10/site-packages/siti_tools/siti.py", line 596, in calculate
    for frame_data in read_container(input_file):
  File "/Users/xxx/miniforge3/envs/mlpexp/lib/python3.10/site-packages/siti_tools/file.py", line 78, in read_container
    raise RuntimeError(
RuntimeError: Cannot decode frame. Have you specified the bit depth correctly? Original error: cannot reshape array of size 107520 into shape (240,426)
0it [00:00, ?it/s]

The error occurs because the array siti-tools receives from the av package has a size of 107,520 bytes and cannot be reshaped to (240, 426).

The ffmpeg version used is 6.0, on macOS.

This issue doesn't occur with the old siti package, but I am unable to find the reason why this happens. I would greatly appreciate any indication that could help me in this regard. If you need any additional information or if you want me to perform any other type of test, please don't hesitate to let me know.

Once again, thank you very much for all your fantastic work.

slhck commented 1 year ago

Thanks for the pointer! For the previous version of SI/TI we (or rather, I) simply used av to get the frame data in gray format:

https://github.com/slhck/siti/blob/1421630fd6d5985114417195d39401bf6d30ebf2/siti/__main__.py#L105-L111

I believe this has always worked. The problem was that it didn't work for all data formats, and the gray-format conversion introduced changes in the values.

Now we extract the Y plane differently, and apparently that produces unexpected data:

https://github.com/VQEG/siti-tools/blob/06f6f89dc7b9e688d89961ddfe420c47efaa2521/siti_tools/file.py#L77-L78

I am not sure why this happens though. Maybe the data type is incorrect? I think we will have to check with av on how this could be handled best.

slhck commented 1 year ago

Did some digging and it seems that (maybe for performance reasons?) PyAV uses padding on frames, which gives incorrect sizes of the numpy arrays when attempting to reshape them with the known frame width/height.

There's some discussion here: https://github.com/PyAV-Org/PyAV/discussions/851

I will check what can be done to avoid this, or at least ignore the padding when getting out the Y plane data.

slhck commented 1 year ago

I updated the test_decode.py script (which basically uses the same code as the main package) and ran it with videos of different pixel sizes (like yours), but it worked fine:

➜ cd test
➜ ./generate_ffmpeg_sources.sh
➜ python3 test_decode.py
videos/test-240p.y4m
frame 0 (426x240) -- min: 0, max: 255

videos/test-720p.y4m
frame 0 (1280x720) -- min: 0, max: 255

videos/test-480p.y4m
frame 0 (854x480) -- min: 0, max: 255

videos/test-360p.y4m
frame 0 (640x360) -- min: 0, max: 255

videos/test-1080p.y4m
frame 0 (1920x1080) -- min: 0, max: 255

This is using ffmpeg 6.0 (Homebrew) with PyAV 10.0.0 under macOS running Python 3.11.

I will download your test video and try that too.

pmico commented 1 year ago

Thank you very much for responding so quickly and diving right into the problem.

At first, I also thought it was something related to padding in the frames: if you take the size 107,520 and divide it by 240, you get exactly 448. With the downscale to 480 (854x480), the error reports a size of 414,720, and dividing that by 480 gives 864. These are always whole numbers. But I haven't yet found where in the av library the frames are created, to see whether padding is actually applied.

Taking into account the tests you have conducted, which didn't result in an error, I thought that perhaps the issue could be with my video (I also see that you used the same version of FFmpeg, and I have verified that the version of PyAV I have installed is 10.0.0; the only difference is the Python version, but that is unlikely to be the problem). So I tried other videos from different databases, in case it had something to do with the videos themselves or the containers used (a y4m video from the ultravideo.fi database, an mp4 from the TVD database, a yuv from the MCMLK database, and an mkv from the YouTube UGC database; all videos are 4K and lossless), but I always get the same result.

I still believe that the issue must lie within the av library. Tomorrow I will try to find the code where the padding is done, to understand the reason behind it and perhaps discover why it doesn't always apply padding: it evidently doesn't do so for the 1080, 720, and 360 resolutions, which don't result in an error. Another difference I see is that those resolutions have a SAR of 1:1 and a DAR of 16:9, while the 240 and 480 resolutions have a DAR of 16:9 but a SAR different from 1:1. I'm not sure whether that has any influence.

Thank you very much.


pmico commented 1 year ago

Additional information that I have discovered:

In PyAV, when creating the planes, the following code is executed (VideoPlane class):

https://github.com/PyAV-Org/PyAV/blob/6982d818080716b39bc3f333cf19daa0a3e0b56a/av/video/plane.pyx#L24C1-L27C82

A buffer is created to hold the data, and its size depends on the height and the variable 'linesize'. While debugging the code during the loading of the 240 video, it turns out that this variable contains the value 448 (which is the result of dividing the plane size, 107,520, by 240 as I mentioned in my previous message).

The variable is set in the VideoFrame class:

https://github.com/PyAV-Org/PyAV/blob/6982d818080716b39bc3f333cf19daa0a3e0b56a/av/video/frame.pyx#L99-L107

The lib.av_image_alloc function is declared in the include/libavutil/avutil.pxd file:

https://github.com/PyAV-Org/PyAV/blob/6982d818080716b39bc3f333cf19daa0a3e0b56a/include/libavutil/avutil.pxd#L271-L280

but it does not seem to be the function that assigns values to linesize.

So I suppose the problem is related to this variable, but for now I have not found where values are assigned to this array.

Just one more thing. The calculation of linesize seems to be:

    linesize = ceil(width / 32) * 32

So for 240: linesize = ceil(426/32) * 32 = ceil(13.3125) * 32 = 14 * 32 = 448
And for 480: linesize = ceil(854/32) * 32 = ceil(26.6875) * 32 = 27 * 32 = 864
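The formula above can be checked in a few lines of Python (a sketch; `padded_line_size` is just an illustrative name, not a PyAV function):

```python
import math

def padded_line_size(width: int, align: int = 32) -> int:
    # Round the useful line width up to the next multiple of `align`
    # (illustrative helper, not part of PyAV).
    return math.ceil(width / align) * align

print(padded_line_size(426))  # 448 -> buffer of 448 * 240 = 107520 bytes
print(padded_line_size(854))  # 864 -> buffer of 864 * 480 = 414720 bytes
```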

slhck commented 1 year ago

Thanks for digging into the code. For removing the padding, we could try this suggestion: https://github.com/PyAV-Org/PyAV/discussions/851#discussioncomment-2263992

I don't have time to try this yet; maybe some time in the next few days, unless you want to give it a shot.

pmico commented 1 year ago

OK, it works!

I adapted the suggestion in 'file.py' and made the following modifications:

        try:
            buf_width = frame.planes[0].line_size
            bytes_per_pixel = 1
            frame_width = frame.planes[0].width * bytes_per_pixel
            arr = np.frombuffer(frame.planes[0], np.uint8)
            if buf_width != frame_width:
                arr = arr.reshape(-1, buf_width)[:, :frame_width]

            yield (
                # The code commented out below does the "standard" conversion of YUV
                # to grey, using weighting, but it does not actually use the correct
                # luminance-only Y values.
                # frame.to_ndarray(format="gray")

                # choose the Y plane (the first one)
                #np.frombuffer(frame.planes[0], datatype)
                #.reshape(frame.height, frame.width).astype("int")
                np.ascontiguousarray(arr.reshape(-1, frame_width))
            )
        except ValueError as e:
            raise RuntimeError(
                f"Cannot decode frame. Have you specified the bit depth correctly? Original error: {e}"
            )
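
To illustrate why this works, here is a small self-contained sketch of the padding-stripping step on synthetic data (assuming an 8-bit Y plane whose lines are padded from 426 to 448 bytes, as observed in this issue):

```python
import numpy as np

width, height, linesize = 426, 240, 448  # values from the 240p case

# Build a fake padded plane buffer: `width` useful bytes per row, rest padding.
padded = np.zeros((height, linesize), dtype=np.uint8)
padded[:, :width] = 18  # arbitrary luma value in the useful region

flat = padded.reshape(-1)                    # what np.frombuffer(plane) yields
arr = flat.reshape(-1, linesize)[:, :width]  # drop the padding columns

assert arr.shape == (height, width)
assert (arr == 18).all()
```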

Now, when I execute siti-tools for the 240 or 480 resolutions, no errors are thrown (and it still works fine with the 1080 resolution, of course):

Results with 240 resolution:

{
    "si": [
        3.071003101970454,
        5.782074861082161,
        8.877488730999156,
        [...]
        60.234250624216955,
        61.265056943676996,
        60.47859319210376
    ],
    "ti": [
        2.418967045161408,
        4.120951474484814,
        6.298584960972571,
        [...]
        33.10909897789442,
        32.744271079886,
        33.700397970592945
    ],
    "settings": {
        "calculation_domain": "pq",
        "hdr_mode": "sdr",
        "bit_depth": 8,
        "color_range": "full",
        "eotf_function": "bt1886",
        "l_max": 300,
        "l_min": 0.1,
        "gamma": 2.4,
        "pu21_mode": "banding",
        "legacy": false,
        "version": "0.2.1"
    },
    "input_file": "Eldorado2s_240.mp4"
}

With 480 resolution:

{
    "si": [
        4.146357911200241,
        5.357785420734201,
        6.787866744206202,
        [...]
        39.05428395415095,
        39.68767282981164,
        39.58748449741882
    ],
    "ti": [
        2.4159979107205856,
        4.124670503341642,
        6.307148973403055,
        [...]
        33.129330845026715,
        32.76442308683909,
        33.723721230520646
    ],
    "settings": {
        "calculation_domain": "pq",
        "hdr_mode": "sdr",
        "bit_depth": 8,
        "color_range": "full",
        "eotf_function": "bt1886",
        "l_max": 300,
        "l_min": 0.1,
        "gamma": 2.4,
        "pu21_mode": "banding",
        "legacy": false,
        "version": "0.2.1"
    },
    "input_file": "Eldorado2s_480.mp4"
}

With 1080 resolution:

{
    "si": [
        4.565459877889116,
        5.066738860630213,
        5.346033534722521,
        [...]
        19.684391615057514,
        19.977061530577178,
        19.988629620292677
    ],
    "ti": [
        2.4122296543221995,
        4.123140869956541,
        6.3071108131094515,
        [...]
        33.12504413887535,
        32.760002495288674,
        33.71977135965865
    ],
    "settings": {
        "calculation_domain": "pq",
        "hdr_mode": "sdr",
        "bit_depth": 8,
        "color_range": "full",
        "eotf_function": "bt1886",
        "l_max": 300,
        "l_min": 0.1,
        "gamma": 2.4,
        "pu21_mode": "banding",
        "legacy": false,
        "version": "0.2.1"
    },
    "input_file": "Eldorado2s_1080.mp4"
}

I believe this is the correct solution. I hope you find it useful and that it helps you resolve the issue with resolutions where padding is used.

slhck commented 1 year ago

This is great, thanks! I will check whether these modifications cause issues with the other set of files that I have (or change the values in any way), and then merge it.

pmico commented 1 year ago

Without modifying the 'file.py' file, I have decided to run the same tests on another machine. This machine uses Ubuntu 22.04.2 as the operating system, and it has ffmpeg version 4.4.2 installed (the default version).

When I run the same tests, I encounter the same errors with the 240 and 480 resolutions (other resolutions work fine), but the error is different from the one obtained on macOS. For the 240 resolution, the size of the vector is now 122,880 (dividing it by 240 gives 512), while for the 480 resolution the size is 430,080 (dividing it by 480 gives 896).

Therefore, it seems that the formula used for calculating the linesize is:

    linesize = ceil(width / 128) * 128

So for 240 now: linesize = ceil(426/128) * 128 = ceil(3.328125) * 128 = 4 * 128 = 512
And for 480: linesize = ceil(854/128) * 128 = ceil(6.671875) * 128 = 7 * 128 = 896

Therefore, the padding is much larger on Linux with ffmpeg 4.4.2 than on macOS with ffmpeg 6.0 (although I cannot currently confirm whether this is due to the operating system or the ffmpeg version).

In any case, I believe the changes to 'file.py' will remain valid for any amount of padding.

pmico commented 1 year ago

I just found this in the definition of the AVFrame structure in the 'libavutil/frame.h' file of the FFmpeg code:

/**
 * For video, a positive or negative value, which is typically indicating
 * the size in bytes of each picture line, but it can also be:
 * - the negative byte size of lines for vertical flipping
 *   (with data[n] pointing to the end of the data
 * - a positive or negative multiple of the byte size as for accessing
 *   even and odd fields of a frame (possibly flipped)
 *
 * For audio, only linesize[0] may be set. For planar audio, each channel
 * plane must be the same size.
 *
 * For video the linesizes should be multiples of the CPUs alignment
 * preference, this is 16 or 32 for modern desktop CPUs.
 * Some code requires such alignment other code can be slower without
 * correct alignment, for yet other it makes no difference.
 *
 * @note The linesize may be larger than the size of usable data -- there
 * may be extra padding present for performance reasons.
 *
 * @attention In case of video, line size values can be negative to achieve
 * a vertically inverted iteration over image lines.
 */
int linesize[AV_NUM_DATA_POINTERS];

So, it all depends on the CPU of the computer on which FFmpeg is running.

slhck commented 1 year ago

I've adapted your code to respect the bit depth and made a PR here: https://github.com/VQEG/siti-tools/pull/18 — however the tests fail, because the output produces (sometimes only slightly) different values:

FAILED test/test.py::TestSiti::test_siti_main_functions[foreman_cif] - assert inf == 79.4786357183675
FAILED test/test.py::TestSiti::test_siti_calculator[SparksElevator_480x270p_5994_10bit.y4m] - assert 46.31809158129229 ± 4.6e-01 == 45.850332139158695
FAILED test/test.py::TestSiti::test_siti_calculator[SparksTruck_2048x1080_5994_hdr10.y4m] - assert 39.4850842593107 ± 3.9e-01 == 33.725918961062305
FAILED test/test.py::TestSiti::test_siti_calculator[checkerboard-8x8-10bpp-limited.y4m] - assert 211.26997239196237 ± 2.1e+00 == 178.483060502745
FAILED test/test.py::TestSiti::test_siti_calculator[checkerboard-8x8-10bpp.y4m] - assert 211.29068096160657 ± 2.1e+00 == 178.50055531682995
FAILED test/test.py::TestSiti::test_siti_calculator[legacy foreman_cif.y4m] - assert inf == 79.4786357183675
FAILED test/test.py::TestSiti::test_siti_calculator[legacy full-range.y4m] - assert inf == 146.56553042206565
FAILED test/test.py::TestSiti::test_siti_calculator[sunset-hlg.mov] - assert 1.06447588085935 ± 1.1e-02 == 0.7738618954474061
FAILED test/test.py::TestSiti::test_siti_calculator[lights-hlg.mov] - assert 10.384281528964841 ± 1.0e-01 == 8.498430920800487
FAILED test/test.py::TestSiti::test_siti_calculator[fall-hlg.mov] - assert 12.293574040350016 ± 1.2e-01 == 11.60486742434359

slhck commented 1 year ago

To be honest, at this point I would have to dig deeper into the actual luminance plane values with a checkerboard pattern or similar to make sure we are reading the correct values. If so, the test results we defined back then may be incorrect after all because we didn't properly respect the line size.

pmico commented 1 year ago

I'm really sorry that the proposed changes didn't work.

I have decided to take a look at how the data is stored in the buffer of plane[0], and I was stunned by what I saw.

17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 17, 18, 18, 18, 18, 18, 18, 18, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 
17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 17, 17, 17, 16, 16, 16, 16, 16, 16, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 
16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 16, 17, 17, 17, 17, 17, 17, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

These are the first 1344 (448x3) values of planes[0] for the 240-resolution video. You can clearly see the padding every 448 values (which is the linesize), but what doesn't make sense to me is that there are only 16 zero values, when there should have been 448-426 = 22 zero values.

With the 480 resolution video, it's even worse. No value that could be part of the padding appears as 0.

Edit: I removed some incorrect information.

slhck commented 1 year ago

I will try this, not the changes I made in the Pull Request regarding bit depth.

By the way, if you convert to YUV (Y4M), you can at least get the values, because that data will be read directly.

pmico commented 1 year ago

Forget what I wrote before.

The order in which 'width' and 'height' are placed is reversed because there are 'height' rows and 'width' columns. So the last code I wrote is incorrect.

As you noted, there is no choice but to examine the contents of the luminance plane in depth.

pmico commented 1 year ago

The error mentioned in #19 is devastating. I would never have imagined seeing an error with the 1920x1080 format. It turns out that 2211840/1080 = 2048, and the relationship between 1920 and 2048 is ceil(1920/256) * 256 = ceil(7.5) * 256 = 8 * 256 = 2048.

I was looking into the FFmpeg code to see how CPU alignment is determined, and I found the file cpu.c inside the libavutil folder. It contains the function av_cpu_max_align, which, as the name suggests, returns a value for alignment. It depends on the CPU architecture, and for each architecture there is a folder with its own assembly and C files to perform the necessary calculations. What surprised me was not finding any returned value higher than 64. And I haven't found the code that makes linesize a multiple of those quantities (which would be the only way to reach 128 or, as in #19, 256).

Then the most incredible thing was finding the following code in the frame.c file (also in libavutil):

static int get_video_buffer(AVFrame *frame, int align)
{
    const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(frame->format);
    int ret, padded_height, total_size;
    int plane_padding = FFMAX(16 + 16/*STRIDE_ALIGN*/, align);
    ptrdiff_t linesizes[4];
    size_t sizes[4];

    if (!desc)
        return AVERROR(EINVAL);

    if ((ret = av_image_check_size(frame->width, frame->height, 0, NULL)) < 0)
        return ret;

    if (!frame->linesize[0]) {
        if (align <= 0)
            align = 32; /* STRIDE_ALIGN. Should be av_cpu_max_align() */

        for (int i = 1; i <= align; i += i) {
            ret = av_image_fill_linesizes(frame->linesize, frame->format,
                                          FFALIGN(frame->width, i));
            if (ret < 0)
                return ret;
            if (!(frame->linesize[0] & (align-1)))
                break;
        }

        for (int i = 0; i < 4 && frame->linesize[i]; i++)
            frame->linesize[i] = FFALIGN(frame->linesize[i], align);
    }
[...]

As you can see, there is a comment saying that the function I mentioned before should be used, but it doesn't do that; it simply sets align = 32. Understanding and following the FFmpeg code has become very difficult for me at this point.
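For reference, FFALIGN in the code above rounds a value up to the next multiple of a power-of-two alignment. A Python sketch of the macro's arithmetic, checked against the linesize values seen in this thread:

```python
def ffalign(x: int, a: int) -> int:
    # Equivalent of FFmpeg's FFALIGN(x, a) macro: round x up to a
    # multiple of a, where a must be a power of two.
    return (x + a - 1) & ~(a - 1)

print(ffalign(426, 32))    # 448  (macOS / ffmpeg 6.0 case)
print(ffalign(426, 128))   # 512  (Linux / ffmpeg 4.4.2 case)
print(ffalign(1920, 256))  # 2048 (the #19 case)
```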

Maybe the only solution right now is to debug ffmpeg when we rescale the video.

A question that I forgot to ask earlier: Where did you obtain the SI and TI values that you use for the tests?

slhck commented 1 year ago

Maybe the only solution right now is to debug ffmpeg when we rescale the video.

I'm leaning towards removing support for on-the-fly decoding altogether and instead adding support for yuv4mpeg pipes, so that people can use ffmpeg -f yuv4mpegpipe - | python3 -m siti_tools - or similar.

A question that I forgot to ask earlier: Where did you obtain the SI and TI values that you use for the tests?

They were generated by the program itself at some point during development, to ensure future changes didn't break the values. Since there are no standardized test vectors for SI/TI, and the new algorithm introduced some changes with respect to the handling of the pure Y plane (vs. a weighted sum of color components), we couldn't re-use the old values. So we just made sure the new values were in a similar range as the old ones.

pmico commented 1 year ago

I have been reviewing the PyAV code on GitHub and came across the following code (belonging to the 'frame.pyx' file in the 'av/video' folder):

cdef useful_array(VideoPlane plane, unsigned int bytes_per_pixel=1, str dtype='uint8'):
    """
    Return the useful part of the VideoPlane as a single dimensional array.

    We are simply discarding any padding which was added for alignment.
    """
    import numpy as np
    cdef size_t total_line_size = abs(plane.line_size)
    cdef size_t useful_line_size = plane.width * bytes_per_pixel
    arr = np.frombuffer(plane, np.uint8)
    if total_line_size != useful_line_size:
        arr = arr.reshape(-1, total_line_size)[:, 0:useful_line_size].reshape(-1)
    return arr.view(np.dtype(dtype))

As stated in the comment, "return the useful part of the VideoPlane as a single dimensional array. We are simply discarding any padding which was added for alignment." Apart from the final reshape(-1) used to create the one-dimensional array, they are essentially doing the same thing we discussed earlier, so we can be confident that we were on the right track. It might also be worth noting that this function is later used in another function called 'to_ndarray' that takes the frame format (yuv, rgb, etc.) into account. Edit: it's the function you commented out in your code in 'file.py'.

I have been comparing the mean TI values obtained using the old system with those obtained using siti-tools, and the values differ significantly, although the relationship between values seems to remain consistent most of the time. Here are the average values for the first five 2-second chunks of the Eldorado.mov video:

Chunk   siti    siti-tools
0       50.384  18.059
1       83.247  37.318
2       17.309  20.477
3        5.987   1.813
4        5.699   1.442

It's not surprising that they differ, because the calculation method for SI and TI has changed significantly in the recommendation. However, I cannot explain the extent to which these changes have affected the results.

This shows how important it would be to have standardized test vectors for SI/TI.