Closed cazou closed 1 year ago
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
View this failed invocation of the CLA check for more information.
For the most up to date status, view the checks section at the bottom of the pull request.
The segfault is likely a driver fault, however it's possibly linked to bad input being issued from cros-codecs. Hard to say without a backtrace and other ways to reproduce.
If this is run against a debug version of the driver with tracing enabled we can probably identify the cause. For https://github.com/intel/media-driver they have the MOS_TRACE macro and friends that can be enabled with a compile flag.
@dwlsalmeida I used the acer-cp514-2h-1160g7-volteer
device type on lava staging. Maybe there are better options with more stable drivers ?
@cazou without a stacktrace it's hard to tell :/
I know very little about CI, but can't you ask it to run ccdec with valgrind or something? If I can get any insight into what part of the code is causing this I can fix it.
Also, it's very weird that absolutely nothing in VP9 is passing, definitely something is up.
Summarizing what I know about the segfaults, taking vp90-2-02-size-lf-1920x1080.webm
for example:
The source stride and the destination stride do not match for Tiger Lake:
#1 0x00007efe0b81fd38 in DdiMedia_CopyPlane (dst=0x563eff3510d0 "", dstPitch=960, src=0x563eff71ece0 "", srcPitch=1920, height=540) at /home/detlev/Sources/goo0028/media-driver/media_driver/linux/common/ddi/media_libva.cpp:5173
This causes a segfault here in the driver, when copying the V plane.
static void DdiMedia_CopyPlane(
uint8_t *dst,
uint32_t dstPitch,
uint8_t *src,
uint32_t srcPitch,
uint32_t height)
{
uint32_t rowSize = std::min(dstPitch, srcPitch);
for (int y = 0; y < height; y += 1)
{
memcpy(dst, src, rowSize);
dst += dstPitch;
src += srcPitch;
}
}
Running this on my Ice Lake machine, I get srcPitch==dstPitch==960
.
I have no idea why this returns 1920 for the V plane on Tiger Lake for presumably i420 content:
DdiMedia_GetChromaPitchHeight(DdiMedia_MediaFormatToOsFormat(surface->format), surface->iPitch, surface->iHeight, &chromaPitch, &chromaHeight);
My hunch is that, somehow, this line gets executed:
*chromaPitch = pitch;
Instead of
*chromaPitch = MOS_ALIGN_CEIL(pitch, 2) / 2;
In
static VAStatus DdiMedia_GetChromaPitchHeight(
uint32_t fourcc,
uint32_t pitch,
uint32_t height,
uint32_t *chromaPitch,
uint32_t *chromaHeight)
{
DDI_CHK_NULL(chromaPitch, "nullptr chromaPitch", VA_STATUS_ERROR_INVALID_PARAMETER);
DDI_CHK_NULL(chromaHeight, "nullptr chromaHeight", VA_STATUS_ERROR_INVALID_PARAMETER);
switch(fourcc)
{
case VA_FOURCC_NV12:
case VA_FOURCC_P010:
case VA_FOURCC_P012:
case VA_FOURCC_P016:
*chromaHeight = MOS_ALIGN_CEIL(height, 2) / 2;
*chromaPitch = pitch;
break;
case VA_FOURCC_I420:
case VA_FOURCC_YV12:
*chromaHeight = MOS_ALIGN_CEIL(height, 2) / 2;
*chromaPitch = MOS_ALIGN_CEIL(pitch, 2) / 2;
break;
case VA_FOURCC_411P:
case VA_FOURCC_422H:
case VA_FOURCC_444P:
case VA_FOURCC_RGBP:
*chromaHeight = height;
*chromaPitch = pitch;
break;
case VA_FOURCC_422V:
case VA_FOURCC_IMC3:
*chromaHeight = MOS_ALIGN_CEIL(height, 2) / 2;
*chromaPitch = pitch;
break;
default:
*chromaPitch = 0;
*chromaHeight = 0;
}
return VA_STATUS_SUCCESS;
}
Notably, we stopped indicating the fourcc when creating the surfaces some time ago:
/// Create new surfaces and add them to the pool, using `descriptors` as backing memory.
pub(crate) fn add_surfaces(&mut self, descriptors: Vec<M>) -> Result<(), VaError> {
let surfaces = self.display.create_surfaces(
self.rt_format,
// Let the hardware decide the best internal format - we will get the desired fourcc
// when creating the image.
None, <-----------
self.coded_resolution.width,
self.coded_resolution.height,
self.usage_hint,
descriptors,
)?;
It may very well be that the bug is in between the surface's internal format and the memcpy
.
@cazou can you confirm the VA-API version? (vainfo
)
Thanks for this, this is looking good!
I think one feature we will need is the ability to keep track of which tests are expected to fail, since we cannot reach 100% due to bugs in drivers and sometimes cros-codecs itself. So the CI job should be successful as long as no test regressed.
Running this on my Ice Lake machine, I get
srcPitch==dstPitch==960
.I have no idea why this returns 1920 for the V plane on Tiger Lake for presumably i420 content:
@dwlsalmeida To be fully correct, this happens on debian on both hp-x360-12b-ca0500na-n4000-octopus and acer-cp514-2h-1160g7-volteer device types on lava. I was not able to reproduce this on my laptop running Gentoo (media-driver 23.2.4). But I was able to reproduce this on my laptop within a docker running debian.
@cazou can you confirm the VA-API version? (
vainfo
)
Gentoo:
libva info: VA-API version 1.18.0
libva info: Trying to open /usr/lib64/va/drivers/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_18
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.18 (libva 2.18.2)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 23.2.4 ()
Debian:
libva info: VA-API version 1.17.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_17
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.17 (libva 2.12.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 23.1.1 ()
I edited the debian info, it was with one of my debug version, now it is the one from the debian package
CPU is 11th gen Intel i7-1185G7
It looks like an issue with the debian package. I tried on bullseye (oldstable), bookworm (stable) and trixe (unstable) and all have the same issue. But when I rebuild the driver from the git repo at the same version, it works correctly (no segfault). I will have a closer look at this debian package
I confirm that this is an issue with the way the package is built: Debian separates non-free libs from the other ones, so the intel-media-va-driver
package is built with the -DENABLE_NONFREE_KERNELS=OFF
flag. Apparently, with that flag, there is a bug that makes ccdec segfault.
A simple fix is to install intel-media-va-driver-non-free
instead from the non-free
apt source. We need to evaluate if that package can be installed on a rootfs that will be made publicly available.
With that package, the segfault issue is not a blocker anymore to continue implementing the CI.
FWIW, on Arch, we all get the "non-free" stuff by default -> https://gitlab.archlinux.org/archlinux/packaging/packages/intel-media-driver/-/blob/main/PKGBUILD
@cazou, before we can merge this we will need you to fix the CLA situation - could you check https://github.com/chromeos/cros-codecs/pull/48/checks?check_run_id=16226713377 ?
@Gnurou @dwlsalmeida The failures are due to the missing action secrets on this project. We need to add, as secrets:
GH_BEARER
: A Github token with read access to the Actions artifacts (This will be publicly visible on LAVA)LAVA_TOKEN
: A LAVA (lava.collabora.dev) token to be able to submit jobs
And a variable:LAVA_USERNAME
: The user name associated with the LAVA tokenThe failures are due to the missing action secrets on this project.
Ok, so can we prevent the LAVA tests from being run in the Github actions for now? We will enable them with the CL that adds the necessary tokens.
Besides the submit jobs seem to be failing for unrelated reasons:
Error: Unable to resolve action `dorny/test-reporter@v1.7`, unable to find version `v1.7`
The failures are due to the missing action secrets on this project.
Ok, so can we prevent the LAVA tests from being run in the Github actions for now? We will enable them with the CL that adds the necessary tokens.
They can, yes. But that's basically all there is in this PR. Can't we just merge this when the secrets are set ?
Besides the submit jobs seem to be failing for unrelated reasons:
Error: Unable to resolve action `dorny/test-reporter@v1.7`, unable to find version `v1.7`
Ha yes, I'll fix it. Upstream uses tags instead of a branch, so the whole version number is needed.
Hi @cazou,
They can, yes. But that's basically all there is in this PR. Can't we just merge this when the secrets are set ?
I'm fine with that but have to admit I am a bit oblivious as to what secrets we need to obtain, how, and where to set them (as they are "secrets" I suppose we shall not push them into this repository). Could you enlighten me a bit about that?
Hi @cazou,
They can, yes. But that's basically all there is in this PR. Can't we just merge this when the secrets are set ?
I'm fine with that but have to admit I am a bit oblivious as to what secrets we need to obtain, how, and where to set them (as they are "secrets" I suppose we shall not push them into this repository). Could you enlighten me a bit about that?
Hi @Gnurou !
We need to set those secrets:
GH_BEARER
: A Github token with read access to the Actions artifacts (This will be publicly visible on LAVA)
LAVA_TOKEN
: A LAVA (lava.collabora.dev) token to be able to submit jobs
I am currently in the process of getting a token from our LAVA people and we'll be able to give that to you for this repo.
It will be stored on this repository as a secret. Only you can set or edit it and nobody can read it. it will be shown as ***
in the logs (see https://github.com/cazou/cros-codecs/actions/runs/6510164313/job/17683262845#step:3:1)
The GH_BEARER secret must be a token that you generate for this repository, with the read permission. It will be used by LAVA to recover the built ccdec artifact to be tested.
They can be set in Settings > Secrets and Variables > Actions. There is a Add new repository secret
button.
And a variable LAVA_USERNAME
: The user name associated with the LAVA token. We'll provide that as well. Doesn't need to be a secret and can be added at the same place as Action Secrets, in the variables tab.
@cazou Can you also make the LAVA_RPC2_URL
(e.g. https://lava.collabora.dev/RPC2
) a variable (non-secret), so that forks could use their own Lava instance ?
Updated the commit as https://github.com/kernelci/kernelci-core/pull/2051 was merged. It now uses the kernelci amd64 bullseye-gst-fluster rootfs
Attempting a merge now that the secrets are in - fingers crossed...
cros-codecs CI workflow:
The
ci/config.yaml
contains the test configurations. For each architecture (amd/intel), it specifies:vp8
,h.264
, ...)The github action workflow uses
ci/config.yaml
to get the lavadevice_type
to run the test on. The test on lava is run fromci/test-cases/run_tests.py
and usesci/config.yaml
to generate the fluster command to run.About the lava test job