96boards / oe-rpb-manifest

RPB development environment setup using Android repo tool
MIT License

[db410c] failing to mount rootfs #141

Open jwinarske opened 3 years ago

jwinarske commented 3 years ago

I’m building dunfell for db410c. I have 20+ units in a board farm.

db410c Target images

rpb-console-image
rpb-console-image-test
rpb-weston-image
rpb-weston-image-test

If I build without changing local.conf in any way the resultant image boots and all is well.

When I make the changes below I'm finding rootfs fails to mount.

I use the same flashing procedure for either scenario.

Any ideas?

Console image changes

echo -e 'DISTRO_FEATURES_remove = "x11"\n' >> conf/local.conf
echo -e 'DISTRO_FEATURES_append = " opengl"\n' >> conf/local.conf
echo -e 'IMAGE_INSTALL_append = " \' >> conf/local.conf
echo -e '  i2c-tools can-utils \' >> conf/local.conf
echo -e '"\n' >> conf/local.conf

Weston image changes

echo -e 'DISTRO_FEATURES_remove = "x11"\n' >> conf/local.conf
echo -e 'DISTRO_FEATURES_append = " opengl"\n' >> conf/local.conf
echo -e 'IMAGE_INSTALL_append = " \' >> conf/local.conf
echo -e '  i2c-tools can-utils \' >> conf/local.conf
echo -e '  adwaita-icon-theme-cursors \' >> conf/local.conf
echo -e '  xdg-user-dirs \' >> conf/local.conf
echo -e '"\n' >> conf/local.conf

ndechesne commented 3 years ago

Nothing obvious comes to mind. What's DISTRO in your build? Do you have a complete boot log? Can you share the buildhistory changes as well? For example, build once, make your changes, rebuild, and then check buildhistory; it will show everything that changed in the images.
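A minimal sketch of how buildhistory could be switched on for that comparison, in the same append-to-local.conf style used above. The helper name `enable_buildhistory` is hypothetical; `INHERIT += "buildhistory"` and `BUILDHISTORY_COMMIT` are the standard Yocto settings for this class, but check them against the release you build.

```shell
# enable_buildhistory CONF_FILE
# Appends the buildhistory settings to a local.conf (hypothetical helper).
enable_buildhistory() {
    conf_file="$1"
    {
        # Record image and package contents for every build.
        echo 'INHERIT += "buildhistory"'
        # Commit each build's record to a git repo so two builds can be diffed.
        echo 'BUILDHISTORY_COMMIT = "1"'
    } >> "$conf_file"
}

# Example (run from the build directory):
#   enable_buildhistory conf/local.conf
```

After building once with the stock configuration and once with the changes, the recorded history ends up under the build's buildhistory directory by default, and the `buildhistory-diff` script shipped with OpenEmbedded-Core should summarise what changed between the two commits.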

jwinarske commented 3 years ago

I isolated it to the AWS cloud build. Local build artifacts work just fine; cloud artifacts do not. Same build steps in either case. Lovely.

jwinarske commented 3 years ago

I'm finding that when I flash the pipeline-built image using EDL/QDL, it mounts rootfs. The same pipeline image flashed via fastboot exhibits the problem. If I flash the target via fastboot from a local build, it mounts rootfs fine.

The additional pipeline steps are:

  1. Create a tar.gz of the artifact folder: tar -czvf $PACKAGE_FILE ${ARTIFACT_DIR}/
  2. Upload it to the GitLab package store: curl --retry 5 --retry-delay 10 -L -H "Job-Token: ${CI_JOB_TOKEN}" --upload-file $PACKAGE_FILE "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/${PACKAGE}/0.1.${CI_JOB_ID}/${PACKAGE_FILE}"
  3. Download and extract it.

When I have time I'll run an md5 of the files in question. In the meantime I'll use EDL to flash.
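One way that checksum comparison could look, as a rough sketch: record a manifest of the artifact folder before step 1, then check the extracted files against it after step 3. The function names and the manifest file are hypothetical, and `sha256sum` is swapped in for md5 here (`md5sum` accepts the same options). It assumes GNU coreutils and findutils.

```shell
# write_manifest DIR
# Prints "hash  ./relative/path" for every file under DIR, sorted by path.
write_manifest() (
    cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum
)

# verify_manifest DIR MANIFEST
# Re-hashes the files under DIR and fails if any differ from MANIFEST.
# Keep MANIFEST outside DIR so it is not hashed itself.
verify_manifest() (
    manifest_abs="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")"
    cd "$1" && sha256sum --check --quiet "$manifest_abs"
)

# Example:
#   write_manifest "${ARTIFACT_DIR}" > artifacts.sha256     # in the pipeline
#   verify_manifest extracted/ artifacts.sha256              # after download
```

If the manifest verifies on the downloaded copy, the tar/upload/download steps can be ruled out and the difference has to be in the image as built.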

ndechesne commented 3 years ago

hmm. it is an 'interesting' issue ;)

Can you share the XML snippet you use to flash the root file system with QDL? Are you building and splitting the sparse image, or doing a 'raw' copy of the entire file?

Which exact file in the deploy_dir folder are you flashing? The rootfs.ext4 file?

I suspect the issue is with the generation of the ext4 image. It would be good to first mount both the working and non-working ext4 images and byte-compare their content. You could also mount the 'pipeline' image on your local machine, extract the rootfs content, recreate an ext4 image locally, and try flashing that. That would confirm my suspicion that the problem lies in the generation of the ext4 image in your Jenkins instance.
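The mount-and-compare step might be sketched as below. Both function names and the mount points in the example are hypothetical; `mount -o loop,ro` and `diff -r` are standard, and `--no-dereference` assumes GNU diffutils so that symlinks inside the rootfs are compared as links rather than followed onto the host.

```shell
# mount_image_ro IMAGE MOUNTPOINT
# Mounts an ext4 image read-only through a loop device (needs root).
mount_image_ro() {
    sudo mkdir -p "$2" && sudo mount -o loop,ro "$1" "$2"
}

# compare_rootfs_trees DIR_A DIR_B
# Lists files that differ or exist on only one side; exit status 0 means
# the two trees have identical content.
compare_rootfs_trees() {
    diff -r --no-dereference --brief "$1" "$2"
}

# Example (run the comparison as root if the rootfs has root-only files):
#   mount_image_ro local/rootfs.ext4    /mnt/rootfs-local
#   mount_image_ro pipeline/rootfs.ext4 /mnt/rootfs-pipeline
#   compare_rootfs_trees /mnt/rootfs-local /mnt/rootfs-pipeline
```

This only compares file content, not ownership, permissions, or filesystem-level metadata, so identical trees would point toward the filesystem structure itself rather than the files in it.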

Which kernel/distro are you running on both machines (local and Jenkins)?

jwinarske commented 3 years ago

> Can you share the XML snippet you use to flash the root file system with QDL? Are you building and splitting the sparse image, or doing a 'raw' copy of the entire file?

gen_flat_build_emmc.sh

Run the script as part of the pipeline build.

Extract flat_build_emmc.tar.gz and cd into it.

Transition the device to EDL mode, and execute:

sudo qdl --storage emmc prog_emmc_firehose_8916.mbn rawprogram0.xml patch0.xml

Transition out of EDL mode, and power cycle.

As for sparse vs. not: running simg2img on the ext4 file fails, stating it's not a sparse file. The default for rootfs in rawprogram0.xml is sparse="false", so this is correct.
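The same sparse-or-raw check can be done without simg2img by looking at magic numbers, sketched here under the assumption that the file is either an Android sparse image (magic 0xED26FF3A, little-endian, at offset 0) or a raw ext2/3/4 image (superblock magic 0xEF53 at byte offset 1080). The function name `image_kind` is hypothetical.

```shell
# image_kind FILE
# Prints "sparse", "ext", or "unknown" based on the file's magic bytes.
image_kind() {
    img="$1"
    # First four bytes as hex: 3a ff 26 ed for an Android sparse image.
    head4=$(dd if="$img" bs=1 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')
    # ext superblock starts at 1024; its magic field is 56 bytes in.
    ext_magic=$(dd if="$img" bs=1 skip=1080 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    if [ "$head4" = "3aff26ed" ]; then
        echo sparse
    elif [ "$ext_magic" = "53ef" ]; then
        echo ext
    else
        echo unknown
    fi
}

# Example:
#   image_kind rootfs.ext4
```

Running it on both the local and the pipeline rootfs file would show whether the two builds at least agree on the container format being handed to fastboot.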

> Which exact file in the deploy_dir folder are you flashing? The rootfs.ext4 file?

See the sequence above.

> I suspect the issue is with the generation of the ext4 image. It would be good to first mount both the working and non-working ext4 images and byte-compare their content. You could also mount the 'pipeline' image on your local machine, extract the rootfs content, recreate an ext4 image locally, and try flashing that. That would confirm my suspicion that the problem lies in the generation of the ext4 image in your Jenkins instance.

I'll give that a go.

> Which kernel/distro are you running on both machines (local and Jenkins)?

Identical source. Dunfell rpb/rpb-weston

The difference between builds is the host environment:

Pipeline container = Ubuntu 18.04
Local host = Fedora 33

Perhaps ext4 generation is using host tools rather than "native" tools.
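One way to probe that idea, as a sketch only: if different mkfs tools produced the two images, the enabled ext4 feature flags may differ. The helper below parses the "Filesystem features:" line that `dumpe2fs -h` prints; the function names and file names in the example are hypothetical, and whether the features actually differ between the two builds is not known from this thread.

```shell
# ext_features
# Reads `dumpe2fs -h` output on stdin and prints one feature per line, sorted.
ext_features() {
    sed -n 's/^Filesystem features:[[:space:]]*//p' \
        | tr -s ' ' '\n' \
        | sed '/^$/d' \
        | LC_ALL=C sort
}

# image_features IMAGE
# Convenience wrapper; requires dumpe2fs from e2fsprogs on the host.
image_features() {
    dumpe2fs -h "$1" 2>/dev/null | ext_features
}

# Example:
#   image_features local/rootfs.ext4    > local.features
#   image_features pipeline/rootfs.ext4 > pipeline.features
#   diff local.features pipeline.features
```

An empty diff would suggest both builds used equivalent mkfs settings, which would shift attention back to how the image is transferred or flashed.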