Open jellyfish-bot opened 2 years ago
@acostach sorry for all the pings, but this week I'm trying to go through each of the devices and see what's blocking them -
I've just tried to run tests on the rockpi testbot, and the DUT supposedly isn't powering on:
Sep 13 08:20:13 2a42142 f2f7283b67c2[1604]: Booting DUT with the balenaOS flasher image
Sep 13 08:20:13 2a42142 f2f7283b67c2[1604]: waiting for DUT to be on
Sep 13 08:20:13 2a42142 f2f7283b67c2[1604]:
Sep 13 08:20:13 2a42142 f2f7283b67c2[1604]: DUT is currently Off
Sep 13 08:20:18 2a42142 f2f7283b67c2[1604]: waiting for DUT to be on
Sep 13 08:20:18 2a42142 f2f7283b67c2[1604]:
Sep 13 08:20:18 2a42142 f2f7283b67c2[1604]: DUT is currently Off
Sep 13 08:20:23 2a42142 f2f7283b67c2[1604]: waiting for DUT to be on
Sep 13 08:20:23 2a42142 f2f7283b67c2[1604]:
Sep 13 08:20:23 2a42142 f2f7283b67c2[1604]: DUT is currently Off
Sep 13 08:20:28 2a42142 f2f7283b67c2[1604]: waiting for DUT to be on
Sep 13 08:20:28 2a42142 f2f7283b67c2[1604]:
Sep 13 08:20:28 2a42142 f2f7283b67c2[1604]: DUT is currently Off
Sep 13 08:20:33 2a42142 f2f7283b67c2[1604]: waiting for DUT to be on
This is either because:
[acostach] @rcooke-warwick checking now, the device has the green light on, which means it's powered, but the ethernet LEDs are off, so it's not booting. could be due to the mux/sd-card. Can I remove it from the testbot and try boot it with the already flashed sd-card in the mux?
@rcooke-warwick looks like it might be because of the voltage, I will make a PR to increase it from 5 to 12. On Radxa website they say that it works with 5V but may cause stability issues once the load rises, and this is what I see locally now, it powers off during flashing with 5V but not with 12V.
nice find @acostach sounds like a good idea to increase it
[acostach] done, I will merge https://github.com/balena-io-hardware/testbotsdk/pull/48 once checks pass and then update leviathan worker, after that we can test again.
@rcooke-warwick I updated the testbotsdk to increase the voltage and also leviathan-worker, where I merged https://github.com/balena-os/leviathan-worker/pull/26
[acostach] 1) Is answered, the rig app updated already
@acostach nice one, the rig did update a while back and I retried the rockpi job, it's flashing at the moment, will report back here with the result
@acostach update, unfortunately it hasn't worked - 12V is coming out of the testbot but we're still getting:
Sep 13 11:03:40 2a42142 eda46eb94756[1604]: DUT is currently Off
Sep 13 11:03:45 2a42142 eda46eb94756[1604]: waiting for DUT to be on
[rcooke-warwick] now I'm wondering if there's something going wrong with the detection of if the DUT is on/off ...
[rcooke-warwick] does the rockpi have ethernet?
[rcooke-warwick] (I remember the rockpi flashing has worked before, but maybe I'm remembering wrong)
@rcooke-warwick I plugged and unplugged the cable, it should flash the device once again
[acostach] I recall we did run tests on this DT before and they were running, provisioning worked
[acostach] it's not powering off, is the test running normally?
[rcooke-warwick] which cable did you unplug/replug?
[acostach] ethernet and usbc
usb-c is the power cable
[rcooke-warwick] hmm it's just staying "on" now
@acostach yep, the device names cross wires - I did realize and eventually created and linked a new ticket. Sorry for the noise.
want me to plug and unplug the power cable @rcooke-warwick ? That would trigger the re-flashing IF the sd-card is switched to DUT
I already turned the DUT off then on again to try to achieve that @acostach
[rcooke-warwick] the DUT stayed on forever - so for some reason it isn't internally flashing the DUT
[rcooke-warwick] the device is currently in this state, the test job is still running
[rcooke-warwick] retrying with fresh slate
[acostach] ok. if it still doesn't work let me know and I'll hook up the serial cable from my PC to the device and kick the suite, see where it hangs
@acostach rockpi seems to be flashing now: https://jenkins.product-os.io/job/leviathan-v2-template/4673/console - I've flashed it 3 times from a local test job in a row, and now this jenkins one is running - maybe there was just something loose that got fixed when you unplugged and replugged
very possible @rcooke-warwick , good thing it's working now, thanks for letting me know as I was just going to connect the serial and restart it
[rcooke-warwick] @acostach it has been consistently flashing every time last night and this morning. Now we move on to the problem of tests failing. First roadblock is the test here: https://github.com/balena-os/meta-balena/blob/master/tests/suites/os/tests/chrony/index.js#L157
Which has failed both times I've tried it. I'm not that familiar with this test, but here's what I get from it:
date --set="-2min"
is used to skew the time on the device. The test then parses the logs for "NTP time lost synchronization - restarting chronyd"
- however this never appears in the logs, so the test fails. Just running it again now to get the journal logs to see why that might be happening.
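For reference, the check described above boils down to polling the logs until a marker string shows up. A minimal sketch of that idea, not the actual test code: `wait_for_marker`, `read_logs`, and the attempt/delay defaults are hypothetical names and values chosen for illustration; only the marker string is taken from the thread.

```python
import time
from typing import Callable

# Message the test looks for, as quoted in the thread.
RESYNC_MARKER = "NTP time lost synchronization - restarting chronyd"


def wait_for_marker(
    read_logs: Callable[[], str],
    marker: str = RESYNC_MARKER,
    attempts: int = 10,
    delay_s: float = 5.0,
) -> bool:
    """Poll read_logs() until marker appears or the attempts run out.

    read_logs is a hypothetical callable that returns the journal text
    collected so far (for example, fetched from the DUT over SSH).
    """
    for attempt in range(attempts):
        if marker in read_logs():
            return True
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

If the marker never appears, the function returns False after the last attempt, which matches the failure mode seen here: the time skew is applied but chronyd never logs the resync message.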
[rcooke-warwick] frustratingly, that test has now passed...
I think it is linked to this issue: https://github.com/balena-os/meta-balena/issues/2758
From what I've seen, in the case of failure, chronyc is started with some sort of wrong permissions:
Sep 14 08:08:44 b0105db healthdog[5665]: 2022-09-14T08:08:44Z Wrong permissions on /run/chrony
Sep 14 08:08:44 b0105db healthdog[5665]: 2022-09-14T08:08:44Z Disabled command socket /run/chrony/chronyd.sock
Sep 14 08:08:44 b0105db healthdog[5665]: 2022-09-14T08:08:44Z Running with root privileges
Sep 14 08:08:44 b0105db healthdog[5665]: 2022-09-14T08:08:44Z Frequency 0.000 +/- 1000000.000 ppm read from /var/lib/chrony/drift
Sep 14 08:08:44 b0105db healthdog[5668]: [chrony-healthcheck][INFO] No online NTP sources - forcing poll
Sep 14 08:08:44 b0105db healthdog[5668]: [chrony-healthcheck][ERROR] Failed to trigger NTP sync
In the case of the test passing, I never see that message about a disabled command socket
cc @alexgg @jakogut
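To compare passing and failing runs quickly, the journal can be scanned for the messages that only showed up in the failing case. A rough sketch, assuming the journal has been saved as plain text; `failure_signatures` is a hypothetical helper, and the strings are copied from the log excerpt above.

```python
from typing import List

# Messages seen in the failing run quoted in the thread.
FAILURE_SIGNATURES = (
    "Wrong permissions on /run/chrony",
    "Disabled command socket",
    "Failed to trigger NTP sync",
)


def failure_signatures(journal_text: str) -> List[str]:
    """Return the known failure messages that occur in journal_text.

    The result follows the order of FAILURE_SIGNATURES, not the order
    in which the messages appear in the journal.
    """
    return [sig for sig in FAILURE_SIGNATURES if sig in journal_text]
```

An empty list would be consistent with the passing runs described above, where the disabled command socket message never appears.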
[rcooke-warwick] on a side note, does anyone know if the rockpi led flashing works? I saw this issue: https://github.com/balena-os/balena-radxa/issues/10 and also I checked supervisor.conf on the rockpi and it has LED_FILE=/dev/null
@rcooke-warwick the LED is not implemented for the radxa-zero nor rockpi4b so this test can be skipped
^ @acostach @floion added this finding to this issue ^ https://github.com/balena-os/balena-radxa/issues/10 -- this is currently causing the rockpi4b to fail the OS test suite. Does this device have an LED? The contract says it does
@rcooke-warwick do we have a mechanism to mark a test as not mandatory on a per DT basis?
@acostach the test runs because in the contract for the rockpi, it is set to LED: true - should I make the PR for the contract to set this to false?
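The gating described here can be pictured as a simple check against the device-type contract. This is only an illustration: the `data.led` key layout and the `should_run_led_test` helper are assumptions, since the thread only says the contract has LED set to true.

```python
from typing import Any, Dict


def should_run_led_test(contract: Dict[str, Any]) -> bool:
    """Return True only if the contract explicitly declares an LED.

    Assumes (hypothetically) that the flag lives at contract["data"]["led"].
    A missing key or any value other than True means the test is skipped.
    """
    return contract.get("data", {}).get("led") is True
```

With a check like this, setting the flag to false in the contract is enough to stop the LED test from running for that device type, with no change to the test itself.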
@rcooke-warwick yes, let's set it to false. Related thread https://jel.ly.fish/issue-release-rockpi-4b-rk3399-production-automated-tests-6594ff9
I pushed the PR and it should merge soon @rcooke-warwick https://github.com/balena-io/contracts/pull/326
done, it's merged in the contracts @rcooke-warwick
thanks @acostach I'll now bump contracts in leviathan - which will autobump in meta-balena - eventually it will reach the rockpi repo ;P
@acostach @alexgg looks like rockpi4b can now pass the entire test suite: https://jenkins.product-os.io/job/leviathan-v2-template/5140/
Although it looks like the tests won't run on balena-radxa PRs. I'll fix that and then technically we can use Alex's workflow to autodeploy for rockpi if tests pass
[rcooke-warwick] I can see you've added the rockpro64 into the rig - does this device successfully flash with the testbot?
[rcooke-warwick] was balena-radxa called balena-rockpi until recently?
@rcooke-warwick yes, looks like it was renamed from balena-rockpi to balena-radxa. Regarding the rockpro64, it was added to the rig but it didn't get to the flashing step yet, the leviathan job stops during initialization: https://jenkins.product-os.io/job/leviathan-v2-template/5148/console
Some more ethernet switches and cables have been ordered and are on their way here, currently the RockPro64 in the rig is not connected via ethernet.