Open tmigone opened 3 years ago
[tmigone] This issue has attached support thread https://jel.ly.fish/7632a307-64d9-402c-bda7-1f3a1ab0a474
@pipex I looked at the ticket expecting to find a comment saying "the png was invalid" or something. Didn't we have an issue where the user-specified image was not being detected correctly by the Supervisor as a PNG because the first couple of bits were not what we expect for a PNG?
Can't say I remember that @20k-ultra
Wasn't there a device where the user image was corrupted or something? I remember pretty vividly that someone's device had an issue with splash images and you resolved it. I think commenting on that here was a mistake, since this issue doesn't involve a corrupted image, but I somehow thought it was related. Feel free to ignore this comment if I'm still incorrect.
[rcooke-warwick] This issue has attached support thread https://jel.ly.fish/bad5aa85-a60f-486d-a593-f753f04f6312
[lmbarros] This issue has attached support thread https://jel.ly.fish/71ce2050-5751-4f2b-90de-fc4b5faed56e
Just looked at one instance of this issue and noticed that the value of "Define the PNG image to be used" was data:image/png;name=p.png;base64,iVBORw0KGgoAAAA... IIUC, this should contain only the base64-encoded PNG, without that data:image/png;name=p.png;base64, prefix, right?
And BTW, this matches the logs, as shown in Tomás' initial report, which also shows this unexpected(?) prefix.
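For anyone who wants to check a value like the one above, here is a minimal sketch. It assumes the prefix is a standard base64 data-URI header and that the payload should be a PNG. The thread does not confirm whether the prefix is expected, and stripDataUriPrefix and looksLikeBase64Png are hypothetical helpers, not Supervisor functions.

```typescript
// Hypothetical helpers for illustration; not taken from the Supervisor source.

// Matches a leading data-URI header such as "data:image/png;name=p.png;base64,".
const DATA_URI_PREFIX = /^data:[^,]*;base64,/;

// Base64 form of the 8-byte PNG signature plus the zero high byte of the
// IHDR chunk length that follows it in a well-formed PNG.
const PNG_BASE64_SIGNATURE = 'iVBORw0KGgo';

// Returns the value without a data-URI header, or unchanged if there is none.
function stripDataUriPrefix(value: string): string {
  return value.replace(DATA_URI_PREFIX, '');
}

// True when the (optionally prefixed) value starts like a base64-encoded PNG.
function looksLikeBase64Png(value: string): boolean {
  const payload = stripDataUriPrefix(value);
  return payload.slice(0, PNG_BASE64_SIGNATURE.length) === PNG_BASE64_SIGNATURE;
}
```

This only inspects the start of the string, so it cannot tell whether the rest of the image is intact.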
Managed to replicate this issue. It will happen if the balena-logo-default.png has the same value as balena-logo.png and the BALENA_HOST_SPLASH_IMAGE variable.
The issue is here: https://github.com/balena-os/balena-supervisor/blob/8750951521ee3b945e3eb5f5fe10c467cb97adf8/src/config/backends/splash-image.ts#L108-L112
The code above will only identify the splash image value if the default is not the same as balena-logo, to avoid uploading the balena logo for every new device that provisions. While the intention was good, it has the side effect described in this ticket.
The solution is removing the if and always returning the {image: '...'}
Actually, it is not that easy, because most existing devices do not have a splash. That means that when updating to a supervisor with splash support, they would get the logo deleted. To prevent that, I believe we added the check above, but it has some bad side effects. We need to think about the right way to solve this.
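To make the behaviour described above easier to follow, here is a rough sketch of how such a check could keep the device from settling. It is an interpretation of the comments in this thread, not the code at the linked lines: SplashConfig, readSplashConfig and isSettled are hypothetical names, and the comparison between target and current state is assumed.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not the Supervisor's code.
type SplashConfig = { image?: string };

// Mirrors the described check: the image is only reported when
// balena-logo.png differs from balena-logo-default.png.
function readSplashConfig(balenaLogo: string, balenaLogoDefault: string): SplashConfig {
  if (balenaLogo !== balenaLogoDefault) {
    return { image: balenaLogo };
  }
  return {};
}

// Assumed comparison: the target counts as applied only when the
// reported image equals the target image.
function isSettled(targetImage: string, current: SplashConfig): boolean {
  return current.image === targetImage;
}

// Case from the thread: the variable, balena-logo.png and
// balena-logo-default.png all hold the same value.
const logo = 'iVBORw0KGgo-same-image';
const current = readSplashConfig(logo, logo);
const settled = isSettled(logo, current);
```

Under these assumptions `settled` is false no matter how many times the image is written, because the reported config stays empty. Always returning the image would settle this case, but as noted above it would change what devices without a configured splash report.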
~~Saw a variation of this where the fleet did not have a splash image defined, but the device had an override while not having any image defined~~
Ignore the above, mistook the RPi splash image for our boot splash image, they're different configs.
Unable to reproduce this on a NUC VM, so if the issue still exists and hasn't been inadvertently patched in newer SV versions, it's most likely a race condition. Will try on RPi next.
Problem
It appears that the supervisor enters into a loop applying boot config in the following scenario:
This prevents the device from getting any updates as the supervisor is stuck trying to apply a state and never settles, looping with this message:
To replicate
Removing the fleet default splash image fixes the problem.