Linaro / lite-lava-docker-compose

LITE Team LAVA docker dispatcher
MIT License

[DNM] Nucleo l552ze support #103

Closed erwango closed 4 years ago

erwango commented 4 years ago

This is my whole branch with:

erwango commented 4 years ago

^^@pfalcon

pfalcon commented 4 years ago

@erwango: Initial steps of investigation:

Go to http://localhost/scheduler/device/nucleo-l552ze-q-01 (from http://localhost/scheduler/alldevices). The error issued by LAVA is pretty clear:

Configuration Error: missing or invalid template. Jobs requesting this device type (nucleo-l552ze-q) will not be able to start until a template is available on the master.

That's because, unlike most other device types we work with, nucleo-l552ze-q is not a builtin device type. That means that a jinja template for it must be set explicitly (for builtin device types, templates are shipped with LAVA). We have an example of how to do that in the Makefile: https://github.com/Linaro/lite-lava-docker-compose/blob/lite/Makefile#L111 .

Summing up, the following line is required:

--- a/Makefile
+++ b/Makefile
@@ -131,6 +131,7 @@ lava-boards:
        lavacli -i $(LAVA_IDENTITY) devices tags add disco-l475-iot1-01 zephyr-net

        -lavacli -i $(LAVA_IDENTITY) device-types add nucleo-l552ze-q
+       lavacli -i $(LAVA_IDENTITY) device-types template set nucleo-l552ze-q device-types/nucleo-l552ze-q.jinja2
        -lavacli -i $(LAVA_IDENTITY) devices add --type nucleo-l552ze-q --worker lava-dispatcher nucleo-l552ze-q-01
        -lavacli -i $(LAVA_IDENTITY) devices dict set nucleo-l552ze-q-01 devices/nucleo-l552ze-q-01.jinja2
        lavacli -i $(LAVA_IDENTITY) devices tags add nucleo-l552ze-q-01 zephyr-net

Sadly, that's not enough: the device is still in the "Bad" state. This time, there don't seem to be any helpful hints from LAVA, so I will need to patch the code to see what's wrong.

erwango commented 4 years ago

@pfalcon thanks for the hint, I had clicked on a lot of links, but missed that one.

I will give it another try with this line (at least to be on par with your investigation).

pfalcon commented 4 years ago

So, debugging this was rather tough and frustrating, because the LAVA codebase explicitly swallows the related exceptions on multiple levels. I submitted an issue about this: https://git.lavasoftware.org/lava/lava/-/issues/430. And as I'm not sure it will get enough attention, I'm keen to prototype the required changes myself while it's fresh.

Even more frustrating is that it turned out to be a known issue that has been waiting in the queue for half a year: https://git.lavasoftware.org/lava/pkg/docker-compose/-/issues/4 .

So, the cause of the issue is the following: the docker setup we use (as inherited from upstream) uses too many individual containers for the various parts of the LAVA system, instead of putting related parts in one container. As a result, lavacli device-types template set pushes a template for a custom device type into one container (lava-server?) but not another (lava-master?). Code running in lava-master then doesn't find that template, and due to the extremely bad error reporting described above, we get the behavior we see: the device health is reset to "Bad (invalid configuration)" without any further details or logging, while everything looks fine in the web UI (the config is there and looks correct).
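One way to confirm this kind of mismatch is to check for the template file in each container directly. The following is only a rough sketch: the container names (lava-server, lava-master) are the guesses from the comment above, the directory is assumed to be the same in both containers, and check_template / template_present are hypothetical helper names, not part of LAVA or this repository.

```shell
#!/bin/sh
# Sketch: report whether a device-type template file exists in each container.
# TEMPLATE_DIR is an assumption; adjust it to the actual docker-compose setup.
TEMPLATE_DIR=/etc/lava-server/dispatcher-config/device-types

template_present() {
    # $1 = container name, $2 = device type name
    # Succeeds only if the .jinja2 file exists inside the container.
    docker exec "$1" test -f "$TEMPLATE_DIR/$2.jinja2"
}

check_template() {
    # $1 = device type name; remaining arguments = container names
    dtype=$1
    shift
    for c in "$@"; do
        if template_present "$c" "$dtype"; then
            echo "$c: present"
        else
            echo "$c: MISSING"
        fi
    done
}

# Example (not run automatically):
#   check_template nucleo-l552ze-q lava-server lava-master
```

If the template shows up in one container but is reported missing in the other, that would match the failure mode described here.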

I'll be working on an interim workaround for this (after the 2020.07 upgrade).

erwango commented 4 years ago

Thanks @pfalcon for this investigation.

pfalcon commented 4 years ago

@erwango, with https://github.com/Linaro/lite-lava-docker-compose/pull/105 in mind, the following changes fix the nucleo-l552ze-q setup for me:

--- a/Makefile
+++ b/Makefile
@@ -136,6 +136,9 @@ lava-boards:
        lavacli -i $(LAVA_IDENTITY) devices tags add disco-l475-iot1-01 zephyr-net

        -lavacli -i $(LAVA_IDENTITY) device-types add nucleo-l552ze-q
+       lavacli -i $(LAVA_IDENTITY) device-types template set nucleo-l552ze-q device-types/nucleo-l552ze-q.jinja2
+       # Workaround, see comment above.
+       docker cp device-types/nucleo-l552ze-q.jinja2 lava-master:/etc/lava-server/dispatcher-config/device-types/
        -lavacli -i $(LAVA_IDENTITY) devices add --type nucleo-l552ze-q --worker lava-dispatcher nucleo-l552ze-q-01
        -lavacli -i $(LAVA_IDENTITY) devices dict set nucleo-l552ze-q-01 devices/nucleo-l552ze-q-01.jinja2
        lavacli -i $(LAVA_IDENTITY) devices tags add nucleo-l552ze-q-01 zephyr-net

With it, the device is created with normal status, and I can submit your lava-nucleo_l552ze_q.job job, which starts running (of course, it fails soon, as it cannot find the actual board).
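For other custom device types, the two added Makefile lines could be wrapped in a small shell helper. This is just a sketch of the same workaround, not something from the repository: set_custom_template is a hypothetical name, and it assumes LAVA_IDENTITY is set, templates live under ./device-types/, and the master container is called lava-master as in the diff above.

```shell
#!/bin/sh
# Sketch: set the template for a custom device type and copy it into
# lava-master as well (the workaround from the diff above).
MASTER_TEMPLATE_DIR=/etc/lava-server/dispatcher-config/device-types/

set_custom_template() {
    # $1 = device type name, e.g. nucleo-l552ze-q
    tmpl="device-types/$1.jinja2"
    # Push the template through lavacli, as in the Makefile.
    lavacli -i "$LAVA_IDENTITY" device-types template set "$1" "$tmpl" || return 1
    # Workaround: also place the file where lava-master looks for it.
    docker cp "$tmpl" "lava-master:$MASTER_TEMPLATE_DIR"
}

# Example (not run automatically; my-identity is a placeholder):
#   LAVA_IDENTITY=my-identity set_custom_template nucleo-l552ze-q
```

The helper stops before the docker cp step if lavacli fails, so a bad template path is reported once rather than twice.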