End device onboarding flow

johanstokking commented 2 years ago

Summary

New end device onboarding flow

Replaces #3770. Blocked by #4840, #4841 and #4845.

Why do we need this?

To make it even easier to onboard new end devices by scanning QR codes.

Also we need to remove the creation on an external Join Server and integrate device claiming in the onboarding process.

For most end users, device creation and claiming are conceptually the same. We should put this in one nice device onboarding experience.

What is already there? What do you see now?

Device creation via Device Repository and manual
Device claiming by importing a manifest
QR code scanning app (TTSE only)

What is missing? What do you want to see?

Onboarding flow with QR code scanning, claiming, manual creation and retrieving info from the Device Repository integrated.

How do you propose to implement this?

For onboarding:

We have the following properties in an end device onboarding state:

JoinEUI (to check with DCS if claiming is supported – #4841)
DevEUI
Vendor and Profile ID
LoRaWAN device profile (includes activation mode)
Brand, Model, Hardware and Firmware Version and Region (lookup end device model through Device Repository)
Claim Authentication Code
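
As an illustration, the properties above could live in one plain state object. This is only a sketch: the helper `createOnboardingState` and all field names are hypothetical and not taken from the Console code base, and the EUIs are placeholder values.

```javascript
// Hypothetical shape of the onboarding state; field names are illustrative only.
function createOnboardingState(overrides = {}) {
  return {
    joinEui: '', // used to ask DCS whether claiming is supported
    devEui: '',
    vendorId: 0, // 0 means "not known"
    vendorProfileId: 0, // 0 means "not known"
    lorawanProfile: null, // LoRaWAN device profile, includes the activation mode
    versionIds: null, // brand, model, hardware/firmware version and region
    claimAuthenticationCode: '',
    ...overrides,
  };
}

// Placeholder EUIs, e.g. as they would arrive from a QR code scan.
const state = createOnboardingState({
  joinEui: '0102030405060708',
  devEui: '1112131415161718',
});
```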

Allow user to choose between: scan QR code, choose from device repository and manual creation:

0. Ask the user whether they want to scan a QR code. If yes, go to step 1; if not, go to step 2.
1. Scan QR code: this contains at least the JoinEUI and DevEUI, potentially also the numeric Vendor and Profile ID and the Claim Authentication Code. Call the QRCodeParser.Parse() rpc (https://github.com/TheThingsNetwork/lorawan-stack/issues/4845)
  1. Put the JoinEUI and DevEUI in the onboarding state
  2. If the QR code contains a claim authentication code, put this in the onboarding state
  3. If the QR code contains a non-zero Vendor ID and Profile ID, look up the LoRaWAN device profile (#4842) and put this in the onboarding state. Otherwise, the LoRaWAN device profile is empty, but the activation mode can be preset to OTAA (ABP devices won't have QR codes)
  4. Go to step 2
2. Choose between selecting from the Device Repository or manual creation
  1. Choose from Device Repository. This gives us the Brand and Model ID, Hardware and Firmware Version and Region
    1. Get the LoRaWAN device profile from the Device Repository and put this in the onboarding state
  2. Manual creation, this basically proceeds to step 3 with an empty LoRaWAN device profile
3. Show all known information: activation mode, JoinEUI, DevEUI and fill out LoRaWAN device profile
  1. If there is no information (the user came here through Manual creation), the user still has to select the activation mode
  2. If the activation mode is OTAA
    1. The JoinEUI can be prefilled from step 1, entered manually or the user can take the default JoinEUI (#4840) (no more 00!)
    2. As soon as any JoinEUI is filled in, contact DCS to see if claiming is supported for that JoinEUI
      1. If claiming is supported, prefill the claim authentication code from step 1. If step 1 was skipped, the claim authentication code is empty and can be entered by the user. In either case, do not ask for the root keys
      2. If claiming is not supported, do ask for the root keys
  3. Show the rest of the form like we do now: LoRaWAN versions, frequency plan, ABP settings, etc.
4. Create does the following:
  1. Create on IS: this is important to make sure the end device identifiers are unique
  2. Create on NS and AS: mostly for validation
  3. Is there a claim authentication code in the onboarding state?
    1. If yes, claim the end device on DCS. Do not set the join_server_address
    2. If not, create the end device on the cluster-local JS with the root keys, do set the join_server_address
  4. Any failure leads to a rollback
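
The create sequence above, including the rule that any failure leads to a rollback, can be sketched as a list of steps that each know how to undo themselves. This is a sketch under assumptions: `runWithRollback`, `buildCreateSteps` and every method on the `api` object are hypothetical stand-ins for the real IS, NS, AS, JS and DCS calls.

```javascript
// Runs steps in order; on failure, undoes the completed steps in reverse order
// and rethrows the original error.
async function runWithRollback(steps) {
  const done = [];
  try {
    for (const step of steps) {
      await step.run();
      done.push(step);
    }
  } catch (error) {
    for (const step of done.reverse()) {
      try {
        await step.undo();
      } catch (undoError) {
        // Keep undoing the remaining steps; real code would report this.
      }
    }
    throw error;
  }
}

// Builds the step list. `api` is a hypothetical object wrapping the real calls.
function buildCreateSteps(api, state) {
  const claim = Boolean(state.claimAuthenticationCode);
  return [
    // join_server_address is only set when the cluster-local JS is used.
    { run: () => api.createOnIS({ setJoinServerAddress: !claim }), undo: () => api.deleteFromIS() },
    { run: () => api.createOnNS(), undo: () => api.deleteFromNS() },
    { run: () => api.createOnAS(), undo: () => api.deleteFromAS() },
    claim
      ? { run: () => api.claimOnDCS(state.claimAuthenticationCode), undo: () => api.unclaimOnDCS() }
      : { run: () => api.createOnJS(state.rootKeys), undo: () => api.deleteFromJS() },
  ];
}
```

Keeping the undo next to each action makes it harder to add a new create step and forget its rollback.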

For offboarding

Check if the join_server_address is set

If set, delete from JS
If unset, unclaim the end device from DCS
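
The offboarding check is small enough to express as one function. A minimal sketch; `offboardingAction` and its return values are hypothetical, and only the `join_server_address` field name comes from the description above.

```javascript
// Picks the offboarding action from the stored end device record.
// An unset or empty join_server_address means the device was claimed on DCS.
function offboardingAction(device) {
  return device.join_server_address ? 'DELETE_FROM_JS' : 'UNCLAIM_FROM_DCS';
}
```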

How do you propose to test this?

Let's test the flows first. I'm not sure what the best way of doing that is; using mockups?

These are key scenarios we need to support:

Generic Node with QR code. Contains claim authentication code. JoinEUI, DevEUI and brand are known. The JoinEUI supports claiming. The user only needs to select Generic Node as device model, the versions, region and frequency plan
Generic Node without QR code, onboarding by entering JoinEUI and DevEUI. Select brand, model, versions and region from Device Repository. This should detect that claiming is supported and should ask for a claim authentication code.
Any device from the Device Repository that does not support claiming; this should ask for root keys and only allow creating the device in the cluster-JS
Manual creation of OTAA, ABP and multicast device should still work as expected

Can you do this yourself and submit a Pull Request?

Can review

KrishnaIyer commented 2 years ago

Now that https://github.com/TheThingsNetwork/lorawan-stack/pull/5324 is merged, here's a short summary of the backend.

Claiming/Unclaiming (Primary flow)

The Device Claiming Server is the client for various Join Servers. Infrastructure will be updated to support having a single DCS for each network.
The DCS now supports generic Claim, Unclaim, GetClaimStatus and GetInfoByJoinEUI. Check the DeviceClaimingServer service for details.
Join Servers are configured with device credentials for a collection of JoinEUIs.
Clients of TTS (Console/CLI) can check if a JoinEUI is claimable using GetInfoByJoinEUI.
- If this returns true, use the Claim RPC to claim the device.
- If it is false, register the device in the cluster Join Server.
The clients must register the device in the NS, AS and IS as usual.
When claiming on an external JS,
- The users need to set the Claim Authentication Code value with a token that is the proof of ownership of the device on the external JS.
- Make sure to remove the join_server_address field when registering in the IS.
When the device is deleted, check if the join_server_address is set for the device and if not, use the Unclaim RPC to remove the claim from the external JS.
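
From a client's point of view, the claimability check decides both which credentials to ask for and whether join_server_address is set. A hedged sketch: `chooseProvisioning` is hypothetical, and `getInfoByJoinEui` is an assumed wrapper that resolves to true when the JoinEUI can be claimed. The real response shape of GetInfoByJoinEUI is not shown here.

```javascript
// Decides how to provision a device based on whether its JoinEUI is claimable.
// `getInfoByJoinEui` is an assumed wrapper around the DCS call (boolean result).
async function chooseProvisioning(joinEui, getInfoByJoinEui) {
  const claimable = await getInfoByJoinEui(joinEui);
  return claimable
    ? { method: 'claim', credentials: 'claim_authentication_code', setJoinServerAddress: false }
    : { method: 'register', credentials: 'root_keys', setJoinServerAddress: true };
}
```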

Getting Identifiers from a QR Code

When https://github.com/TheThingsNetwork/lorawan-stack/pull/5134 gets merged, we will have support to fetch the DevEUI, JoinEUI and the Claim Authentication Code from a QR code.
Clients can use this JoinEUI to query if it can be claimed and proceed as described above.
QR codes may also contain IDs to identify the LoRaWAN Device profile. When https://github.com/TheThingsNetwork/lorawan-stack/pull/5323 gets merged, you can pass the numeric IDs (VendorID, VendorProfileID) that are extracted from the QR code to get an EndDeviceTemplate with the LoRaWAN device profile info filled.
Clients can use this template to further request necessary fields from the user and register/claim the device as needed.
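
On the client side, merging a parsed QR code into the onboarding state could look roughly like this. It is a sketch only: `applyQrCode`, the field names and the `supportsJoin` flag are hypothetical, and `lookupProfile` stands in for the template lookup by numeric IDs, which would be asynchronous in practice.

```javascript
// Merges a parsed QR code into the onboarding state without mutating the input.
function applyQrCode(state, qr, lookupProfile) {
  const next = { ...state, joinEui: qr.joinEui, devEui: qr.devEui };
  if (qr.claimAuthenticationCode) {
    next.claimAuthenticationCode = qr.claimAuthenticationCode;
  }
  if (qr.vendorId && qr.vendorProfileId) {
    // Both IDs are non-zero: fetch the LoRaWAN device profile.
    next.lorawanProfile = lookupProfile(qr.vendorId, qr.vendorProfileId);
  } else {
    // No profile available; preset OTAA, since ABP devices won't have QR codes.
    next.lorawanProfile = { supportsJoin: true };
  }
  return next;
}
```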

kschiffer commented 2 years ago

So following our meeting just now, we figured out that it is not actually possible to determine the device model, versions and region from the QR code scan since it will only give us the device profile info and brand ID, which can be valid for multiple combinations.

So there are two things to do here:

Look into extending the information that can be obtained from the QR code to include some kind of device repository identifier @johanstokking
Change the UX to still have users select the relevant model, versions, and frequency profile when scanning a QR code. For this, the possible combinations would ideally be narrowed to the ones that match the fetched profile. In order to implement that, a new RPC in the DR service is required which would return such combinations based on the specified profile. @johanstokking @KrishnaIyer

I will work on modifying the wireframes accordingly.

johanstokking commented 2 years ago

> Look into extending the information that can be obtained from the QR code to include some kind of device repository identifier @johanstokking

> Change the UX to still have users select the relevant model, versions, and frequency profile when scanning a QR code. For this, the possible combinations would ideally be narrowed to the ones that match the fetched profile. In order to implement that, a new RPC in the DR service is required which would return such combinations based on the specified profile. @johanstokking @KrishnaIyer

For background: currently, the QR code contains vendor ID and profile ID, and there's gonna be a codec ID. That might be useful, but it does not provide the version identifiers, which are useful for stats and display.

So ideally, the QR code tells us not only the vendor ID, but the model ID, hardware and firmware version and band. We could still use a single identifier for that, but not "profile ID" and "codec ID". However, that identifier would replace the need for a profile ID and vendor ID.

Until we have that, don't bother with this. We should not attempt reverse lookups. It gets too complicated also considering we support referring to profiles of other vendors.

kschiffer commented 2 years ago

Alright then. I've finalized the wireframes so that we can now plan implementation.

See the clickdummy/wireframe

Please have a look and confirm.

Is this still blocked on anything else? Otherwise we can remove the blocked label as well.

johanstokking commented 2 years ago

This looks complete to me.

KrishnaIyer commented 2 years ago

ACK. Looks good to me as well. This isn't blocked so I'll remove that label.

kschiffer commented 2 years ago

Planning

Here's a rough planning summary of how we are going to implement this. I will keep this post updated with progressive insight.

Structure

OK, so as discussed in our meeting last Wednesday, we will split up the implementation using the following scaffold:

<EndDeviceOnboardingForm>
  <EndDeviceTypeFormSection>
    <DeviceTypeRepositoryFormSection />
    <DeviceTypeManualFormSection />
  </EndDeviceTypeFormSection>
  <EndDeviceProvisioningFormSection>
    <EndDeviceRegistrationFormSection />
    <EndDeviceClaimingFormSection />
  </EndDeviceProvisioningFormSection>
</EndDeviceOnboardingForm>

Here's the scaffold applied to the wireframe:

Implementation details

We will use React's context API to store global form data within <EndDeviceOnboardingForm />, to which all sub-components will subscribe. This way we avoid passing a lot of props down to the child components and instead use <EndDeviceOnboardingForm /> as the single source of truth for global form state. We will use two contexts here:

Formik Context (can be obtained by useFormikContext()-hook)
A custom context, created using React's context API, which will contain global form configuration (e.g. whether EUI generation is allowed, whether server components are disabled, etc.)

The form is highly dynamic, but it has certain self-contained sections, outlined above, each of which should handle its own concerns while reacting to certain form field values stored in the global context.

E.g. <EndDeviceTypeFormSection /> will handle the Input method field and render either <DeviceTypeRepositoryFormSection /> or <DeviceTypeManualFormSection /> based on the user's selection. It will do so by reading the current form values from the context via useFormikContext(), which allows us to do the conditional rendering.
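
The conditional rendering for the Input method field boils down to a pure function of the form values. A sketch without React, so it can run on its own; the field name `inputMethod` and its two values are assumptions, not the actual form schema. Inside the component, `values` would come from `const { values } = useFormikContext()`.

```javascript
// Maps the current form values to the name of the sub-section to render.
// `inputMethod` and its values are hypothetical; the real field may differ.
function selectDeviceTypeSection(values) {
  switch (values.inputMethod) {
    case 'device-repository':
      return 'DeviceTypeRepositoryFormSection';
    case 'manual':
      return 'DeviceTypeManualFormSection';
    default:
      return null; // nothing selected yet, render neither section
  }
}
```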

Reuse of existing code/components

Generally, the new form contains many existing elements and we should reuse as much as possible. This includes validation schemas, general utilities, JSX markup, etc. We should, however, make sure to address some of the issues we have in the current code:

Avoid custom hooks for field value changes (replace by using formik context)
Avoid ternaries in validation schemas (to improve clarity)
Write a comment on every single prop validation (to improve maintainability)
Use encoder/decoders wherever possible (validation schema should NOT be concerned with type conversion)
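
To illustrate the last point, here is a sketch of keeping type conversion in an encoder/decoder pair so that validation only ever sees the stored type. The frequency field, the function names and the MHz/Hz split are all hypothetical examples, not existing Console code.

```javascript
// Input component value (MHz, as typed by the user) -> value kept in form state (Hz).
const toFormValue = mhz => (mhz === '' ? undefined : Math.round(Number(mhz) * 1e6));

// Value kept in form state (Hz) -> value shown in the input component (MHz).
const toInputValue = hz => (hz === undefined ? '' : String(hz / 1e6));

// Validation works on the stored type only and does no conversion itself.
const isValidFrequency = hz => Number.isInteger(hz) && hz > 0;
```

With this split, a malformed input such as `abc` becomes `NaN` in form state and is rejected by validation, without the schema having to parse strings.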

Other things to note

Mind default values for dynamic fields: default values have to be adjusted as well, when a field is stripped
Mind different stack configs (no NS/JS, no DevEUI generation, limited user rights, etc.)
We should modularize the code in a sane way (not too much, not too little)

Where to go from here

I will create a feature branch with a scaffold of the implementation and then we can delegate tasks and work on this in parallel.
