balena-os / balena-supervisor

Balena Supervisor: balena's agent on devices.
https://balena.io

Make V2 endpoints have feature parity with V1 #1586

Open · 20k-ultra opened this issue 3 years ago

20k-ultra commented 3 years ago

The V2 endpoints support only some of the operations that V1 offers. In order for us to deprecate V1, we need V2 to fill the gaps and offer all of the functionality V1 does.

V1 was created for single-container applications, and V2 came along when we released multi-container applications. At the time we only implemented the features specific to multi-container apps, so things like device rebooting, purging data, etc. are still only on V1.

cywang117 commented 3 years ago

@20k-ultra @pipex I compiled a list of differences between V1 & V2 endpoints in order to get a straightforward list of specs. Please let me know if I missed something, or if I should move this over into a Google Doc:

⬜ TODO 🚧 In progress ✔️ Complete

| Status | V1 | V2 | Desc | Notes |
| --- | --- | --- | --- | --- |
| ✔️ | POST /v1/blink | POST /v2/blink | Blinks device LED for 15s. | - |
| ✔️ | POST /v1/reboot | POST /v2/reboot | Reboots device, failing if update locks are in place (override: force). | - |
| ✔️ | POST /v1/shutdown | POST /v2/shutdown | Shuts down device; similar base functionality to reboot. | - |
| ✔️ | POST /v1/purge | POST /v2/applications/:appId/purge | Purges the user /data directory for a specific app. | Should we implement a separate endpoint to purge all apps, vs. purging just one? |
| ✔️ | POST /v1/restart | POST /v2/applications/:appId/restart<br>POST /v2/applications/:appId/restart-service | Restarts user application services. | - |
| ✔️ | POST /v1/regenerate-api-key | POST /v2/regenerate-api-key | Regenerates the SV API key. | - |
| 🚧 | GET /v1/device | GET /v2/state/status<br>GET /v2/version | v1, v2/state/status, v2/version, v2/containerId | v2/state/status delivers v1's status, download_progress (overallDownloadProgress), and commit (release) fields.<br>v2/version delivers v1's status and supervisor_version (version) fields.<br>Fields missing in v2 but present in v1: api_port, ip_address, mac_address, os_version, update_pending, update_downloaded, update_failed. The update_* fields are probably superfluous. |
| ✔️ | POST /v1/apps/:appId/stop | POST /v2/applications/:appId/stop-service | In v2, stops the service specified by the imageId/serviceName keys. | - |
| ✔️ | POST /v1/apps/:appId/start | POST /v2/applications/:appId/start-service | In v2, starts the service specified by the imageId/serviceName keys. | - |
| 🚧 | GET /v1/apps/:appId | GET /v2/applications/:appId/state<br>GET /v2/containerId<br>GET /v2/state/status | v1, v2/apps/:appId/state, v2/containerId, v2/state/status | v2/apps/:appId/state delivers v1's appId and commit fields.<br>v2/containerId delivers v1's containerId field.<br>v2/state/status delivers v1's imageId field.<br>Fields missing in v2 but present in v1: env |
| ✔️ | GET /v1/healthy | GET /v2/healthy | Checks internally whether the SV is running correctly. | - |
| ⬜ | PATCH /v1/device/host-config | - | Updates host OS configs such as hostname and proxy configs. | v1 potentially incomplete: v2 can be made to support changing the local_port proxy variable, at least. |
| ⬜ | GET /v1/device/host-config | - | Gets host OS configs such as hostname and proxy configs. | v2 should return the local_port and local_ip variables. |

v2 also contains endpoints that don't have v1 implementations - those aren't included here.
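To make the v1/v2 shape difference concrete, here's a minimal sketch of the same restart operation against both versions. It isn't taken from our docs or client code; it assumes the standard BALENA_SUPERVISOR_ADDRESS / BALENA_SUPERVISOR_API_KEY environment variables and a Node 18+ global fetch, and the appId value is made up:

```typescript
// Illustrative sketch only: restart an app via the v1 and v2 endpoints.
// Assumes BALENA_SUPERVISOR_ADDRESS / BALENA_SUPERVISOR_API_KEY are set and
// that the runtime (Node 18+) provides a global fetch; the appId is made up.

const address = process.env.BALENA_SUPERVISOR_ADDRESS ?? 'http://127.0.0.1:48484';
const apiKey = process.env.BALENA_SUPERVISOR_API_KEY ?? '';
const appId = 1234567; // hypothetical application id

async function restartViaV1AndV2(): Promise<void> {
	// v1 (single-container era): the app id travels in the JSON body
	await fetch(`${address}/v1/restart?apikey=${apiKey}`, {
		method: 'POST',
		headers: { 'Content-Type': 'application/json' },
		body: JSON.stringify({ appId }),
	});

	// v2 (multi-container era): the app id is part of the path
	await fetch(`${address}/v2/applications/${appId}/restart?apikey=${apiKey}`, {
		method: 'POST',
	});
}

void restartViaV1AndV2();
```

The general pattern across the table is the same: v1 addresses "the app" implicitly or via the request body, while v2 scopes operations to /v2/applications/:appId and, where needed, to an individual service.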

20k-ultra commented 3 years ago

Nice! This makes it really easy to see what's needed and to track progress, @cywang117.

cywang117 commented 3 years ago

Encountered an issue with livepush while developing /v2/reboot:

[screenshot: uncaught livepush error output]

This uncaught error prevents livepush from restarting correctly (a file update after this error results in `Error: (HTTP code 404) no such container - No such container: CONTAINER_ID` and a crash).

Needs further investigation -- will open an issue (and possibly make a patch PR) after I determine the cause, but this issue doesn't prevent a reboot from happening, so I won't look into it further for V2 feature parity.

Insights appreciated!

pipex commented 3 years ago

Are you consistently getting this error, @cywang117? Can you find a way to replicate it? I have seen this on livepush when the connection is interrupted, e.g. the device goes offline, or if I stop the supervisor without stopping the livepush client first. If you are consistently getting this, it is probably a bug.

cywang117 commented 3 years ago

@pipex Looks like it's the scenario you mentioned, where livepush throws an error on sudden connection interrupt such as a device reboot (my case). This error consistently appears when hitting the reboot endpoint.

I don't want to dive into this too far until V2 endpoints are complete, but I took a quick look at the related code files (supervisor sync directory and the livepush repo). From a glance, it doesn't look like the livepush process errors out completely until the device reboots and it tries to acquire the restarted supervisor container by an incorrect CONTAINER_ID. The ECONNREFUSED error probably wouldn't matter overall, if livepush can recover gracefully from it, but currently it can't (from what I can see).

Anyways, this is unrelated to V2 feature parity, but I'd be interested in looking into this further in the future.

jellyfish-bot commented 3 years ago

[cywang117] This issue has attached support thread https://jel.ly.fish/eb10d914-502e-496c-879d-1d6cec4f7047

jellyfish-bot commented 2 years ago

[cywang117] This issue has attached support thread https://jel.ly.fish/f6f3f050-1771-4b4d-895c-1d42ae4512f2

cywang117 commented 2 years ago

Maybe for a v2 update endpoint, we should add an option to force an update through regardless of the poll interval / instant update status (per the above ticket). What do you think, @pipex @20k-ultra?

We currently have `force`, which only pertains to update locks. I'm suggesting a force option that works for both update locks and the poll interval / instant update status.
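Purely as an illustration of what I mean (nothing here exists today: the /v2/update path and the forcePoll field are invented for the sketch, and the existing `force` behavior, as noted above, only overrides update locks):

```typescript
// Hypothetical sketch of the proposal above; none of this exists yet.
// The /v2/update path and the forcePoll flag are invented for illustration.
// Assumes the same supervisor env vars and Node 18+ global fetch as the earlier sketch.

const address = process.env.BALENA_SUPERVISOR_ADDRESS ?? 'http://127.0.0.1:48484';
const apiKey = process.env.BALENA_SUPERVISOR_API_KEY ?? '';

async function forceUpdateNow(): Promise<void> {
	await fetch(`${address}/v2/update?apikey=${apiKey}`, {
		method: 'POST',
		headers: { 'Content-Type': 'application/json' },
		body: JSON.stringify({
			force: true, // existing meaning: ignore update locks
			forcePoll: true, // hypothetical: also bypass the poll interval / instant update check
		}),
	});
}

void forceUpdateNow();
```

Whether that ends up as a second flag or just a broader `force` is the open question.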