FlowFuse / flowfuse

Connect, collect, transform, visualise, and interact with your Industrial Data in a single platform. Use FlowFuse to manage, scale and secure your Node-RED solutions.

https://flowfuse.com



Backend API for CPU/Memory Usage #772

Open sammachin opened 2 years ago

sammachin commented 2 years ago

Epic

223

Description

As a: project owner

I want to: know my project's CPU and Memory use over the last 24 hours / 7 days / 30 days

So that: I can judge if I need to upgrade to a higher capacity project type

Acceptance Criteria

[ ] Capture data for Docker, K8s and possibly LocalFS instances
[ ] Record data points with appropriate granularity
[ ] API to supply data to front end
...
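One way to read "appropriate granularity" for the 24 hour / 7 day / 30 day views is to keep fine-grained averages for the recent window and coarser averages for the longer ones. The sketch below is hypothetical: the `UsageHistory` class, the tier names and the bucket sizes (5 minutes, 30 minutes, 2 hours) are assumptions for illustration, not FlowFuse code, and it is in-memory only, so a real implementation would still need persistence.

```javascript
// Hypothetical sketch: bucketed usage history at three resolutions.
// Bucket sizes are assumptions chosen so each window holds a modest
// number of points (288, 336 and 360 respectively).
const MINUTE = 60 * 1000;
const HOUR = 60 * MINUTE;
const DAY = 24 * HOUR;

const TIERS = [
  { name: "24h", bucketMs: 5 * MINUTE, retainMs: DAY },
  { name: "7d", bucketMs: 30 * MINUTE, retainMs: 7 * DAY },
  { name: "30d", bucketMs: 2 * HOUR, retainMs: 30 * DAY },
];

class UsageHistory {
  constructor(tiers = TIERS) {
    // One Map per tier: bucket start time (ms) -> { sum, count }
    this.tiers = tiers.map((t) => ({ ...t, buckets: new Map() }));
  }

  // Record one sample (e.g. a memory or CPU percentage) taken at `ts` (ms).
  record(ts, value) {
    for (const tier of this.tiers) {
      const start = Math.floor(ts / tier.bucketMs) * tier.bucketMs;
      const b = tier.buckets.get(start) || { sum: 0, count: 0 };
      b.sum += value;
      b.count += 1;
      tier.buckets.set(start, b);
      // Drop buckets that have fallen out of this tier's retention window.
      for (const key of tier.buckets.keys()) {
        if (key < ts - tier.retainMs) tier.buckets.delete(key);
      }
    }
  }

  // Averaged points for one tier, oldest first.
  series(name) {
    const tier = this.tiers.find((t) => t.name === name);
    if (!tier) throw new Error(`unknown tier: ${name}`);
    return [...tier.buckets.entries()]
      .sort((a, b) => a[0] - b[0])
      .map(([time, b]) => ({ time, avg: b.sum / b.count }));
  }
}
```

Averaging per bucket keeps storage bounded regardless of how often the backend is polled; whether an average, a maximum, or both is the right summary for capacity planning is an open design choice.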

ZJvandeWeg commented 2 years ago

@sammachin The first iteration probably only needs to track OOM events. The rest isn't important for this release.

ZJvandeWeg commented 2 years ago

I disagree with the scheduling/timing of this one. We should invest in features that are in one tier and not the other so we're selling a use-case rather than resources. Further: Node-RED is a low-code platform and FlowForge should abstract away from CPU/Memory insights, our customers aren't running hardware, they're integrating software.

hardillb commented 2 years ago

This should be an extension to the container driver API so it can be implemented independently for each backend.

Dockerode has support for the Docker Stats endpoint, which should be the starting point: https://docs.docker.com/engine/api/v1.37/#tag/Container/operation/ContainerStats

I'll look for K8s and LocalFS equivalents.
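The Docker Stats endpoint returns raw counters rather than ready-made percentages. The sketch below reduces one sample (the JSON body of `GET /containers/{id}/stats?stream=false`, which Dockerode can fetch via `container.stats({ stream: false })`) to a memory figure and a CPU percentage. The field names come from the Docker Engine API; the formulas approximate what the `docker stats` CLI does (subtracting inactive file pages from memory, and scaling the CPU delta by the system delta and online CPUs), and the function name `summariseDockerStats` is hypothetical.

```javascript
// Hypothetical sketch: reduce one Docker stats sample to normalised numbers.
function summariseDockerStats(stats) {
  const mem = stats.memory_stats || {};
  const memStats = mem.stats || {};
  // Page cache that the kernel can reclaim: cgroup v2 reports `inactive_file`,
  // cgroup v1 reports `total_inactive_file`.
  const reclaimable = memStats.inactive_file ?? memStats.total_inactive_file ?? 0;
  const memoryUsed = Math.max(0, (mem.usage ?? 0) - reclaimable);

  // CPU % = (container CPU delta / host CPU delta) * online CPUs * 100,
  // using the current sample and the previous one (`precpu_stats`).
  const cpu = stats.cpu_stats || {};
  const pre = stats.precpu_stats || {};
  const cpuDelta =
    (cpu.cpu_usage?.total_usage ?? 0) - (pre.cpu_usage?.total_usage ?? 0);
  const systemDelta = (cpu.system_cpu_usage ?? 0) - (pre.system_cpu_usage ?? 0);
  const onlineCpus = cpu.online_cpus ?? cpu.cpu_usage?.percpu_usage?.length ?? 1;
  const cpuPercent =
    cpuDelta > 0 && systemDelta > 0
      ? (cpuDelta / systemDelta) * onlineCpus * 100
      : 0;

  return {
    memory: { used: memoryUsed, limit: mem.limit ?? null },
    cpuPercent,
  };
}
```

Note that `cpuPercent` here is relative to the whole host (100% per core), not to any per-project limit, so it would still need normalising against what the project is allowed to use.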

joepavitt commented 2 years ago

Further: Node-RED is a low-code platform and FlowForge should abstract away from CPU/Memory insights, our customers aren't running hardware, they're integrating software.

This is all well and good, but if we consider all of the agreed-upon personas too, only Bianca (Business User) would really be uninterested in this stuff. Harry, Danielle and Chris (our core users) would all be technical enough to understand these concepts.
If they're only seeing OOM events, then I think it's too vague to the user. Whilst we should flag these, I also don't see any harm in making the "Capacity/Usage" visible to a user, albeit we may disguise the technical (CPU/Memory) terminology.

hardillb commented 2 years ago

K8s metrics-server might have enough information: https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/
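metrics-server exposes usage through the `metrics.k8s.io` API (for example `/apis/metrics.k8s.io/v1beta1/namespaces/{namespace}/pods/{name}`), where each container's `usage` holds Kubernetes quantity strings such as `"250m"` or `"12345678n"` for CPU and `"131072Ki"` for memory. The hypothetical helpers below turn a `PodMetrics` object into plain numbers; they handle the common binary and decimal suffixes but deliberately reject exponent forms like `"1e3"`.

```javascript
// Multipliers for Kubernetes quantity suffixes (binary and decimal SI).
const SUFFIX = {
  Ki: 2 ** 10, Mi: 2 ** 20, Gi: 2 ** 30, Ti: 2 ** 40, Pi: 2 ** 50, Ei: 2 ** 60,
  n: 1e-9, u: 1e-6, m: 1e-3, "": 1,
  k: 1e3, M: 1e6, G: 1e9, T: 1e12, P: 1e15, E: 1e18,
};

// Parse a quantity string into a number (cores for CPU, bytes for memory).
function parseQuantity(q) {
  const match = /^([+-]?[0-9.]+)([a-zA-Z]{0,2})$/.exec(String(q).trim());
  if (!match || !Object.prototype.hasOwnProperty.call(SUFFIX, match[2])) {
    throw new Error(`unsupported quantity: ${q}`);
  }
  const n = Number(match[1]);
  if (!Number.isFinite(n)) throw new Error(`unsupported quantity: ${q}`);
  return n * SUFFIX[match[2]];
}

// Sum usage across all containers of one PodMetrics object.
function summarisePodMetrics(podMetrics) {
  let cpuCores = 0;
  let memoryBytes = 0;
  for (const c of podMetrics.containers || []) {
    cpuCores += parseQuantity(c.usage.cpu);
    memoryBytes += parseQuantity(c.usage.memory);
  }
  return { cpuCores, memoryBytes };
}
```

Like the Docker endpoint, this is a point-in-time reading (metrics-server keeps only recent samples), so it yields a snapshot rather than history.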

ZJvandeWeg commented 2 years ago

If they're only seeing OOM events, then I think it's too vague to the user. Whilst we should flag these, I also don't see any harm in making the "Capacity/Usage" visible to a user, albeit we may disguise the technical (CPU/Memory) terminology.

@joepavitt Given this is a story and not an epic, I'm cautious about the breadth of the scope for one iteration. A fully fledged dashboard with at least 30 days of data retention doesn't appeal to me in terms of scope. The APIs linked by @hardillb all provide insight into current usage only, so wouldn't normalisation be needed as well?

Taken from our values in the handbook:

Ship the minimum viable change possible. Small changes allow fast feedback loops, which in turn can aid in deciding the next minimal viable iteration. Further, iteration naturally splits big problems into small steps, creates positive momentum, and allows us to capture value more quickly.

hardillb commented 2 years ago

Yes, I'm just trying to get a good handle on what the platforms can actually provide. But the values will need normalising to 0-100% of what the stack (in combination with the driver) allows.
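Normalising to "0-100% of what the stack allows" amounts to dividing raw usage by the stack's limit. The sketch below assumes a hypothetical stack shape `{ memoryBytes, cpuCores }` and usage already reduced to bytes and cores; neither shape is taken from FlowFuse.

```javascript
// Hypothetical sketch: express raw usage as a percentage of the stack's limits.
// `usage` = { memoryBytes, cpuCores }, `stack` = { memoryBytes, cpuCores }
// (both shapes are assumptions for illustration).
function normaliseUsage(usage, stack) {
  const pct = (used, limit) =>
    limit > 0 && Number.isFinite(used)
      ? Math.min(100, Math.max(0, (used / limit) * 100)) // clamp to 0-100
      : null; // no meaningful limit: report nothing rather than guess
  return {
    memoryPercent: pct(usage.memoryBytes, stack.memoryBytes),
    cpuPercent: pct(usage.cpuCores, stack.cpuCores),
  };
}
```

Clamping at 100% keeps brief sampling spikes from showing impossible values, at the cost of hiding by how much a limit was exceeded; returning `null` when no limit is known lets the caller hide the figure instead of showing a misleading one.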

joepavitt commented 2 years ago

"Given this is a story and not an epic, I'm cautious about the breadth of the scope in one iteration." Very fair.

I do think it's worth recording CPU/Memory usage as an Epic then. I could see it being in the product at some point and being of value, and just because it doesn't fit into 0.8 - 0.1, I don't think we should throw it away entirely.

sammachin commented 2 years ago

This story is specifically about the Back End APIs to get this data from the various containers and does not go into how this may be presented in the UI. Although I do think abstracting away the raw numbers in favor of a % of the allowed limit would be my preferred approach.

In addition, this API/data will be needed to display the information to admins so they can run the service, hence the v1 flag.

hardillb commented 2 years ago

Building a prototype on Docker.

Looking to add the following to the driver's details method:

{
    ...,
    memory: {
        used: <value in bytes>,
        limit: <value in bytes>
    }
}
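A driver could attach that `memory` block only when its backend can actually supply it, leaving the property out otherwise. In the hypothetical sketch below, `getBaseDetails` and `getMemoryStats` stand in for backend-specific code; they are not real FlowFuse functions.

```javascript
// Hypothetical sketch of a driver `details()` that adds the proposed
// `memory` block ({ used, limit } in bytes) only when stats are available.
async function details(project, { getBaseDetails, getMemoryStats }) {
  const base = await getBaseDetails(project);
  let stats = null;
  try {
    stats = await getMemoryStats(project);
  } catch {
    stats = null; // e.g. container not running yet, or backend has no stats
  }
  if (!stats || !Number.isFinite(stats.used)) {
    return base; // omit `memory` entirely so callers can detect its absence
  }
  return {
    ...base,
    memory: {
      used: stats.used,
      limit: Number.isFinite(stats.limit) ? stats.limit : null,
    },
  };
}
```

Omitting the key rather than returning zeros keeps "no data" distinguishable from "using nothing".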

hardillb commented 2 years ago

Got it working on Docker and the start of it working on k8s.

The JavaScript k8s client we are using will have better support for this in its next release: https://github.com/kubernetes-client/javascript/pull/848

hardillb commented 2 years ago

Draft PRs raised for docker and k8s. Will need to look at the UI for it.

Key points:

- This is a point-in-time snapshot; there is no way to gather historical data without reinventing monitoring.
- No LocalFS support.
- Data is only returned once the project is up and running, so the UI will need to check for the existence of data and fail gracefully.
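On the UI side, "check for existence of data and fail gracefully" could look like the hypothetical helper below, which formats the proposed `details.memory` snapshot and falls back to a neutral message when the block is missing (LocalFS, or a project that is not running yet). The function name and message wording are assumptions.

```javascript
// Hypothetical sketch: format the `memory` snapshot from a driver's details,
// degrading gracefully when the data is absent or incomplete.
function describeMemory(details) {
  const mem = details && details.memory;
  if (!mem || !Number.isFinite(mem.used)) {
    return "Memory usage unavailable";
  }
  const mib = (bytes) => (bytes / 2 ** 20).toFixed(1) + " MiB";
  if (!Number.isFinite(mem.limit) || mem.limit <= 0) {
    return `${mib(mem.used)} used`; // no limit known: show the raw figure only
  }
  const pct = Math.round((mem.used / mem.limit) * 100);
  return `${mib(mem.used)} of ${mib(mem.limit)} (${pct}%)`;
}
```

For example, `describeMemory({ memory: { used: 64 * 2 ** 20, limit: 256 * 2 ** 20 } })` returns `"64.0 MiB of 256.0 MiB (25%)"`, while `describeMemory({})` returns `"Memory usage unavailable"`.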