Tendrl / specifications

Tendrl specs go here
GNU Lesser General Public License v3.0

Ceph integration/bridge needs to provide pool usage data #80

Closed: brainfunked closed this issue 7 years ago

brainfunked commented 7 years ago

Use "ceph df" to gather the data and add it to the Ceph state sync in the integration. The collectd Ceph plugin may be a useful reference for how to parse this data.
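A minimal sketch of the gather-and-parse step in Python. It assumes the `ceph` CLI is available on the node running the sync, and that `ceph df --format json` returns a top-level "pools" list whose entries carry a "name" and a "stats" object. The stats field names used here ("bytes_used", "max_avail", "objects") are assumptions based on Jewel/Luminous-era output and differ between Ceph releases, so missing keys default to 0. The helper names `read_ceph_df` and `parse_pool_usage` are hypothetical.

```python
import json
import subprocess
from typing import Dict


def read_ceph_df() -> str:
    """Run `ceph df` once and return its raw JSON output (needs a working ceph CLI)."""
    result = subprocess.run(
        ["ceph", "df", "--format", "json"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout


def parse_pool_usage(df_json: str) -> Dict[str, Dict[str, int]]:
    """Map each pool name to a few usage counters.

    The field names are assumptions; they vary across Ceph releases,
    so any missing counter falls back to 0.
    """
    df = json.loads(df_json)
    usage = {}
    for pool in df.get("pools", []):
        stats = pool.get("stats", {})
        usage[pool["name"]] = {
            "bytes_used": int(stats.get("bytes_used", 0)),
            "max_avail": int(stats.get("max_avail", 0)),
            "objects": int(stats.get("objects", 0)),
        }
    return usage
```

During a state sync this would be called as `parse_pool_usage(read_ceph_df())`. Note that each call forks a `ceph` process and re-authenticates, which is the cost discussed further down in this thread.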

shtripat commented 7 years ago

@brainfunked I understand the pool utilization data also needs to be fetched, parsed and pushed to a time series DB for trending purposes.
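For the trending side, a rough sketch of rendering per-pool counters in Graphite's plaintext protocol ("<metric path> <value> <timestamp>" per line, sent over TCP to carbon, by default on port 2003). This writes to Graphite directly rather than going through collectd as the thread plans, so treat it as an illustration of the data shape only. The metric path layout (`tendrl.clusters.<cluster_id>.pools.<pool>.<counter>`), the `cluster_id` parameter and both helper names are hypothetical. The input is assumed to be a mapping of pool name to counter name to integer value.

```python
import socket
import time
from typing import Dict, Iterable, List, Optional


def graphite_lines(cluster_id: str, pools: Dict[str, Dict[str, int]],
                   timestamp: Optional[int] = None) -> List[str]:
    """Render per-pool counters as Graphite plaintext lines: "<path> <value> <ts>\\n"."""
    ts = int(time.time()) if timestamp is None else timestamp
    lines = []
    for pool_name, counters in sorted(pools.items()):
        # Graphite uses '.' as its path separator, so it cannot appear inside a pool name.
        safe_name = pool_name.replace(".", "_")
        for metric, value in sorted(counters.items()):
            path = f"tendrl.clusters.{cluster_id}.pools.{safe_name}.{metric}"
            lines.append(f"{path} {value} {ts}\n")
    return lines


def send_to_graphite(lines: Iterable[str], host: str, port: int = 2003) -> None:
    """Send pre-rendered lines to a carbon plaintext listener over TCP."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

For example, `graphite_lines("c1", {"rbd": {"bytes_used": 10}}, timestamp=1000)` yields `["tendrl.clusters.c1.pools.rbd.bytes_used 10 1000\n"]`.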

For instantaneous pool utilization data for the cluster, we could pull the same data at the time of the Ceph state sync and attach it to the cluster state.
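A sketch of what attaching a snapshot during the state sync could look like, assuming the parsed `ceph df` JSON has a "pools" list with per-pool "stats" containing "bytes_used" and "max_avail" (names vary by Ceph release). The `cluster_state` dict, its "pool_utilization" key and both function names are hypothetical. The `percent_used` computed here is a simple approximation (used divided by used plus max_avail) and is not guaranteed to match the value a particular Ceph release reports.

```python
import time
from typing import Any, Dict, Optional


def pool_utilization_snapshot(df: Dict[str, Any],
                              now: Optional[float] = None) -> Dict[str, Any]:
    """Build an instantaneous per-pool utilization summary from parsed `ceph df` JSON."""
    pools = {}
    for pool in df.get("pools", []):
        stats = pool.get("stats", {})
        used = int(stats.get("bytes_used", 0))
        avail = int(stats.get("max_avail", 0))
        capacity = used + avail
        pools[pool["name"]] = {
            "bytes_used": used,
            "max_avail": avail,
            # Approximation: share of (used + max_avail) already consumed.
            "percent_used": round(100.0 * used / capacity, 2) if capacity else 0.0,
        }
    return {"updated": time.time() if now is None else now, "pools": pools}


def attach_pool_utilization(cluster_state: Dict[str, Any], df: Dict[str, Any]) -> None:
    """Store the snapshot under a hypothetical "pool_utilization" key of the cluster state."""
    cluster_state["pool_utilization"] = pool_utilization_snapshot(df)
```

Keeping only the latest snapshot in the cluster state, while the time series DB holds the history, keeps the synced state small.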

Comments?

shtripat commented 7 years ago

Adding a snippet from the discussion on ceph-devel below (I was not able to get the archive link):

> Hi Team,
>
> Our team is currently working on a project named "tendrl" [1][2].
> Tendrl is a management platform for software-defined storage systems like
> Ceph, Gluster, etc.
>
> As part of tendrl we are integrating with collectd to collect 
> performance data and we maintain the time series data in graphite.
>
> I have a question at this juncture regarding pool utilization data.
> Our current thinking is to use the output of the command "ceph
> df", parse it to derive pool utilization data, and push it to
> graphite using collectd.
> The question here is what the performance impact is/would be of running
> the "ceph df" command on Ceph nodes. I feel we should be running this
> command only on MON nodes.
>

Correct, that data comes from the MONs and is not that heavy.

> I wanted to verify with the team here whether this thought process is in
> the right direction and, if so, what the ideal frequency of running the
> command "ceph df" from collectd would be.
>

Running the command means forking a process every time and also going through the whole cephx authentication and client <> MON process.

> This is just from our point of view and we are open to any other 
> foolproof solution (if any).

The best approach would be to keep an open connection to a MON and run the 'df' command directly on the MONs in a loop.

I wrote something like that in Python a while ago for 'ceph status': https://gist.github.com/wido/ac53ae01d661dd57f4a8

cmd = {"prefix":"status", "format":"json"}

If you change that to:

cmd = {"prefix":"df", "format":"json"}

You ask the MON for 'df' and get back JSON. Run that in a loop that sleeps 1 or 5 seconds between requests and you should have near real-time information.
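A minimal sketch of that polling loop, assuming the python-rados bindings that ship with Ceph (`rados.Rados(conffile=...)`, `connect()`, `mon_command(cmd, inbuf)` returning `(ret, outbuf, outs)`, and `shutdown()`). The helper names `query_df`, `poll_df` and `run` are hypothetical, and the MON command is passed in as a callable so the loop can be exercised without a cluster. The 5-second default follows the interval suggested above.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple

# Same shape as rados.Rados.mon_command(cmd, inbuf): returns (ret, outbuf, outs).
MonCommand = Callable[[str, bytes], Tuple[int, bytes, str]]


def query_df(mon_command: MonCommand) -> Dict[str, Any]:
    """Ask the MON for 'df' over an already-open connection and decode the JSON reply."""
    cmd = json.dumps({"prefix": "df", "format": "json"})
    ret, outbuf, outs = mon_command(cmd, b"")
    if ret != 0:
        raise RuntimeError(f"mon 'df' failed ({ret}): {outs}")
    return json.loads(outbuf)


def poll_df(mon_command: MonCommand, handle: Callable[[Dict[str, Any]], None],
            interval: float = 5.0, iterations: Optional[int] = None) -> None:
    """Pass fresh 'df' output to handle() every `interval` seconds (forever if iterations is None)."""
    done = 0
    while iterations is None or done < iterations:
        handle(query_df(mon_command))
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval)


def run(conffile: str = "/etc/ceph/ceph.conf", interval: float = 5.0) -> None:
    """Open one connection and reuse it for every poll (requires python-rados)."""
    import rados  # imported here so the helpers above work without Ceph installed

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        poll_df(cluster.mon_command,
                lambda df: print(json.dumps(df.get("stats", {}))),
                interval=interval)
    finally:
        cluster.shutdown()
```

Because the connection is opened once in `run`, each poll costs a single MON round trip instead of a process fork plus a fresh cephx handshake.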

shtripat commented 7 years ago

This is being taken care of as part of https://github.com/Tendrl/specifications/pull/93