cloudfoundry-attic / cf-abacus

CF usage metering and aggregation
Apache License 2.0

Aggregating data between foundations? #1174

Closed aegershman closed 5 years ago

aegershman commented 5 years ago

Has anyone used cf-abacus to do usage reporting/billing against multiple foundations?

We've got sandbox, dev, and prod foundations in AWS, GCP, Azure, and vSphere, so I'm curious how well cf-abacus aggregates data from multiple sources. I realize this is quite general, and I apologize for the open-endedness, so there's no expectation of a response.

Feel free to close this whenever. Thanks!

cf-gitbot commented 5 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/166050737

The labels on this github issue will be updated when the story is started.

hsiliev commented 5 years ago

Hi @aegershman,

By default, Abacus does not do multi-foundation accumulation and aggregation. It accumulates and aggregates numbers based on the organization id. This means that submitting 10 MB for org A and then 20 MB for the same org A is usually crunched as a sum: 30 MB of usage for org A.
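As a rough illustration of that per-organization accumulation, here is a minimal Python sketch. It is not Abacus code: the function name `accumulate_by_org` and the `(org_id, megabytes)` record shape are assumptions made up for this example.

```python
from collections import defaultdict


def accumulate_by_org(submissions):
    """Sum submitted usage per organization id.

    `submissions` is an iterable of (org_id, megabytes) pairs; this shape
    is hypothetical and only mirrors the 10 MB + 20 MB example above.
    """
    totals = defaultdict(int)
    for org_id, megabytes in submissions:
        totals[org_id] += megabytes
    return dict(totals)


# Two submissions for org A are summed; org B is tracked separately.
submissions = [("org-a", 10), ("org-a", 20), ("org-b", 5)]
print(accumulate_by_org(submissions))
```

Because the key is the organization id alone, the same workload running under a different GUID in another foundation ends up in a separate bucket, which is the limitation described next.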

As organization ids are GUIDs generated by CF it is hard to achieve the same for different regions (or foundations). Instead we have an abstraction one level above orgs - the account. It is used to group several orgs from different regions/foundations. You can think of this as accounting usage from different regions.

A snippet from our wiki at https://github.com/cloudfoundry-incubator/cf-abacus/blob/master/doc/api.md#json-representation shows how you can prefix orgs with regions/foundations:

...
"organization_id": "us-south:54257f98-83f0-4eca-ae04-9ea35277a538",
...

The account plugin then assigns an account id based on the org id. In this way, two different orgs can end up in the same account, based on a mapping you define externally. Keep in mind that the default account plugin is just an example; it defines the REST API you need to implement to achieve this.
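The mapping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the account plugin's REST API: the `ORG_TO_ACCOUNT` table, the account names, the second (made-up) org GUID, and the helper names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical externally defined mapping: prefixed org id -> account id.
# The first key is the example from the Abacus docs; the second is made up.
ORG_TO_ACCOUNT = {
    "us-south:54257f98-83f0-4eca-ae04-9ea35277a538": "account-1",
    "eu-gb:11111111-2222-3333-4444-555555555555": "account-1",
}


def split_org_id(organization_id):
    """Split 'region:guid' into (region, guid); region is None if unprefixed."""
    region, sep, guid = organization_id.partition(":")
    if not sep:
        return None, organization_id
    return region, guid


def account_for(organization_id, default="unassigned"):
    """Look up the account for an org id, falling back to a default."""
    return ORG_TO_ACCOUNT.get(organization_id, default)


def aggregate_by_account(usage):
    """Sum (organization_id, quantity) pairs per account."""
    totals = defaultdict(int)
    for organization_id, quantity in usage:
        totals[account_for(organization_id)] += quantity
    return dict(totals)


usage = [
    ("us-south:54257f98-83f0-4eca-ae04-9ea35277a538", 10),
    ("eu-gb:11111111-2222-3333-4444-555555555555", 20),
]
print(aggregate_by_account(usage))
```

In a real deployment the lookup would live behind whatever service implements the account plugin API, for example a CRM system, rather than in an in-memory dictionary.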

The final step in the process is to plug account accumulation & aggregation. This can be done in two ways:

aegershman commented 5 years ago

@hsiliev thanks for your response. I will openly admit I have not used cf-abacus; this is helpful. Interesting! I think I can see how that model could be used to aggregate multiple foundations, but also multiple solutions under the context of a single account. E.g., my business organization has multiple "orgs" on CF (overloaded) which represent individual software products our business org owns and maintains... So having prefixes could be used to get a 'rollup' of multiple CF orgs into one account context. Will investigate this more. Interesting.

as a sidenote, may I ask, what external analytics/billing system does SAP use? is it off-the-shelf software? Or an inhouse system? If you're unable to discuss I understand. I'm currently shopping around (hah) for different billing/showback options to use for Cerner's internal CF consumption

hsiliev commented 5 years ago

Note that the prefix is not mandatory. It is used to help the account plugin decide which account to assign. You might get by without a prefix.

Inside SAP we use Hybris Cloud, Analytics, and several gluing components to integrate these. Our account plugin delegates to a CRM system that knows the details about who bought what. The Abacus account and provisioning plugins allow this to happen quite easily.

Keep in mind that we're now replacing Abacus with our commercial systems, as we have most of the functionality we need there. SAP can afford this, but it might be expensive for smaller CF installations to do the same. I tried to describe the issues we had with Abacus here: https://hsiliev.blogspot.com/2019/04/cloud-foundry-abacus-v2.html

Btw, what kind of data do you want to aggregate/accumulate? Is this only apps & service usage, or do you want to handle other metrics as well?

aegershman commented 5 years ago

Gotcha, thank you. Unfortunately hooking into larger commercial offerings is out of my availability.


That's a great blog post, it adds interesting background and commentary to the history/challenges of Abacus. You might consider including a link to it on this repo? Didn't seem to find it as a link anywhere. Regardless thank you for sharing.

Open source community: IBM no longer contributes to Abacus, SAP remains only contributor ... The project however was put on hold and replaced with internal SAP solutions to allow among others:

Ahh bummer. Does this mean Abacus will (effectively) stop being dogfooded internally at SAP & perhaps not be as maintained? Don't mean to put you on the spot, just curious.


Btw what kind of data do you want to aggregate/accumulate? Is this only apps & service usage or you want to handle other metrics as well?

That's a great question which changes every meeting and every conversation 😉 I'm occasionally asked questions like "how many max-concurrent AIs did {x, y, and z} orgs from {iaas_a, iaas_b} consume between Feb. and June, excluding AIs suffixed with '-venerable'? how many billable SIs did they run? (excluding autoscaler, spring cloud config, and eureka-- count those as AIs)"... and metrics like "how often are applications being redeployed to CF, even if it's the same version? What's the average age of an app?" These can become pretty complicated questions with edge-cases and rules.

But at its core, what I'm trying to solve for is showback/billing, e.g. demonstrating a monetary price associated with AIs/SIs for orgs. I'm trying to keep things simple and appease the finance-types && my boss's boss. I need to start there so we can keep justifying CF. Obviously my team / all developers we enable know the value of CF, but I need ways to represent the monetary value so it rolls up to management.

Our basic plan is:

That's it. Those are the base requirements. All other data like "how many pushes did {org_a} take to prod last month? how many new apps did {org_z} onboard since a year ago?" is very interesting and valuable, but right now I'm thinking about $$billing$$. Even if it's not 100% accurate. From what it sounds like, Abacus could do something like this; the only reason I haven't deployed Abacus in our CFs is that we don't have marketplace-brokered MongoDB. We only use redis, mysql, and rmq for data. So using mongo will require setting it up outside the context of CF (which is survivable, don't mean to sound whiny about it. But not having on-demand marketplace-provisioned mongo slows down evaluating it).

I've also been following the work of Chris Phillipson on cf-butler, cf-hoover, and cf-hoover-ui as a way to aggregate certain interesting "value stream" metrics data and other data together.

any thoughts? thanks a ton for your time and responses, I appreciate it.

hsiliev commented 5 years ago

You're right - Abacus will effectively stop being dogfooded internally at SAP, as we'll no longer use it in production.

Abacus master uses a plain URL for MongoDB, so you don't need a brokered service anymore.

As the current Abacus version has a pretty big footprint in terms of apps and DB, it might be a good idea to consider a more lightweight approach. We ended up using 130 apps + 4 TB of MongoDB storage per installation, and we have several of those.

The most lightweight approach for apps/services usage I'm aware of is to utilize the current app/service usage events sitting in the CF Cloud Controller DB. For a heavily utilized CF (2-3 million app pushes/month) this means 10-20 GB of MySQL/PostgreSQL.

CCDB events are not generic, and plugging in other things like the number of API calls or DB reads/writes is hard. Therefore something like Abacus still makes sense. However, in its current state it's too CF-centric, and we see more and more companies using CF, k8s, and proprietary infra (like DBs, for instance).

Perhaps you can use a metrics system (or simply a DB) and feed it data, including apps/services from CF, using the Abacus account approach to group foundations?
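A minimal sketch of that "simply a DB" idea, using Python's built-in `sqlite3` with an in-memory database. Everything here is an assumption for illustration: the table names (`org_account`, `usage_records`), the column names, the foundation labels, and the sample metrics are made up and do not come from Abacus or the Cloud Controller schema.

```python
import sqlite3


def build_db():
    """Create an in-memory DB with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE org_account (
            foundation TEXT NOT NULL,
            org_guid   TEXT NOT NULL,
            account_id TEXT NOT NULL,
            PRIMARY KEY (foundation, org_guid)
        );
        CREATE TABLE usage_records (
            foundation TEXT NOT NULL,
            org_guid   TEXT NOT NULL,
            metric     TEXT NOT NULL,
            quantity   REAL NOT NULL
        );
    """)
    return conn


def usage_by_account(conn):
    """Roll usage up from (foundation, org) pairs to accounts."""
    rows = conn.execute("""
        SELECT a.account_id, u.metric, SUM(u.quantity)
        FROM usage_records AS u
        JOIN org_account AS a
          ON a.foundation = u.foundation AND a.org_guid = u.org_guid
        GROUP BY a.account_id, u.metric
        ORDER BY a.account_id, u.metric
    """)
    return rows.fetchall()


conn = build_db()
# Two orgs in different foundations mapped to the same account.
conn.executemany("INSERT INTO org_account VALUES (?, ?, ?)", [
    ("aws-prod", "org-guid-1", "product-x"),
    ("gcp-prod", "org-guid-2", "product-x"),
])
conn.executemany("INSERT INTO usage_records VALUES (?, ?, ?, ?)", [
    ("aws-prod", "org-guid-1", "app_instance_hours", 100.0),
    ("gcp-prod", "org-guid-2", "app_instance_hours", 50.0),
    ("gcp-prod", "org-guid-2", "service_instance_hours", 24.0),
])
print(usage_by_account(conn))
```

Keying the mapping on the (foundation, org GUID) pair plays the same role as the region prefix on the organization id, and non-CF sources could feed rows into the same usage table under their own foundation label.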

hsiliev commented 5 years ago

If you have more questions please reopen