cloudfoundry-attic / cf-abacus

CF usage metering and aggregation
Apache License 2.0

Correcting incorrect aggregation. #419

Open KRuelY opened 8 years ago

KRuelY commented 8 years ago

There could be times when the implementer of Abacus makes mistakes such as an incorrect resource plan mapping, incorrect pricing, or an incorrect formula. There are also times when the mistakes happen on the resource provider's side: the system is down, or the app that submits to Abacus stops or has bugs.

The mistakes can happen at any time, might not be recognized right away, and might only be discovered after the slack window has passed. The implementer of Abacus then needs a way to correct them.

One solution is to merge two (or more) rated usage documents. The reporting app from the main Abacus pipeline would fetch documents from multiple databases (populated by side Abacus pipelines that produce the right aggregation) and merge them. This way, we don't need to worry about manually recalculating millions of records or submitting outside of the slack window.
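A minimal sketch of what such a merge could look like, assuming a heavily simplified document shape. The field names `resources`, `resource_id` and `charge`, and the function `merge_rated_usage`, are illustrative assumptions and not the actual Abacus report schema or API. Resources found in a correction document replace the same resources from the main document:

```python
from copy import deepcopy


def merge_rated_usage(main_doc, *correction_docs):
    """Return a new document where resources from the correction documents
    replace the matching resources of the main document."""
    merged = deepcopy(main_doc)  # never touch the stored main document
    by_id = {r["resource_id"]: r for r in merged["resources"]}
    for doc in correction_docs:
        for resource in doc["resources"]:
            # Later documents win over earlier ones for the same resource.
            by_id[resource["resource_id"]] = deepcopy(resource)
    merged["resources"] = list(by_id.values())
    return merged


# Hypothetical documents: one from the main pipeline, one from a side pipeline.
main = {
    "organization_id": "org-1",
    "resources": [
        {"resource_id": "object-storage", "charge": 12.0},
        {"resource_id": "linux-container", "charge": 30.0},
    ],
}
side = {
    "organization_id": "org-1",
    "resources": [{"resource_id": "object-storage", "charge": 9.5}],
}

report = merge_rated_usage(main, side)
```

Here `report` carries the corrected `object-storage` charge, while `main` keeps its original values.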

Feel free to add any thoughts or solutions in this issue!

cf-gitbot commented 8 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/129809649

The labels on this github issue will be updated when the story is started.

hsiliev commented 8 years ago

I would like to have a clean way to trace any changes to the documents that are made outside the normal pipeline flow. I would even go for audit logging for this kind of action. Merging rated documents without a clean separation of responsibilities seems like a bad idea from this point of view. I created issue #420 to discuss a possible solution for this aspect.

It's not clear to me how exactly we would merge the documents. We need to guarantee that a merge will produce the same result as running a corrective document through the pipeline.
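To make that guarantee concrete, here is a toy check, assuming plain lists of quantities instead of real usage documents and two illustrative aggregation formulas (sum and max). All values are made up. It compares "run a corrective value through the pipeline" with "merge two aggregates" and with the result we actually want:

```python
# Quantities as they were submitted; the last one (30) was a mistake.
submitted = [10, 20, 30]
# Quantities as they should have been.
intended = [10, 20, 25]

# Sum formula: a corrective delta of -5 fixes the aggregate either way.
corrective = [-5]
sum_via_pipeline = sum(submitted + corrective)
sum_via_merge = sum(submitted) + sum(corrective)
sum_expected = sum(intended)

# Max formula: merging aggregates can never lower the maximum,
# so no corrective value reproduces the intended result.
max_via_merge = max(max(submitted), max(corrective))
max_expected = max(intended)

print(sum_via_pipeline == sum_via_merge == sum_expected)
print(max_via_merge == max_expected)
```

With the sum formula all three values agree, but with max the merged aggregate stays at the wrong value. In this toy setup, whether a merge is safe depends on the formula.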

KRuelY commented 8 years ago

This was the thought behind merging the documents:

The merge happens in the reporting app, which means that we're not modifying anything that is produced and stored by the main pipeline. What we're modifying is the report that will be shown to the customer.

The reporting app would call the account plugin, and if the account plugin determines that a correction is needed, it would return the corrected aggregated rated usage document to fetch, along with the information the reporting app needs to construct the right aggregation report.

This corrected rated usage document is the result of resubmitting the usage docs submitted by the resource provider to a separate side Abacus pipeline that has the correct settings, its own slack configuration, and a different database. All the submissions needed to obtain the correct aggregation would be resubmitted to this side pipeline.
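As a rough illustration of the replay idea, assume the mistake was a wrong price and that rating is simply quantity times price. The function `rate`, the `quantity` field and the prices below are hypothetical and much simpler than a real Abacus pipeline:

```python
from decimal import Decimal


def rate(usage_docs, price_per_unit):
    """Toy rating step: total charge for a list of usage docs."""
    return sum(Decimal(doc["quantity"]) * price_per_unit for doc in usage_docs)


# The usage docs originally submitted by the resource provider.
usage_docs = [{"quantity": 10}, {"quantity": 20}]

# Main pipeline, configured with the wrong price.
main_charge = rate(usage_docs, Decimal("0.30"))

# Side pipeline: the same docs replayed with the corrected price.
side_charge = rate(usage_docs, Decimal("0.25"))
```

The main pipeline's result stays as it was stored. The side pipeline produces the corrected charge from the same submissions.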

If the account plugin returns a corrected rated usage document to fetch, the reporting app would go through all the necessary steps to construct the correct report and return it to the caller.
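The flow could be sketched roughly like this. The names `build_report` and `account_plugin`, the shape of the plugin's answer, and the in-memory `databases` dict are all assumptions made for illustration; the real reporting app and account plugin talk over HTTP and use different document formats:

```python
def build_report(org_id, account_plugin, databases):
    """Return the report for an organization, applying a correction
    from a side pipeline database if the account plugin asks for one."""
    main_doc = databases["main"][org_id]
    correction = account_plugin(org_id)
    if correction is None:
        return main_doc  # no correction needed, report as stored

    # Fetch the corrected document from the side pipeline's database.
    side_doc = databases[correction["database"]][org_id]
    corrected = {r["resource_id"]: r for r in side_doc["resources"]}
    resources = [corrected.get(r["resource_id"], r) for r in main_doc["resources"]]
    # Build a new report; the stored main document is left unchanged.
    return {**main_doc, "resources": resources}


# Hypothetical data: org-1 needs a correction, org-2 does not.
databases = {
    "main": {
        "org-1": {"organization_id": "org-1",
                  "resources": [{"resource_id": "object-storage", "charge": 12.0}]},
        "org-2": {"organization_id": "org-2",
                  "resources": [{"resource_id": "object-storage", "charge": 4.0}]},
    },
    "side-1": {
        "org-1": {"organization_id": "org-1",
                  "resources": [{"resource_id": "object-storage", "charge": 9.5}]},
    },
}


def account_plugin(org_id):
    # Stand-in for the account plugin's decision.
    return {"database": "side-1"} if org_id == "org-1" else None


report_1 = build_report("org-1", account_plugin, databases)
report_2 = build_report("org-2", account_plugin, databases)
```

Only the report returned to the caller differs. What the main pipeline stored is read but never written.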

The downsides of this approach are:

The upsides of this approach are:

Of course, we're still in the middle of figuring out what the best solution would be. We went through some ideas, but there are always challenges and complications due to the slack window and the different types of formulas.