naved001 closed this issue 4 days ago
@knikolla This is what I have in my head so far:
First I want to take care of the low-hanging fruit by moving some pieces of code into their own modules/classes.
```python
# prometheus_client.py
class PrometheusClient:
    def __init__(self, prometheus_url: str, token: str, step_min: int = 15):
        self.prometheus_url = prometheus_url
        self.token = token
        self.step_min = step_min

    def query_metric(self, metric, start_date, end_date):
        """Queries a metric from Prometheus/Thanos over the given date range."""
        ...
        return data
```
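For reference, Prometheus range queries go through the `/api/v1/query_range` endpoint, which takes `query`, `start`, `end`, and `step` parameters. A minimal sketch of how `query_metric` might assemble its request; the helper name and the whole-day RFC 3339 timestamps are my assumptions, and the actual HTTP call (e.g. `requests.get(url, params=params, headers={"Authorization": f"Bearer {token}"})`) is left out:

```python
def build_range_request(prometheus_url, metric, start_date, end_date, step_min=15):
    """Hypothetical helper: build the URL and parameters for a Prometheus
    /api/v1/query_range call covering the given dates end to end."""
    return (
        f"{prometheus_url}/api/v1/query_range",
        {
            "query": metric,
            "start": f"{start_date}T00:00:00Z",
            "end": f"{end_date}T23:59:59Z",
            "step": f"{step_min}m",
        },
    )
```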
```python
# s3_client.py
class S3Client:
    def __init__(self):
        ...  # set up self.s3 (a boto3 S3 client) and self.bucket_name

    def upload_file(self, file_path: str, s3_path: str) -> None:
        self.s3.upload_file(file_path, Bucket=self.bucket_name, Key=s3_path)

    def download_file(self, file_path: str, s3_path: str) -> None:
        # boto3's download_file takes (Bucket, Key, Filename), so the local
        # path goes in Filename rather than first positionally.
        self.s3.download_file(Bucket=self.bucket_name, Key=s3_path, Filename=file_path)
```
```python
class ColdFrontClient(object):
    def __init__(self, keycloak_url, keycloak_client_id, keycloak_client_secret):
        self.session = self.get_session(keycloak_url,
                                        keycloak_client_id,
                                        keycloak_client_secret)

    @staticmethod
    def get_session(keycloak_url, keycloak_client_id, keycloak_client_secret):
        """Authenticate as a client with Keycloak to receive an access token."""
        token_url = f"{keycloak_url}/auth/realms/mss/protocol/openid-connect/token"
        ...
        return session
```
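The elided body presumably does a standard OAuth 2.0 client-credentials exchange. A hedged sketch of the form payload such a request would post to `token_url`; the helper is hypothetical, and the actual call would be something like `requests.post(token_url, data=payload)` followed by putting the returned `access_token` on a `requests.Session`:

```python
def client_credentials_payload(client_id: str, client_secret: str) -> dict:
    # Standard OAuth 2.0 client-credentials grant. Keycloak's token endpoint
    # returns a JSON body containing "access_token", which then goes into an
    # "Authorization: Bearer <token>" header on the session.
    return {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }
```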
Then I want to have a class that's responsible for processing the metrics; this will need to be aware of the times we don't bill for because the duration a pod ran is calculated here.
```python
# metrics_processor.py
class MetricsProcessor(object):
    def __init__(self, merged_metrics: dict = {}, interval_minutes: int = 15):
        self.merged_metrics = merged_metrics
        self.interval_minutes = interval_minutes

    def merge_metrics(self, metric_name, metric_list):
        """
        Merge metrics by pod. Since pod names aren't guaranteed to be unique
        across namespaces, we organize by namespace:
        self.merged_metrics[namespace][pod]
        """
        return self.merged_metrics

    def condense_metrics(self, metrics_to_check):
        """
        Checks whether consecutive metric values are the same, and removes
        redundant metrics while updating the duration. If there's a gap in
        the reported metrics, that gap doesn't count toward the duration.

        This is where we will need to ignore certain hours we don't bill
        for; that mechanism doesn't exist as of now.
        """
        ...
        return condensed_dict
```
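A rough sketch of what `condense_metrics` could do for a single pod's samples; the function name and data shapes are my assumptions, and the key behavior is that a gap in the timestamps starts a new record, so unreported intervals never count toward duration:

```python
def condense(samples, interval_minutes=15):
    """Collapse consecutive equal-valued (timestamp, value) samples into
    records with an accumulated duration; a missing interval ends the
    current record instead of silently extending it across the gap."""
    step = interval_minutes * 60  # seconds between samples
    condensed = []
    for ts, value in sorted(samples):
        last = condensed[-1] if condensed else None
        if last and value == last["value"] and ts == last["end"] + step:
            # Same value, contiguous in time: extend the current record.
            last["end"] = ts
            last["duration"] += step
        else:
            # Value changed or there was a reporting gap: start a new record.
            condensed.append({"start": ts, "end": ts, "value": value, "duration": step})
    return condensed
```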
The data I get from MetricsProcessor is now organized, and each pod has the duration it ran for. However, I still need to calculate the SU type for these pods before an invoice can be generated; I am thinking of doing that in MetricsProcessor.
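For what it's worth, one shape the SU classification could take; the SU names, the GPU-first rule, and the 4 GiB-per-core ratio are all placeholders I made up for illustration, not the project's actual billing rules:

```python
import math


def su_type_and_count(cpu_cores, memory_gib, gpu_count, gib_per_core=4):
    # Placeholder rule: any GPU request bills as GPU SUs; otherwise the pod
    # bills as CPU SUs, taking whichever is larger of its CPU request or its
    # memory request divided by the assumed GiB-per-core ratio.
    if gpu_count:
        return "GPU", gpu_count
    return "CPU", max(cpu_cores, math.ceil(memory_gib / gib_per_core))
```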
After that I could reuse a good chunk of the code here: https://github.com/CCI-MOC/openstack-billing-from-db/blob/main/src/openstack_billing_db/billing.py
I would also have to add something similar to generate a pod report, and ultimately a main.py
that can run in two ways: 1. to collect metrics and 2. to generate reports.
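The two-mode main.py could be a thin argparse wrapper with subcommands; the subcommand and flag names below are illustrative, not decided:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # Mode 1: query Prometheus and store the raw metrics (e.g. in S3).
    collect = sub.add_parser("collect", help="collect metrics")
    collect.add_argument("--start-date", required=True)
    collect.add_argument("--end-date", required=True)

    # Mode 2: process stored metrics into invoices / pod reports.
    report = sub.add_parser("report", help="generate reports")
    report.add_argument("--invoice-month", required=True)
    return parser
```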
@naved001 That seems pretty good for a start!
One comment: Don't put mutable objects in default arguments :)
```python
>>> class MetricsProcessor(object):
...     def __init__(self, merged_metrics: dict = {}, interval_minutes: int = 15):
...         self.merged_metrics = merged_metrics
...         self.interval_minutes = interval_minutes
...
>>> a = MetricsProcessor()
>>> b = MetricsProcessor()
>>> a.merged_metrics is b.merged_metrics
True
>>> a.merged_metrics.update({1: 1})
>>> b.merged_metrics
{1: 1}
```
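The usual fix is a `None` sentinel, so each instance builds its own fresh dict:

```python
from typing import Optional


class MetricsProcessor:
    def __init__(self, merged_metrics: Optional[dict] = None, interval_minutes: int = 15):
        # Default argument objects are created once, at function definition
        # time, and shared by every call; None as the default avoids that.
        self.merged_metrics = merged_metrics if merged_metrics is not None else {}
        self.interval_minutes = interval_minutes
```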
> Don't put mutable objects in default arguments :)
Thanks for pointing it out, that would be terrible.
@naved001 this looks great, what are the next steps?
@joachimweyl I will start the refactoring incrementally with multiple small PRs.
This issue sounds like it only covers the creation of the blueprint. Shall we close this and open individual issues for the incremental steps?
@joachimweyl correct. I will do that.
This has grown organically as the requirements grew. The goal is to come up with a blueprint for how we want to organize and refactor this codebase.