finopsfoundation / focus_converters

Parent repository to hold all common documentation and code samples for all FOCUS Converter projects
MIT License
83 stars 44 forks source link

Add support for AWS CUR 2.0 format #318

Open stoiev opened 9 months ago

stoiev commented 9 months ago

Is your feature request related to a problem? Please describe. No

Describe the solution you'd like Support for AWS CUR 2.0 format.

Additional context It's not clear if AWS will deprecate actual CUR format, but they are classifying it as "Legacy" in all documentation and web console.

For newer integrations, it's suitable to start with some format that does not have this adjective.

Some ideas of implementation already had been exposed here, but I think it's better to discuss this in a specific issue.

varunmittal91 commented 8 months ago

Hi @stoiev, do you have ideas around on what we can do this to achieve this?

oll-davidschneider commented 8 months ago

@stoiev in case it is helpful, I was able to create a CUR2.0 billing file using the new "Data Exports" functionality in AWS. This is exposed through the "bcm-data-exports" client in boto3. One thing I did find necessary was the tool expects a column "product_region" so I renamed "product_region_code" through a manual QueryStatement definition. I am not sure how the specification expects region data so hopefully that isn't an issue but it doesn't seem to be particular enough from what I see in the PDF.

So, this code block generates CUR2.0 billing files that I was able to pass to the tool without issue (apologies for any minor bugs caused by removing company-related things):

import boto3

account_id = 'XXXXXXXXXXXX'
# region must be us-east-1 for bcm-data-exports functionality
region_name = 'us-east-1'
# install CLI and use `aws configure`
profile_name = 'supply_your_own_creds'
session = boto3.session.Session(profile_name=profile_name, region_name=region_name)
# requires boto3 >= 1.29.7
client = session.client('bcm-data-exports')

sql_columns = ','.join[
    'bill_bill_type'
    ,'bill_billing_entity'
    ,'bill_billing_period_end_date'
    ,'bill_billing_period_start_date'
    ,'bill_invoice_id'
    ,'bill_invoicing_entity'
    ,'bill_payer_account_id'
    ,'bill_payer_account_name'
    ,'cost_category'
    ,'discount'
    ,'discount_bundled_discount'
    ,'discount_total_discount'
    ,'identity_line_item_id'
    ,'identity_time_interval'
    ,'line_item_availability_zone'
    ,'line_item_blended_cost'
    ,'line_item_blended_rate'
    ,'line_item_currency_code'
    ,'line_item_legal_entity'
    ,'line_item_line_item_description'
    ,'line_item_line_item_type'
    ,'line_item_net_unblended_cost'
    ,'line_item_net_unblended_rate'
    ,'line_item_normalization_factor'
    ,'line_item_normalized_usage_amount'
    ,'line_item_operation'
    ,'line_item_product_code'
    ,'line_item_resource_id'
    ,'line_item_tax_type'
    ,'line_item_unblended_cost'
    ,'line_item_unblended_rate'
    ,'line_item_usage_account_id'
    ,'line_item_usage_account_name'
    ,'line_item_usage_amount'
    ,'line_item_usage_end_date'
    ,'line_item_usage_start_date'
    ,'line_item_usage_type'
    ,'pricing_currency'
    ,'pricing_lease_contract_length'
    ,'pricing_offering_class'
    ,'pricing_public_on_demand_cost'
    ,'pricing_public_on_demand_rate'
    ,'pricing_purchase_option'
    ,'pricing_rate_code'
    ,'pricing_rate_id'
    ,'pricing_term'
    ,'pricing_unit'
    ,'product'
    ,'product_comment'
    ,'product_fee_code'
    ,'product_fee_description'
    ,'product_from_location'
    ,'product_from_location_type'
    ,'product_from_region_code'
    ,'product_instance_family'
    ,'product_instance_type'
    ,'product_instancesku'
    ,'product_location'
    ,'product_location_type'
    ,'product_operation'
    ,'product_pricing_unit'
    ,'product_product_family'
    ,'product_region_code AS product_region'
    ,'product_servicecode'
    ,'product_sku'
    ,'product_to_location'
    ,'product_to_location_type'
    ,'product_to_region_code'
    ,'product_usagetype'
    ,'reservation_amortized_upfront_cost_for_usage'
    ,'reservation_amortized_upfront_fee_for_billing_period'
    ,'reservation_availability_zone'
    ,'reservation_effective_cost'
    ,'reservation_end_time'
    ,'reservation_modification_status'
    ,'reservation_net_amortized_upfront_cost_for_usage'
    ,'reservation_net_amortized_upfront_fee_for_billing_period'
    ,'reservation_net_effective_cost'
    ,'reservation_net_recurring_fee_for_usage'
    ,'reservation_net_unused_amortized_upfront_fee_for_billing_period'
    ,'reservation_net_unused_recurring_fee'
    ,'reservation_net_upfront_value'
    ,'reservation_normalized_units_per_reservation'
    ,'reservation_number_of_reservations'
    ,'reservation_recurring_fee_for_usage'
    ,'reservation_reservation_a_r_n'
    ,'reservation_start_time'
    ,'reservation_subscription_id'
    ,'reservation_total_reserved_normalized_units'
    ,'reservation_total_reserved_units'
    ,'reservation_units_per_reservation'
    ,'reservation_unused_amortized_upfront_fee_for_billing_period'
    ,'reservation_unused_normalized_unit_quantity'
    ,'reservation_unused_quantity'
    ,'reservation_unused_recurring_fee'
    ,'reservation_upfront_value'
    ,'resource_tags'
    ,'savings_plan_amortized_upfront_commitment_for_billing_period'
    ,'savings_plan_end_time'
    ,'savings_plan_instance_type_family'
    ,'savings_plan_net_amortized_upfront_commitment_for_billing_period'
    ,'savings_plan_net_recurring_commitment_for_billing_period'
    ,'savings_plan_net_savings_plan_effective_cost'
    ,'savings_plan_offering_type'
    ,'savings_plan_payment_option'
    ,'savings_plan_purchase_term'
    ,'savings_plan_recurring_commitment_for_billing_period'
    ,'savings_plan_region'
    ,'savings_plan_savings_plan_a_r_n'
    ,'savings_plan_savings_plan_effective_cost'
    ,'savings_plan_savings_plan_rate'
    ,'savings_plan_start_time'
    ,'savings_plan_total_commitment_to_date'
    ,'savings_plan_used_commitment'
]

data_export = {
        'DataQuery': {
            'QueryStatement': f"SELECT {sql_columns} FROM COST_AND_USAGE_REPORT",
            'TableConfigurations': {
                'COST_AND_USAGE_REPORT': {
                    'INCLUDE_RESOURCES': 'TRUE',
                    'TIME_GRANULARITY': 'HOURLY',
                    'INCLUDE_SPLIT_COST_ALLOCATION_DATA': 'FALSE',
                    'INCLUDE_MANUAL_DISCOUNT_COMPATIBILITY': 'FALSE'
                }
            }
        },
        'DestinationConfigurations': {
            'S3Destination': {
                'S3Bucket': f'awsbilling-{account_id}',
                'S3OutputConfigurations': {
                    'Compression': 'PARQUET',
                    'Format': 'PARQUET',
                    'OutputType': 'CUSTOM',
                    'Overwrite': 'CREATE_NEW_REPORT'
                },
                'S3Prefix': 'ResourceHourlyDetail',
                'S3Region': region_name
            }
        },
        'Name': 'CostUsageReport',
        'RefreshCadence': {
            'Frequency': 'SYNCHRONOUS'
        }
    }

client.create_export(Export=export_def)

I pass the parquet files created by the Data Export to the tool as follows: focus-converter convert --provider aws --data-path CostUsageReport-00001.snappy.parquet --data-format parquet --parquet-data-format file --export-path ./

stoiev commented 8 months ago

Thanks @davidschneider2W ! I've also noticed that product_region is the only collumn that differs between CUR1 and CUR2 that breaks current implementation of focus_converter.

Instead of changing source, I've changed focus_converter related configuration file and got same results (just for concept proofing). But thinking about a workaround, your approach is much better.

The migration steps recommended by AWS is here. SQL's approuch is the first one, and maybe it could be the recommended solution for now in focus_converter (by documenting it). In AWS doc, product_region renaming is not mentioned, and I submitted a feedback on this documentation to get more accurate info.

However, we will face issues regarding data that are represented as scalars (one column per value) in CUR1 and became maps (one column with all values) in CUR2, as metioned in AWS docs:

You also need to update your data ingestion pipeline and your business intelligence tools to process the following new columns with nested key-values: product, resource_tags, cost_category, and discounts.

At least resource_tags and cost_category - two ways of tagging resources in AWS - will be a need for focus_converter users. I will try to map all the differences between those two formats and feed this issue with more concrete results.

But as far as I know, differences between CUR 1 and 2 seems to be minimal. Maybe it worth to invest in a generic conversion configuration that works for both formats.

PS: I'm saying that withouth know how extensible and generic conversion configurations could be.

varunmittal91 commented 8 months ago

Hi @stoiev and @davidschneider2W this is awesome. Could anyone of you please create a pull request with the proposed steps to convert data.

stoiev commented 7 months ago

I've advanced in the idea of conversion config's compatibility for CUR 1 and 2 versions.

https://github.com/finopsfoundation/focus_converters/pull/333