Open stoiev opened 9 months ago
Hi @stoiev, do you have ideas on what we can do to achieve this?
@stoiev in case it is helpful, I was able to create a CUR 2.0 billing file using the new "Data Exports" functionality in AWS, which is exposed through the "bcm-data-exports" client in boto3. One thing I did find necessary: the tool expects a column "product_region", so I renamed "product_region_code" through a manual QueryStatement definition. I am not sure how the specification expects region data, so hopefully that isn't an issue; the spec doesn't seem to be particular about it from what I see in the PDF.
So, this code block generates CUR2.0 billing files that I was able to pass to the tool without issue (apologies for any minor bugs caused by removing company-related things):
import boto3
account_id = 'XXXXXXXXXXXX'
# region must be us-east-1 for bcm-data-exports functionality
region_name = 'us-east-1'
# install CLI and use `aws configure`
profile_name = 'supply_your_own_creds'
session = boto3.session.Session(profile_name=profile_name, region_name=region_name)
# requires boto3 >= 1.29.7
client = session.client('bcm-data-exports')
sql_columns = ','.join([
'bill_bill_type'
,'bill_billing_entity'
,'bill_billing_period_end_date'
,'bill_billing_period_start_date'
,'bill_invoice_id'
,'bill_invoicing_entity'
,'bill_payer_account_id'
,'bill_payer_account_name'
,'cost_category'
,'discount'
,'discount_bundled_discount'
,'discount_total_discount'
,'identity_line_item_id'
,'identity_time_interval'
,'line_item_availability_zone'
,'line_item_blended_cost'
,'line_item_blended_rate'
,'line_item_currency_code'
,'line_item_legal_entity'
,'line_item_line_item_description'
,'line_item_line_item_type'
,'line_item_net_unblended_cost'
,'line_item_net_unblended_rate'
,'line_item_normalization_factor'
,'line_item_normalized_usage_amount'
,'line_item_operation'
,'line_item_product_code'
,'line_item_resource_id'
,'line_item_tax_type'
,'line_item_unblended_cost'
,'line_item_unblended_rate'
,'line_item_usage_account_id'
,'line_item_usage_account_name'
,'line_item_usage_amount'
,'line_item_usage_end_date'
,'line_item_usage_start_date'
,'line_item_usage_type'
,'pricing_currency'
,'pricing_lease_contract_length'
,'pricing_offering_class'
,'pricing_public_on_demand_cost'
,'pricing_public_on_demand_rate'
,'pricing_purchase_option'
,'pricing_rate_code'
,'pricing_rate_id'
,'pricing_term'
,'pricing_unit'
,'product'
,'product_comment'
,'product_fee_code'
,'product_fee_description'
,'product_from_location'
,'product_from_location_type'
,'product_from_region_code'
,'product_instance_family'
,'product_instance_type'
,'product_instancesku'
,'product_location'
,'product_location_type'
,'product_operation'
,'product_pricing_unit'
,'product_product_family'
# rename: the tool expects "product_region" rather than CUR 2.0's "product_region_code"
,'product_region_code AS product_region'
,'product_servicecode'
,'product_sku'
,'product_to_location'
,'product_to_location_type'
,'product_to_region_code'
,'product_usagetype'
,'reservation_amortized_upfront_cost_for_usage'
,'reservation_amortized_upfront_fee_for_billing_period'
,'reservation_availability_zone'
,'reservation_effective_cost'
,'reservation_end_time'
,'reservation_modification_status'
,'reservation_net_amortized_upfront_cost_for_usage'
,'reservation_net_amortized_upfront_fee_for_billing_period'
,'reservation_net_effective_cost'
,'reservation_net_recurring_fee_for_usage'
,'reservation_net_unused_amortized_upfront_fee_for_billing_period'
,'reservation_net_unused_recurring_fee'
,'reservation_net_upfront_value'
,'reservation_normalized_units_per_reservation'
,'reservation_number_of_reservations'
,'reservation_recurring_fee_for_usage'
,'reservation_reservation_a_r_n'
,'reservation_start_time'
,'reservation_subscription_id'
,'reservation_total_reserved_normalized_units'
,'reservation_total_reserved_units'
,'reservation_units_per_reservation'
,'reservation_unused_amortized_upfront_fee_for_billing_period'
,'reservation_unused_normalized_unit_quantity'
,'reservation_unused_quantity'
,'reservation_unused_recurring_fee'
,'reservation_upfront_value'
,'resource_tags'
,'savings_plan_amortized_upfront_commitment_for_billing_period'
,'savings_plan_end_time'
,'savings_plan_instance_type_family'
,'savings_plan_net_amortized_upfront_commitment_for_billing_period'
,'savings_plan_net_recurring_commitment_for_billing_period'
,'savings_plan_net_savings_plan_effective_cost'
,'savings_plan_offering_type'
,'savings_plan_payment_option'
,'savings_plan_purchase_term'
,'savings_plan_recurring_commitment_for_billing_period'
,'savings_plan_region'
,'savings_plan_savings_plan_a_r_n'
,'savings_plan_savings_plan_effective_cost'
,'savings_plan_savings_plan_rate'
,'savings_plan_start_time'
,'savings_plan_total_commitment_to_date'
,'savings_plan_used_commitment'
])
data_export = {
'DataQuery': {
'QueryStatement': f"SELECT {sql_columns} FROM COST_AND_USAGE_REPORT",
'TableConfigurations': {
'COST_AND_USAGE_REPORT': {
'INCLUDE_RESOURCES': 'TRUE',
'TIME_GRANULARITY': 'HOURLY',
'INCLUDE_SPLIT_COST_ALLOCATION_DATA': 'FALSE',
'INCLUDE_MANUAL_DISCOUNT_COMPATIBILITY': 'FALSE'
}
}
},
'DestinationConfigurations': {
'S3Destination': {
'S3Bucket': f'awsbilling-{account_id}',
'S3OutputConfigurations': {
'Compression': 'PARQUET',
'Format': 'PARQUET',
'OutputType': 'CUSTOM',
'Overwrite': 'CREATE_NEW_REPORT'
},
'S3Prefix': 'ResourceHourlyDetail',
'S3Region': region_name
}
},
'Name': 'CostUsageReport',
'RefreshCadence': {
'Frequency': 'SYNCHRONOUS'
}
}
client.create_export(Export=data_export)
I pass the parquet files created by the Data Export to the tool as follows:
focus-converter convert --provider aws --data-path CostUsageReport-00001.snappy.parquet --data-format parquet --parquet-data-format file --export-path ./
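For what it's worth, the `product_region_code AS product_region` alias in the QueryStatement above can be generalized with a small rename map in case other column-name mismatches turn up. A minimal sketch; `build_select_list` and `RENAMES` are hypothetical helper names, not part of any AWS or focus_converter API:

```python
# Hypothetical helper: build the SELECT column list for the Data Export
# QueryStatement, applying CUR 2.0 -> tool-expected renames as SQL aliases.

RENAMES = {
    # CUR 2.0 column name -> column name the converter expects
    'product_region_code': 'product_region',
}

def build_select_list(columns, renames=RENAMES):
    parts = []
    for col in columns:
        if col in renames:
            # emit an alias so the export carries the expected name
            parts.append(f"{col} AS {renames[col]}")
        else:
            parts.append(col)
    return ','.join(parts)

cols = ['bill_bill_type', 'product_region_code', 'line_item_usage_amount']
select_list = build_select_list(cols)
# -> "bill_bill_type,product_region_code AS product_region,line_item_usage_amount"
```

The resulting string drops straight into the f-string used for `QueryStatement` above.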
Thanks @davidschneider2W! I've also noticed that product_region is the only column that differs between CUR1 and CUR2 and breaks the current implementation of focus_converter.
Instead of changing the source, I changed the related focus_converter configuration file and got the same results (just as a proof of concept). But as workarounds go, your approach is much better.
The migration steps recommended by AWS are here. The SQL approach is the first one, and maybe it could be the recommended solution for now in focus_converter (by documenting it). The AWS doc does not mention the product_region renaming, and I submitted feedback on that documentation to get more accurate info.
However, we will face issues with data that is represented as scalars (one column per value) in CUR1 but became maps (one column with all values) in CUR2, as mentioned in the AWS docs:
You also need to update your data ingestion pipeline and your business intelligence tools to process the following new columns with nested key-values: product, resource_tags, cost_category, and discounts.
At least resource_tags and cost_category - two ways of tagging resources in AWS - will be needed by focus_converter users. I will try to map all the differences between those two formats and feed this issue with more concrete results.
But as far as I know, the differences between CUR 1 and 2 seem to be minimal. Maybe it's worth investing in a generic conversion configuration that works for both formats.
PS: I'm saying that without knowing how extensible and generic conversion configurations can be.
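The scalar-to-map change mentioned above (resource_tags, cost_category, product, and discounts becoming single map columns in CUR 2.0) could be handled by flattening each map back into per-key scalar columns, the way CUR 1.0 exposed them. A minimal sketch over plain Python rows; `flatten_map_column` is a hypothetical helper, not part of focus_converter:

```python
# Hypothetical helper: expand a CUR 2.0 map column (e.g. resource_tags)
# into CUR 1.0-style scalar columns, one per key.

def flatten_map_column(rows, map_column, prefix):
    """Expand a dict-valued column into one scalar column per key.

    Collects every key seen across all rows so each output row has the
    same set of columns; keys missing on a row become None.
    """
    keys = sorted({k for row in rows for k in (row.get(map_column) or {})})
    flattened = []
    for row in rows:
        out = {k: v for k, v in row.items() if k != map_column}
        tags = row.get(map_column) or {}
        for key in keys:
            out[f"{prefix}{key}"] = tags.get(key)
        flattened.append(out)
    return flattened

# CUR 2.0-style rows: resource_tags is a single map column.
rows = [
    {"line_item_resource_id": "i-0abc",
     "resource_tags": {"user_team": "data", "user_env": "prod"}},
    {"line_item_resource_id": "i-0def",
     "resource_tags": {"user_env": "dev"}},
]

flat = flatten_map_column(rows, "resource_tags", "resource_tags_")
# each row now carries resource_tags_user_env / resource_tags_user_team scalars
```

The same pre-processing step would apply to cost_category, product, and discounts before feeding the data to a CUR1-shaped conversion config.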
Hi @stoiev and @davidschneider2W, this is awesome. Could one of you please create a pull request with the proposed steps to convert the data?
I've made progress on the idea of a conversion config compatible with both CUR 1 and 2:
https://github.com/finopsfoundation/focus_converters/pull/333
Is your feature request related to a problem? Please describe. No
Describe the solution you'd like Support for AWS CUR 2.0 format.
Additional context It's not clear if AWS will deprecate the current CUR format, but they are classifying it as "Legacy" throughout the documentation and web console.
For newer integrations, it's preferable to start with a format that does not carry this adjective.
Some implementation ideas have already been shared here, but I think it's better to discuss this in a specific issue.