finopsfoundation / focus_converters

Parent repository to hold all common documentation and code samples for all FOCUS Converter projects
MIT License

AWS CUR Parquet conversion fails: 'bill_invoicing_entity' not found in data #358

Open saurav955 opened 2 months ago

saurav955 commented 2 months ago

Describe the bug

After downloading an AWS CUR Parquet file and running the converter on it, the conversion fails with an error saying that 'bill_invoicing_entity' was not found in the data.

To Reproduce

Steps to reproduce the behavior:

  1. Download the CUR Parquet file from AWS.
  2. Run: focus-converter convert --provider aws --data-path /opt/aws-plaform-00005.snappy.parquet --data-format parquet --parquet-data-format dataset --export-path /tmp/test/

Error: ValueError: Column(s) 'bill_invoicing_entity' not found in data

Expected behavior

The conversion completes without errors.

Additional context

OS: Debian 12
focus-converter version: 1.0

The error seems self-explanatory, but we are not sure:

  1. How can we add this field to the Cost and Usage Report?
  2. Can we convert the file even without this field in the report?
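Before converting, it can help to list exactly which expected columns are missing from the export. The sketch below is a minimal, hypothetical helper (`missing_columns` is not part of focus-converter). The `REQUIRED` list is an assumption: it contains `bill_invoicing_entity` from the error message plus two columns that appear in the converter's plan later in this thread, not the converter's real list of expected columns. For a Parquet file, the available names could be read with `pyarrow.parquet.read_schema(path).names`.

```python
from typing import Iterable, List


def missing_columns(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names absent from `available`, in `required` order."""
    present = set(available)
    return [name for name in required if name not in present]


# Hypothetical list: only bill_invoicing_entity comes from the error message;
# the converter's full set of expected columns is not shown in this issue.
REQUIRED = ["bill_invoicing_entity", "bill_payer_account_id", "line_item_currency_code"]

if __name__ == "__main__":
    # For a Parquet file the names could come from, for example:
    #   import pyarrow.parquet as pq
    #   available = pq.read_schema("/opt/aws-plaform-00005.snappy.parquet").names
    available = ["bill_billing_entity", "bill_payer_account_id", "line_item_currency_code"]
    print(missing_columns(available, REQUIRED))  # ['bill_invoicing_entity']
```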
saurav955 commented 2 months ago

I converted the Parquet file to CSV using pandas and tried adding a bill_invoicing_entity column.
There was also a similar field named bill_billing_entity, so I also tried renaming it to bill_invoicing_entity (assuming bill_invoicing_entity had been renamed to bill_billing_entity).
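The workaround described above (adding the missing column, optionally filled from a similar one) can be sketched with only the standard library. `add_column_to_csv` is a hypothetical helper, and copying `bill_billing_entity` into `bill_invoicing_entity` only mirrors the guess made here; whether the two AWS columns hold the same values is not confirmed. One detail worth checking when pandas is used instead: the `DF` line in the plan below starts with an unnamed column `""`, which matches what `DataFrame.to_csv` writes for the index by default; passing `index=False` leaves it out.

```python
import csv
import io
from typing import Optional, TextIO


def add_column_to_csv(src: TextIO, dst: TextIO, name: str,
                      copy_from: Optional[str] = None, default: str = "") -> None:
    """Copy CSV rows from src to dst, appending column `name` if it is missing.

    If `copy_from` names an existing column, its values are copied into the new
    column; otherwise every row gets `default`. Only the header and the data
    columns are written, so no extra index column appears in the output.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    needs_column = name not in fieldnames
    if needs_column:
        fieldnames.append(name)
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        if needs_column:
            # Hypothetical fallback: reuse a similar column if one is given and present.
            row[name] = row.get(copy_from, default) if copy_from else default
        writer.writerow(row)


if __name__ == "__main__":
    # With real files, open both with newline="" as the csv module recommends.
    src = io.StringIO("bill_billing_entity,line_item_unblended_cost\nAWS,1.5\n")
    dst = io.StringIO()
    add_column_to_csv(src, dst, "bill_invoicing_entity", copy_from="bill_billing_entity")
    print(dst.getvalue())
```

Note that this only makes the column exist; as the rest of this thread shows, a CSV round-trip can introduce other problems.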

I ran focus-converter on both CSV files using the following command:

Command: focus-converter convert --provider aws --data-path sample.csv --data-format csv --export-path /tmp/test/

With this I was able to get past the column-not-found error, but now I am getting a conversion error:

ComputeError: expected duration or datetime, got str

Error originated just after this operation:
 WITH_COLUMNS:
 [col("line_item_currency_code").alias("BillingCurrency")]
   WITH_COLUMNS:
   [col("bill_payer_account_id").alias("BillingAccountId")]
     WITH_COLUMNS:
     [col("line_item_availability_zone").alias("AvailabilityZone")]
       WITH_COLUMNS:
       [col("line_item_resource_id").str.split([String(:)]).list.get([6]).str.titlecase().alias("ResourceName")]
         WITH_COLUMNS:
         [col("line_item_resource_id").str.split([String(:)]).list.get([6]).alias("tmp_resource_id_ResourceType")]
           WITH_COLUMNS:
           [col("line_item_resource_id").str.split([String(:)]).list.get([5]).alias("tmp_resource_type_ResourceType")]
             WITH_COLUMNS:
             [null.alias("CommitmentDiscountName").strict_cast(String)]
               WITH_COLUMNS:
               [null.alias("SubAccountName").strict_cast(String)]
                 WITH_COLUMNS:
                 [null.alias("BillingAccountName").strict_cast(String)]
                   WITH_COLUMNS:
                   [col("product_purchase_option").cast(String)]
                     WITH_COLUMNS:
                     [col("line_item_resource_id").cast(String)]
                       WITH_COLUMNS:
                       [col("pricing_public_on_demand_rate").cast(Float64)]
                         WITH_COLUMNS:
                         [col("reservation_unused_recurring_fee").cast(Float64)]
                           WITH_COLUMNS:
                           [col("reservation_unused_amortized_upfront_fee_for_billing_period").cast(Float64)]
                             WITH_COLUMNS:
                             [col("savings_plan_total_commitment_to_date").cast(Float64)]
                               WITH_COLUMNS:
                               [col("savings_plan_used_commitment").cast(Float64)]
                                 WITH_COLUMNS:
                                 [col("savings_plan_used_commitment").cast(Float64)]
                                   WITH_COLUMNS:
                                   [col("line_item_unblended_cost").cast(Float64)]
                                     WITH_COLUMNS:
                                     [col("product_region").alias("product_region")]
                                       WITH_COLUMNS:
                                       [null.strict_cast(String).alias("savings_plan_savings_plan_arn")]
                                         WITH_COLUMNS:
                                         [null.strict_cast(String).alias("reservation_reservation_arn")]
                                           WITH_COLUMNS:
                                           [col("reservation_reservation_a_r_n").alias("reservation_reservation_a_r_n")]
                                             WITH_COLUMNS:
                                             [null.strict_cast(Float64).alias("line_item_net_unblended_cost")]
                                              DF ["", "identity_line_item_id", "identity_time_interval", "bill_invoice_id"]; PROJECT */273 COLUMNS; SELECTION: "None"

LogicalPlan had already failed with the above error; after failure, 30 additional operations were attempted on the LazyFrame
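The message `expected duration or datetime, got str` suggests that a column the converter treats as a date or time arrived as plain text. That would be consistent with the CSV round-trip: Parquet stores column types, while CSV stores every value as text, so timestamp columns such as `line_item_usage_start_date` (an assumed CUR column name) may be read back as strings. The hypothetical helper below flags columns whose non-empty values all look like ISO-8601 timestamps, which narrows down which columns lost their type. Adding the missing column to the Parquet file directly, instead of going through CSV, would likely avoid this, but that has not been verified against the converter.

```python
import re
from typing import Dict, Iterable, List, Optional

# "YYYY-MM-DD HH:MM:SS" or "YYYY-MM-DDTHH:MM:SS", with optional fractional
# seconds and an optional "Z" or "+HH:MM"/"-HH:MM" offset.
_TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"
)


def timestamp_like_columns(rows: Iterable[Dict[str, Optional[str]]]) -> List[str]:
    """Return, sorted, the columns whose non-empty values all look like timestamps."""
    candidates: Dict[str, bool] = {}
    for row in rows:
        for column, value in row.items():
            if not value:
                continue  # empty cells say nothing about the column's type
            matches = _TIMESTAMP.fullmatch(value) is not None
            candidates[column] = candidates.get(column, True) and matches
    return sorted(column for column, ok in candidates.items() if ok)


if __name__ == "__main__":
    # Rows as csv.DictReader would yield them; the values here are illustrative.
    rows = [
        {"line_item_usage_start_date": "2024-01-01 00:00:00", "line_item_unblended_cost": "0.5"},
        {"line_item_usage_start_date": "2024-01-01T01:00:00Z", "line_item_unblended_cost": ""},
    ]
    print(timestamp_like_columns(rows))  # ['line_item_usage_start_date']
```

The regular expression is deliberately strict (`fullmatch`), so interval strings like the one in `identity_time_interval` are not counted as single timestamps.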