Status: Open. saurav955 opened this issue 2 months ago.
I converted the Parquet file to CSV using pandas and tried adding a bill_invoicing_entity column.
There was also a similar field named bill_billing_entity, so I also tried renaming it to bill_invoicing_entity (thinking bill_invoicing_entity might have been renamed to bill_billing_entity).
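A minimal sketch of that CSV workaround, using only the standard library csv module instead of pandas (the Parquet-to-CSV step is not shown). It assumes that copying bill_billing_entity into a new bill_invoicing_entity column is acceptable; whether the two fields really carry the same data is not confirmed in this thread. The function name add_invoicing_entity is hypothetical.

```python
import csv
import io


def add_invoicing_entity(src, dst,
                         source_col="bill_billing_entity",
                         target_col="bill_invoicing_entity"):
    """Copy source_col into target_col for every CSV row read from src and write to dst.

    src is any iterable of CSV lines (e.g. an open file); dst is a writable text file.
    Rows that already have a non-empty target_col are left unchanged.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    if target_col not in fieldnames:
        fieldnames.append(target_col)
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        if not row.get(target_col):
            # Missing source values become empty strings rather than None.
            row[target_col] = row.get(source_col) or ""
        writer.writerow(row)


if __name__ == "__main__":
    # In-memory example; with real files, open both with newline="".
    src = io.StringIO("identity_line_item_id,bill_billing_entity\n1,AWS\n")
    dst = io.StringIO()
    add_invoicing_entity(src, dst)
    print(dst.getvalue())
```

With real files this would be called as add_invoicing_entity(open("sample.csv", newline=""), open("sample_fixed.csv", "w", newline="")), ideally inside a with block so both files are closed.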
I ran focus-converter on both CSV files with the following command:
focus-converter convert --provider aws --data-path sample.csv --data-format csv --export-path /tmp/test/
This got me past the "column not found" error, but now I am getting a conversion error:
ComputeError: expected duration or datetime, got str
Error originated just after this operation:
WITH_COLUMNS:
[col("line_item_currency_code").alias("BillingCurrency")]
WITH_COLUMNS:
[col("bill_payer_account_id").alias("BillingAccountId")]
WITH_COLUMNS:
[col("line_item_availability_zone").alias("AvailabilityZone")]
WITH_COLUMNS:
[col("line_item_resource_id").str.split([String(:)]).list.get([6]).str.titlecase().alias("ResourceName")]
WITH_COLUMNS:
[col("line_item_resource_id").str.split([String(:)]).list.get([6]).alias("tmp_resource_id_ResourceType")]
WITH_COLUMNS:
[col("line_item_resource_id").str.split([String(:)]).list.get([5]).alias("tmp_resource_type_ResourceType")]
WITH_COLUMNS:
[null.alias("CommitmentDiscountName").strict_cast(String)]
WITH_COLUMNS:
[null.alias("SubAccountName").strict_cast(String)]
WITH_COLUMNS:
[null.alias("BillingAccountName").strict_cast(String)]
WITH_COLUMNS:
[col("product_purchase_option").cast(String)]
WITH_COLUMNS:
[col("line_item_resource_id").cast(String)]
WITH_COLUMNS:
[col("pricing_public_on_demand_rate").cast(Float64)]
WITH_COLUMNS:
[col("reservation_unused_recurring_fee").cast(Float64)]
WITH_COLUMNS:
[col("reservation_unused_amortized_upfront_fee_for_billing_period").cast(Float64)]
WITH_COLUMNS:
[col("savings_plan_total_commitment_to_date").cast(Float64)]
WITH_COLUMNS:
[col("savings_plan_used_commitment").cast(Float64)]
WITH_COLUMNS:
[col("savings_plan_used_commitment").cast(Float64)]
WITH_COLUMNS:
[col("line_item_unblended_cost").cast(Float64)]
WITH_COLUMNS:
[col("product_region").alias("product_region")]
WITH_COLUMNS:
[null.strict_cast(String).alias("savings_plan_savings_plan_arn")]
WITH_COLUMNS:
[null.strict_cast(String).alias("reservation_reservation_arn")]
WITH_COLUMNS:
[col("reservation_reservation_a_r_n").alias("reservation_reservation_a_r_n")]
WITH_COLUMNS:
[null.strict_cast(Float64).alias("line_item_net_unblended_cost")]
DF ["", "identity_line_item_id", "identity_time_interval", "bill_invoice_id"]; PROJECT */273 COLUMNS; SELECTION: "None"
LogicalPlan had already failed with the above error; after failure, 30 additional operations were attempted on the LazyFrame
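One possible cause (an assumption, not confirmed here): CSV stores no column types, so columns that were timestamps in the Parquet file come back as plain strings, and a datetime operation later in the conversion plan then fails with "expected duration or datetime, got str". The sketch below lists CSV columns whose values all look like ISO-8601 timestamps, to narrow down which columns may need to be parsed back into datetimes. The helper names are hypothetical, and the column names in any example are illustrative only.

```python
import csv
from datetime import datetime


def looks_like_iso_datetime(value):
    """Return True if value parses as an extended-format ISO-8601 date or timestamp."""
    # Require a '-' so plain digit strings such as account IDs are never treated as dates.
    if "-" not in value:
        return False
    # datetime.fromisoformat() before Python 3.11 rejects a trailing 'Z', so spell out UTC.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def datetime_like_columns(src):
    """Return CSV columns whose non-empty values all look like ISO-8601 timestamps."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    candidates = set(fieldnames)
    non_empty = set()
    for row in reader:
        for col in list(candidates):
            value = row.get(col) or ""
            if not value:
                continue
            non_empty.add(col)
            if not looks_like_iso_datetime(value):
                candidates.discard(col)
    # Keep the original column order and skip columns that were empty throughout.
    return [col for col in fieldnames if col in candidates and col in non_empty]
```

If this is the cause, keeping the data in Parquet (and adding the missing column there) would sidestep the problem, since Parquet stores each column's type alongside the data.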
Describe the bug
After downloading an AWS CUR Parquet file and running the converter, it fails, reporting that 'bill_invoicing_entity' was not found in the data.
To Reproduce
Steps to reproduce the behavior:
Download the Parquet file from AWS.
Run:
focus-converter convert --provider aws --data-path /opt/aws-plaform-00005.snappy.parquet --data-format parquet --parquet-data-format dataset --export-path /tmp/test/
Error:
ValueError: Column(s) 'bill_invoicing_entity' not found in data
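To check which required columns are missing and which existing names come closest (for example bill_invoicing_entity versus bill_billing_entity), a small standard-library check with difflib can help. This is a sketch under assumptions: the list of required columns would have to come from the converter's AWS mapping, which is not shown here, and the function name report_missing_columns is hypothetical.

```python
import difflib


def report_missing_columns(required, available, cutoff=0.6):
    """Map each required column absent from `available` to its closest existing names.

    Suggestions come from difflib.get_close_matches (up to 3, best match first);
    an empty list means nothing scored at or above `cutoff`.
    """
    available = list(available)
    present = set(available)
    return {
        col: difflib.get_close_matches(col, available, n=3, cutoff=cutoff)
        for col in required
        if col not in present
    }
```

For a single Parquet file, the available column names can be read with pyarrow.parquet.read_schema(path).names without loading the data.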
Expected behavior
The converter completes the conversion without errors.
Additional context
OS: Debian 12
focus-converter version: 1.0
Well, the error seems self-explanatory, but we are not sure.