Closed wikiped closed 3 months ago
Hi @wikiped this is actually the expected behaviour: column_names
is used to specify the name of the columns in the resulting DataFrame. Since there are three columns and only two names were provided, the third name is generated based on the column index.: https://fastexcel.toucantoco.dev/fastexcel.html#ExcelReader.load_sheet
Achieving the result you want is possible with the use_columns
parameter:
import fastexcel as fe
file = 'repro_214.xlsx'
params = {'idx_or_name': 0, 'use_columns': ['B', 'C']}
print(fe.read_excel(file).load_sheet(**params).to_polars())
Will print
shape: (3, 2)
┌──────┬──────┐
│ B ┆ C │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞══════╪══════╡
│ 22.0 ┆ 23.0 │
│ 32.0 ┆ 33.0 │
│ 42.0 ┆ 43.0 │
└──────┴──────┘
However there is indeed a bug when mixing use_columns
and column_names
, so thanks for the report!
import fastexcel as fe
file = 'repro_214.xlsx'
params = {'idx_or_name': 0, 'header_row': None, 'skip_rows': 1, 'column_names': ['Col B', 'Col C'], 'use_columns': [1, 2]}
print(fe.read_excel(file).load_sheet(**params).to_polars())
Returns
shape: (3, 2)
┌───────┬──────────────┐
│ Col C ┆ __UNNAMED__2 │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═══════╪══════════════╡
│ 22.0 ┆ 23.0 │
│ 32.0 ┆ 33.0 │
│ 42.0 ┆ 43.0 │
└───────┴──────────────┘
And the expected result would be
shape: (3, 2)
┌───────┬───────┐
│ Col B ┆ Col C │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═══════╪═══════╡
│ 22.0 ┆ 23.0 │
│ 32.0 ┆ 33.0 │
│ 42.0 ┆ 43.0 │
└───────┴───────┘
Thank you for the feedback, I realized now that I forgot to paste the significant part of the parameters used: 'use_columns': [1, 2]
, which is, as you noted, part of the reason for the raising the issue.
Hi @wikiped v0.10.2
was just released and fixes this
Thank you for prompt fix!
Given this worksheet data (without any empty area):
The following code:
Outputs:
Expected output: