ToucanToco / fastexcel

A Python wrapper around calamine
http://fastexcel.toucantoco.dev/
MIT License
82 stars 4 forks source link

Cannot handle special symbols #220

Closed durgeksh closed 3 months ago

durgeksh commented 3 months ago

Hi team, I am facing one issue on reading excel sheet through Polars. It says calamine cell error: #VALUE! The sheet in interest does not get read through standard api.

How to handle such special symbols through fastexcel?

Thank you.

I am using fastexcel==0.10.2.

lukapeschke commented 3 months ago

Hi @durgeksh could you please provide the entire stack trace ? and maybe a file allowing to reproduce the issue ? thanks!

durgeksh commented 3 months ago
Traceback (most recent call last):
  File "/Users/neo/Desktop/workspace/pocs/polarsdemo.py", line 84, in <module>
    df = pl.read_excel("sample_data.xlsx", engine='calamine')
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/neo/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/_utils/deprecation.py", line 134, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/neo/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/_utils/deprecation.py", line 134, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/neo/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 253, in read_excel
    return _read_spreadsheet(
           ^^^^^^^^^^^^^^^^^^
  File "/Users/neo/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 475, in _read_spreadsheet
    parsed_sheets = {
                    ^
  File "/Users/neo/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 476, in <dictcomp>
    name: reader_fn(
          ^^^^^^^^^^
  File "/Users/neo/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/polars/io/spreadsheet/functions.py", line 821, in _read_spreadsheet_calamine
    ws = parser.load_sheet_by_name(sheet_name, **read_options)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/neo/Desktop/workspace/pocs/.venv/lib/python3.11/site-packages/fastexcel/__init__.py", line 184, in load_sheet_by_name
    self._reader.load_sheet(
_fastexcel.CalamineCellError: calamine cell error: #VALUE!
Context:
    0: could not determine dtype for column Amount

Process finished with exit code 1

Sample file: sample_data.xlsx

durgeksh commented 3 months ago

Thank you @lukapeschke for fixing this so fast.

lukapeschke commented 3 months ago

@durgeksh you're welcome, thank you for the sample file!

durgeksh commented 3 months ago

@lukapeschke Can we provide an option to parse these special symbols as a string and retain in the sheet please? Now, it removes the symbol and puts null there.

Thank you.

lukapeschke commented 3 months ago

@durgeksh could you please create a separate issue for that ? I'll mark it as a feature request

durgeksh commented 3 months ago

Yes, sure. Thank you.

PrettyWood commented 3 months ago

@durgeksh It's not that simple because if it's considered as a string then it can mess up with the rest of the column that would have another type So it requires either everything to be casted as a string or some kind of union type on the column. Anyway it's definitely a feature request!

durgeksh commented 2 months ago

@PrettyWood Yes, in that case safe typecast would be string for the column with special symbols.