amphi-ai / amphi-etl

Low-code ETL for structured and unstructured data. Generates Python code you can deploy anywhere.
https://docs.amphi.ai

Excel File Input: sheet and header arguments error #44

Open TriSSS91 opened 5 days ago

TriSSS91 commented 5 days ago

Some problems with reading Excel.

  1. Selecting the sheet option in Excel File Input raises an error:
Error
read_excel() got an unexpected keyword argument 'sheet'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[2], line 7
1 # Source code generated by Amphi
2 # Date: 2024-06-25 09:31:25
3 # Additional dependencies: openpyxl
4 import pandas as pd
----> 7 excelfileInput1 = pd.read_excel("dq_report.xlsx", sheet=' Sheet1', header='0').convert_dtypes()
9 excelfileInput1

TypeError: read_excel() got an unexpected keyword argument 'sheet'

pd.read_excel() expects a sheet_name argument, not sheet:

pd.read_excel("dq_report.xlsx", sheet_name='Sheet1', header=0).convert_dtypes()

  2. When adding and selecting a custom sheet, the library prepends a space to the sheet name:

image

  3. The sheet / header arguments raise errors because all values are converted to strings (instead of int, string, or list):

image

image
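The three problems above (wrong keyword name, leading space in the sheet name, every value passed as a string) could be handled by a small normalization step before the call is generated. The following is only a sketch under assumptions, not Amphi's actual implementation: `parse_header`, `normalize_excel_options`, and the shape of the input dict are hypothetical names chosen for illustration, assuming the form values arrive as raw strings.

```python
from typing import Any, Dict, List, Union


def parse_header(raw: str) -> Union[int, List[int], None]:
    """Turn a form string such as '0' or '0, 1' into an int or list of ints.

    Hypothetical helper: pandas rejects header='0' because it wants integers.
    """
    text = raw.strip()
    if text == "" or text.lower() == "none":
        return None
    rows = [int(part) for part in text.split(",") if part.strip()]
    # A single row index is returned as a plain int, several as a list.
    return rows[0] if len(rows) == 1 else rows


def normalize_excel_options(form: Dict[str, str]) -> Dict[str, Any]:
    """Map raw form strings to keyword arguments for pd.read_excel()."""
    kwargs: Dict[str, Any] = {}
    if "sheet" in form:
        # read_excel() takes sheet_name, not sheet; strip stray whitespace.
        kwargs["sheet_name"] = form["sheet"].strip()
    if "header" in form:
        kwargs["header"] = parse_header(form["header"])
    return kwargs


options = normalize_excel_options({"sheet": " Sheet1", "header": "0"})
print(options)  # {'sheet_name': 'Sheet1', 'header': 0}
```

The resulting dict could then be passed as `pd.read_excel(path, **options)`. Sheet names are kept as strings here on purpose, since a sheet may legitimately be named with digits.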

tgourdel commented 5 days ago

Deleted my previous comment. Thanks for reporting. My bad: this error was fixed for the output, not the input :/ (#32). Will fix ASAP.

tgourdel commented 3 days ago

Should be fixed in 0.4.8. To upgrade:

pip install --upgrade --force-reinstall amphi-etl jupyterlab-amphi

TriSSS91 commented 2 days ago

Checked different scenarios:

header:

ValueError: header must be integer or list of integers

sheet_name:

Error
'dict' object has no attribute 'convert_dtypes'

AttributeError Traceback (most recent call last)
Cell In[58], line 7
1 # Source code generated by Amphi
2 # Date: 2024-06-28 17:32:22
3 # Additional dependencies: openpyxl
4 import pandas as pd
----> 7 excelfileInput1 = pd.read_excel("dq_report.xlsx", engine='openpyxl', sheet_name=['Sheet1'], header=0).convert_dtypes()
9 excelfileInput1

AttributeError: 'dict' object has no attribute 'convert_dtypes'
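
For context on the AttributeError: when sheet_name is a list, pd.read_excel() returns a dict of DataFrames keyed by sheet name, so .convert_dtypes() cannot be chained directly. One possible way a code generator could account for this is sketched below. `build_read_excel_line` is a hypothetical helper, not part of Amphi, and the generated line simply mirrors the one in the traceback.

```python
from typing import List


def build_read_excel_line(var: str, path: str, sheets: List[str], header: int = 0) -> str:
    """Return one line of generated code that reads an Excel file.

    Hypothetical sketch: a single sheet is emitted as a scalar sheet_name,
    several sheets are emitted as a list and converted one by one.
    """
    if len(sheets) == 1:
        # A scalar sheet_name makes read_excel() return a single DataFrame.
        call = (
            f"pd.read_excel({path!r}, engine='openpyxl', "
            f"sheet_name={sheets[0]!r}, header={header!r})"
        )
        return f"{var} = {call}.convert_dtypes()"
    # A list makes read_excel() return a dict of DataFrames,
    # so convert_dtypes() is applied to each value instead.
    call = (
        f"pd.read_excel({path!r}, engine='openpyxl', "
        f"sheet_name={sheets!r}, header={header!r})"
    )
    return f"{var} = {{name: df.convert_dtypes() for name, df in {call}.items()}}"


print(build_read_excel_line("excelfileInput1", "dq_report.xlsx", ["Sheet1"]))
print(build_read_excel_line("excelfileInput1", "dq_report.xlsx", ["Sheet1", "Sheet2"]))
```

With a single selected sheet this yields a plain DataFrame, which matches what the downstream `.convert_dtypes()` call expects.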