amphi-ai / amphi-etl

Low-code ETL for structured and unstructured data. Generates Python code you can deploy anywhere.
https://docs.amphi.ai

Excel File Input: more options (select engine incl. calamine, usecols, skiprows, decimal, etc.) #37

Open: TriSSS91 opened this issue 1 week ago

TriSSS91 commented 1 week ago

Hello! Thanks for the nice ETL tool.

It would be great to add more options (matching the pandas read_excel parameters) to the Excel File Input:

  1. engine - option to select a specific engine for pandas to read the Excel file ('pyxlsb', 'openpyxl', 'calamine'), with 'calamine' as the default because of its amazing speed (please see python-calamine)
  2. usecols - for selecting which columns to read from the Excel file
  3. skiprows - for skipping rows at the top of a sheet
  4. decimal - for correctly parsing European-style text (comma as decimal separator) into numbers
  5. dtype_backend - for selecting the 'numpy_nullable' or 'pyarrow' backend
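
All five options correspond directly to keyword arguments of pandas.read_excel. The sketch below is a hypothetical helper (build_read_excel_kwargs is not part of Amphi or pandas) that validates the requested values and keeps only the non-default ones, so a generated call stays short. It assumes engine='calamine' is available, which requires pandas 2.2 or later plus the python-calamine package; note also that pandas spells the NumPy-based backend 'numpy_nullable'.

```python
# Hypothetical helper: maps the options requested in this issue onto
# keyword arguments for pandas.read_excel, keeping only non-default values.

# Only the engines named in this issue; pandas also accepts "xlrd" and "odf".
ALLOWED_ENGINES = {"openpyxl", "pyxlsb", "calamine"}
# Values pandas accepts for dtype_backend.
ALLOWED_BACKENDS = {"numpy_nullable", "pyarrow"}


def build_read_excel_kwargs(engine="calamine", usecols=None, skiprows=None,
                            decimal=".", dtype_backend=None):
    """Return a dict suitable for pandas.read_excel(path, **kwargs)."""
    if engine not in ALLOWED_ENGINES:
        raise ValueError(f"unsupported engine: {engine!r}")
    if len(decimal) != 1:
        raise ValueError("decimal must be a single character")
    if dtype_backend is not None and dtype_backend not in ALLOWED_BACKENDS:
        raise ValueError(f"unsupported dtype_backend: {dtype_backend!r}")

    kwargs = {"engine": engine}
    if usecols is not None:
        kwargs["usecols"] = usecols      # e.g. "A:C" or ["id", "amount"]
    if skiprows is not None:
        kwargs["skiprows"] = skiprows    # e.g. 2 skips the first two rows
    if decimal != ".":
        kwargs["decimal"] = decimal      # "," for European-style numbers
    if dtype_backend is not None:
        kwargs["dtype_backend"] = dtype_backend
    return kwargs


kwargs = build_read_excel_kwargs(usecols="A:C", skiprows=2, decimal=",")
print(kwargs)
# prints {'engine': 'calamine', 'usecols': 'A:C', 'skiprows': 2, 'decimal': ','}
# The dict would then be used as: df = pandas.read_excel("input.xlsx", **kwargs)
```
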
tgourdel commented 1 week ago

Thanks @TriSSS91, I'll definitely take a look at your suggestions. For now it uses openpyxl, but being able to choose different engines is certainly something to consider.

tgourdel commented 4 days ago

Engine selection is now available in the latest version, 0.4.7:

    pip install --upgrade --force-reinstall amphi-etl

I'm keeping this issue open to address the other options. I'm considering adding a universal parameter specifier so that regular pandas options can be passed without cluttering the UI with rarely used ones.
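
The universal parameter specifier is only an idea at this point in the thread. A minimal sketch of how such a field might work, assuming the user types pandas keyword arguments as free text (for example skiprows=2, decimal=',') and the tool renders them into the generated pd.read_excel call. The names parse_extra_options and render_read_excel are hypothetical, not part of Amphi; the parsing relies on Python's ast module so that only literal values are accepted and nothing is executed.

```python
import ast


def parse_extra_options(text):
    """Parse free-form text such as "skiprows=2, decimal=','" into a dict.

    The text is wrapped in a dummy call and parsed (not executed); every
    value must be a Python literal (number, string, list, dict, None, bool).
    """
    if not text.strip():
        return {}
    node = ast.parse(f"_({text})", mode="eval").body
    # Reject anything that is not a single call, e.g. "1), other(2".
    if not isinstance(node, ast.Call) or node.args:
        raise ValueError("only keyword arguments (name=value) are allowed")
    options = {}
    for kw in node.keywords:
        if kw.arg is None:
            raise ValueError("**-unpacking is not allowed")
        # literal_eval raises ValueError for calls, names, attributes, etc.
        options[kw.arg] = ast.literal_eval(kw.value)
    return options


def render_read_excel(path, options):
    """Render the line of pandas code a code-generating tool could emit."""
    args = [repr(path)] + [f"{key}={value!r}" for key, value in options.items()]
    return f"df = pd.read_excel({', '.join(args)})"


extra = parse_extra_options("usecols='A:C', skiprows=2, decimal=','")
print(render_read_excel("input.xlsx", {"engine": "calamine", **extra}))
# prints df = pd.read_excel('input.xlsx', engine='calamine', usecols='A:C', skiprows=2, decimal=',')
```

Keeping the UI to a single text field and validating it with ast.literal_eval is one way to expose rarely used pandas options without adding a widget per parameter.
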

TriSSS91 commented 3 days ago

I didn't find engine selection in the UI in version 0.4.7: [screenshot]

tgourdel commented 3 days ago

Thanks @TriSSS91, looks like the update didn't work as I expected. Would you be able to try this:

    pip install --upgrade --force-reinstall amphi-etl jupyterlab-amphi

TriSSS91 commented 3 days ago

I tried this; pip shows version 0.4.7 for both amphi-etl and jupyterlab-amphi, but the UI is still the same.

tgourdel commented 1 day ago

Could you try again and check whether you get the option now? Thank you for your patience.

    pip install --upgrade --force-reinstall amphi-etl jupyterlab-amphi

TriSSS91 commented 13 hours ago

Thanks! Engine selection is now available in the UI. But sometimes it works great, and sometimes (same data, same pipeline) I get an error:

    ERROR: Could not find a version that satisfies the requirement calamine (from versions: none)
    ERROR: No matching distribution found for calamine

Maybe the reason is that the Python package is named python-calamine, and that is what needs to be installed; calamine is only the name used for the engine option.
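
The log shows pip searching for a distribution literally named calamine, which fits the guess above that the engine name and the package name differ. A minimal sketch, assuming the pipeline installs engine packages on demand (the thread does not confirm how Amphi does this), of keeping an explicit engine-to-package map and checking importability with importlib.util.find_spec before installing anything. ENGINE_PACKAGES, missing_engine_package and install_command are hypothetical names, and the map covers only the engines discussed in this issue.

```python
import importlib.util

# Hypothetical map: pandas engine name -> (pip distribution, import name).
# The calamine engine is provided by the "python-calamine" distribution,
# which is imported as "python_calamine".
ENGINE_PACKAGES = {
    "calamine": ("python-calamine", "python_calamine"),
    "openpyxl": ("openpyxl", "openpyxl"),
    "pyxlsb": ("pyxlsb", "pyxlsb"),
}


def missing_engine_package(engine):
    """Return the pip distribution to install for `engine`, or None if importable."""
    try:
        pip_name, module_name = ENGINE_PACKAGES[engine]
    except KeyError:
        raise ValueError(f"unknown engine: {engine!r}") from None
    # find_spec returns None for a top-level module that is not installed.
    if importlib.util.find_spec(module_name) is None:
        return pip_name
    return None


def install_command(engine):
    """Build the pip command for an engine's package, or None if already installed."""
    pip_name = missing_engine_package(engine)
    return None if pip_name is None else f"pip install {pip_name}"


print(install_command("calamine"))
# "pip install python-calamine" if python_calamine is not importable, otherwise None
```
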