Open sugiantolauw opened 6 years ago
Hi @sugiantolauw
Seems the error is from xlrd .. Please Google "xlrd assert 0 <= self.rowx < X12_MAX_ROWS".
Hi @sugiantolauw,
The errors roots from the fact that the infer_schema function generates data schema as an .xlsx file containing lots of unnecessary columns. Manually selecting and deleting all the remaining unused columns solved the issue for me.
@SauceCat, Could you look into the issue, as I am not much familiar with openpyxl. Thanks,
@sugiantolauw @rohanneps I was running into same issue, and for now openpyxl library helped. Since I need just first three columns, I am slicing those only.
from openpyxl import load_workbook
def iter_rows(ws):
for row in ws.iter_rows():
yield [cell.value for cell in row[0:3]]
def get_schema(fname):
wb = load_workbook(fname) #from openpyxl import load_workbook
ws = wb.active
schema = pd.DataFrame(iter_rows(ws))
schema.columns = schema.iloc[0]
schema = schema.reindex(schema.index.drop(0))
return schema
You can also solve this by saving the edited output workbook as a csv and then use pd.read_csv instead.
@SauceCat Can you please let me know if you have a fix for this?
you can change your code from
test_schema = pd.read_excel('output/data_schema_test_schema.xlsx')
to
test_schema = pd.read_excel('output/data_schema_test_schema.xlsx', engine='openpyxl')
Hi There,
This looks like a great package and I was testing your package for my own automation.
So, I have created the schema of my dataframe with the infer_schema() function and it returns the .xlsx file.
When I tried to read the .xlsx schema file using pd.read_excel() function, it returns "AssertionError: "
I am not sure what is happening here.
Here is the complete error:
Here is what I have installed on my python: airflow==1.8.0 alembic==0.8.10 annoy==1.11.1 asn1crypto==0.24.0 attrs==17.4.0 Automat==0.6.0 awscli==1.14.32 basemap==1.0.7 bcolz==1.1.2 beautifulsoup4==4.6.0 bitarray==0.8.1 bleach==1.5.0 bokeh==0.12.14 boto==2.48.0 boto3==1.5.22 botocore==1.8.36 branca==0.2.0 bs4==0.0.1 bz2file==0.98 cachetools==2.0.1 Cartopy==0.15.1 catboost==0.6 certifi==2018.1.18 cffi==1.11.4 chardet==3.0.4 click==6.7 click-plugins==1.0.3 cliff==2.8.0 cligj==0.4.0 cloudpickle==0.5.2 cmd2==0.8.0 colorama==0.3.7 colorcet==1.0.0 configparser==3.5.0 constantly==15.1.0 croniter==0.3.20 cryptography==2.1.4 cssselect==1.0.3 cycler==0.10.0 cymem==1.31.2 Cython==0.27.3 cytoolz==0.8.2 dask==0.17.0 datashader==0.6.5 datashape==0.5.4 deap==1.2.2 decorator==4.2.1 descartes==1.1.0 dill==0.2.7.1 distributed==1.21.0 Django==1.11.8 docutils==0.14 en-core-web-sm==2.0.0 entrypoints==0.2.3 et-xmlfile==1.0.1 fastai==0.6 feather-format==0.4.0 Fiona==1.7.11.post1 Flask==0.11.1 Flask-Admin==1.4.1 Flask-Cache==0.13.1 Flask-Login==0.2.11 flask-swagger==0.2.13 Flask-WTF==0.12 folium==0.5.0 ftfy==4.4.3 funcsigs==1.0.0 future==0.15.2 gensim==3.2.0 geopandas==0.3.0 geoplot==0.1.2 gitdb2==2.0.3 GitPython==2.1.8 graphviz==0.8.2 gunicorn==19.3.0 h5py==2.7.1 heapdict==1.0.0 html5lib==0.9999999 hyperlink==17.3.1 idna==2.6 ijson==2.3 impyla==0.14.0 incremental==17.5.0 ipykernel==4.8.0 ipython==6.2.1 ipython-genutils==0.2.0 ipywidgets==7.1.1 isoweek==1.3.3 itsdangerous==0.24 JayDeBeApi==1.1.1 jdcal==1.3 jedi==0.11.1 Jinja2==2.8.1 jmespath==0.9.3 JPype1==0.6.2 jsonschema==2.6.0 jupyter==1.0.0 jupyter-client==5.2.2 jupyter-console==5.2.0 jupyter-contrib-core==0.3.3 jupyter-contrib-nbextensions==0.4.0 jupyter-core==4.4.0 jupyter-highlight-selected-word==0.1.0 jupyter-latex-envs==1.4.1 jupyter-nbextensions-configurator==0.4.0 kaggle-cli==0.12.13 Keras==2.1.3 ktext==0.31 lightgbm==2.1.0 llvmlite==0.21.0 locket==0.2.0 lockfile==0.12.2 lxml==3.8.0 Mako==1.0.7 Markdown==2.6.9 MarkupSafe==1.0 matplotlib==2.1.2 MechanicalSoup==0.8.0 missingno==0.4.0 mistune==0.8.3 mizani==0.4.4 more-itertools==4.1.0 msgpack==0.5.1 msgpack-numpy==0.4.1 msgpack-python==0.5.1 multipledispatch==0.4.9 multiprocess==0.70.5 munch==2.2.0 murmurhash==0.28.0 nbconvert==5.3.1 nbformat==4.4.0 networkx==2.1 nltk==3.2.5 nose==1.3.7 notebook==5.3.1 numba==0.36.2 numpy==1.14.0 olefile==0.45.1 opencv-python==3.4.0.12 openpyxl==2.5.0 ordereddict==1.1 packaging==16.8 palettable==3.1.0 pandas==0.22.0 pandas-summary==0.0.41 pandocfilters==1.4.2 param==1.5.1 parsel==1.3.1 parso==0.1.1 partd==0.3.8 pathlib==1.0.1 pathos==0.2.1 patsy==0.5.0 pbr==3.1.1 pexpect==4.3.1 pickleshare==0.7.4 Pillow==5.0.0 plac==0.9.6 plotly==2.3.0 plotnine==0.3.0 ply==3.10 pox==0.2.3 ppft==1.6.4.7.1 preshed==1.0.0 prettytable==0.7.2 progressbar2==3.34.3 prompt-toolkit==1.0.15 protobuf==3.4.1 psutil==5.4.3 ptyprocess==0.5.2 pyarrow==0.8.0 pyasn1==0.4.2 pyasn1-modules==0.2.1 pycparser==2.18 PyDispatcher==2.0.5 pydot==1.2.4 pydqc==0.1 pyemd==0.5.1 Pygments==2.2.0 pyodbc==4.0.22 pyOpenSSL==17.5.0 pyparsing==2.2.0 pyperclip==1.6.0 Pyphen==0.9.4 pyproj==1.9.5.1 pyshp==1.2.12 python-daemon==2.1.2 python-dateutil==2.6.1 python-editor==1.0.3 python-Levenshtein==0.12.0 python-nvd3==0.14.2 python-slugify==1.1.4 python-utils==2.2.0 pytz==2017.3 PyYAML==3.12 pyzmq==16.0.3 qtconsole==4.3.1 queuelib==1.4.2 regex==2017.4.5 requests==2.18.4 rsa==3.4.2 s3transfer==0.1.12 scikit-learn==0.19.1 scipy==1.0.0 Scrapy==1.5.0 seaborn==0.8.1 selenium==3.8.1 Send2Trash==1.4.2 service-identity==17.0.0 setproctitle==1.1.10 Shapely==1.6.4.post1 simplegeneric==0.8.1 six==1.11.0 sklearn-pandas==1.6.0 smart-open==1.5.6 smmap2==2.0.3 sortedcontainers==1.5.9 spacy==2.0.6 SQLAlchemy==1.2.2 statsmodels==0.8.0 stevedore==1.28.0 stopit==1.1.1 tabulate==0.7.7 tblib==1.3.2 tensorflow==1.4.1 tensorflow-tensorboard==0.1.5 teradata==15.10.0.21 termcolor==1.1.0 terminado==0.8.1 testpath==0.3.1 textacy==0.5.0 Theano==1.0.1 thinc==6.10.2 thrift==0.9.3 thriftpy==0.3.9 toolz==0.9.0 torchtext==0.2.1 tornado==4.5.3 TPOT==0.9.2 tqdm==4.19.5 traitlets==4.3.2 Twisted==17.9.0 ujson==1.35 Unidecode==1.0.22 update-checker==0.16 urllib3==1.22 w3lib==1.19.0 wcwidth==0.1.7 webencodings==0.5.1 Werkzeug==0.14.1 widgetsnbextension==3.1.0 wrapt==1.10.11 WTForms==2.1 xarray==0.10.0 xgboost==0.7 xlrd==1.1.0 XlsxWriter==1.0.2 zict==0.1.3 zope.deprecation==4.3.0 zope.interface==4.4.3
Let me know your thoughts and if you need more info.
Sugi