BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU-accelerated SQL engine for Python, built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0

add storage_options to create_table? #1198

Open raybellwaves opened 3 years ago

raybellwaves commented 3 years ago

I was testing this in dask-sql (https://github.com/nils-braun/dask-sql/issues/84#issuecomment-731491607) and wanted to test this here.

It seems storage_options isn't an arg for create_table. The docs point to creating a dask.dataframe first: https://docs.blazingdb.com/docs/dask

$ conda create -n test_env python=3.7 -y
$ conda activate test_env
$ conda install -c blazingsql-nightly -c rapidsai-nightly -c nvidia -c conda-forge -c defaults blazingsql python=3.7 cudatoolkit=10.2 -y
$ python

from blazingsql import BlazingContext
bc = BlazingContext()

storage_options = {'account_name': 'azureopendatastorage'}
>>> bc.create_table("taxi", "az://nyctlc/green/puYear=2019/puMonth=1/*.parquet", storage_options=storage_options)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ray/local/bin/anaconda3/envs/test_env/lib/python3.7/site-packages/pyblazing/apiv2/context.py", line 1833, in create_table
    kwargs_validation(kwargs, "create_table")
  File "/home/ray/local/bin/anaconda3/envs/test_env/lib/python3.7/site-packages/pyblazing/apiv2/context.py", line 881, in kwargs_validation
    + params_info
Exception: ERROR: The parameter 'storage_options' does not exists. Please make sure you are using the correct parameter:
To get the correct parameters, check:  https://docs.blazingdb.com/docs/create_table

wmalpica commented 3 years ago

Hello @raybellwaves. storage_options is not a parameter for create_table.

We currently do not support Azure Blob Storage, which is why our documentation recommends using dask as a temporary workaround. We plan to implement a storage plugin for Azure in the next couple of months. We do support S3, HDFS, and Google Cloud Storage (https://docs.blazingdb.com/docs/connecting-data-sources).
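
For reference, a minimal sketch of the dask workaround described above, not a confirmed recipe. It assumes that dask_cudf.read_parquet forwards storage_options to fsspec, that adlfs is installed so az:// URLs resolve, and that create_table accepts the resulting dask_cudf DataFrame. The helper names azure_parquet_source and create_table_via_dask are hypothetical and exist only for illustration.

```python
def azure_parquet_source(account_name, container, prefix):
    """Build an fsspec-style Azure URL and matching storage_options.

    Hypothetical helper: it only assembles strings, no network access.
    """
    url = f"az://{container}/{prefix}/*.parquet"
    storage_options = {"account_name": account_name}
    return url, storage_options


def create_table_via_dask(bc, table_name, url, storage_options):
    """Read Parquet through dask_cudf, then register it with BlazingSQL.

    Assumption: the adlfs package is installed so fsspec can open az:// URLs,
    and bc.create_table accepts a dask_cudf DataFrame as its input.
    """
    # Imported lazily because dask_cudf needs a GPU-enabled RAPIDS environment.
    import dask_cudf

    ddf = dask_cudf.read_parquet(url, storage_options=storage_options)
    bc.create_table(table_name, ddf)
    return ddf
```

Usage, with the same public dataset as in the report (requires a GPU environment with blazingsql installed):

```python
from blazingsql import BlazingContext

bc = BlazingContext()
url, opts = azure_parquet_source(
    "azureopendatastorage", "nyctlc", "green/puYear=2019/puMonth=1"
)
create_table_via_dask(bc, "taxi", url, opts)
```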