Open CarloNicolini opened 2 years ago
@CarloNicolini Hi, I have developed a similar module that supports the latest Spark server-client mode; maybe you can try it out:
Install with:
pip install "sparglim[magic]"
It supports the SQL statements you need 🎉
%%sql
CREATE TABLE tb_people
USING json
OPTIONS (path "/path/to/file.json");
SHOW TABLES;
Dear @cryeo,
I really like your library, as it makes it possible to integrate SQL syntax directly into cells; that's a nice piece of work!
However, I would like to hear from you what the best way is to read local .csv files into Spark DataFrames by means of the Spark SQL syntax, without creating a local Hive database. I've noticed that the two folders metastore_db and spark-warehouse are always produced in the same folder as the notebook when I create a table; do they act as a local database?
When I run this cell:
I get this warning in the Jupyter notebook:
I do understand that I don't have Apache Hive installed, but isn't it possible to simply read the CSV file as a Spark DataFrame without all these warnings? Doing it with the PySpark API is much easier, as spark.read.csv('myfile.csv') suffices and no local databases are created.
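One possible way to stay in SQL while avoiding a persistent table is a temporary view over the CSV file, which is scoped to the session rather than stored in the catalog. The sketch below only illustrates that idea and is not taken from either library discussed here: the helper names `csv_temp_view_sql` and `run_demo`, the view name `people`, and the choice of the in-memory catalog are all assumptions, and whether the warnings and folders actually disappear depends on how the Spark session behind the notebook is configured.

```python
def csv_temp_view_sql(view_name: str, path: str, header: bool = True) -> str:
    """Build a Spark SQL statement that exposes a CSV file as a temporary view.

    Hypothetical helper for illustration; view_name and path are inserted
    as given, so pass only trusted values.
    """
    header_flag = "true" if header else "false"
    return (
        f"CREATE OR REPLACE TEMPORARY VIEW {view_name}\n"
        f"USING csv\n"
        f'OPTIONS (path "{path}", header "{header_flag}")'
    )


def run_demo(path: str = "myfile.csv") -> None:
    """Run the statement against a local Spark session (requires pyspark)."""
    # Imported lazily so the helper above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]")
        # Assumption: an in-memory catalog instead of a Hive metastore,
        # which may avoid the metastore_db folder.
        .config("spark.sql.catalogImplementation", "in-memory")
        .getOrCreate()
    )
    spark.sql(csv_temp_view_sql("people", path))
    spark.sql("SELECT * FROM people").show()
```

In a notebook cell using a SQL magic, the string returned by `csv_temp_view_sql("people", "myfile.csv")` could presumably be written out directly as the cell body, followed by an ordinary SELECT against `people`.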