datafusion-contrib / datafusion-python

Python binding for DataFusion
https://arrow.apache.org/datafusion/python/index.html
Apache License 2.0

Brainstorming how to improve the docs #55

Open MrPowers opened 2 years ago

MrPowers commented 2 years ago

I'm trying to get started with DataFusion and would like to run some basic operations to try out the library. I'd like to read a CSV file into a DataFrame and run some queries.

I took a look at the docs and tried to run datafusion.ExecutionContext(), but got an AttributeError: "module 'datafusion' has no attribute 'ExecutionContext'". The project README shows that the current name is datafusion.SessionContext(), so the published docs are out of date.

I was able to read the Rust docs and find that the Python syntax for reading a CSV is something like this:

import datafusion
ctx = datafusion.SessionContext()
ctx.register_csv("something", "../tmp/N_1e7_K_1e2_single.csv")
ctx.sql("SELECT v1 FROM something LIMIT 5").show()
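Here is a more self-contained version of the same idea, the kind of thing I would want in the README. It is a sketch, not official usage: the table name demo_table, the column names, and the generated file are made up for illustration, and I am assuming SessionContext, register_csv, sql, show and collect behave as they do in the release I have installed. The import is guarded so the snippet still runs where datafusion is missing.

```python
import csv
import os
import tempfile

# Write a tiny CSV so the example does not depend on any external file.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "demo.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "v1"])  # header row
    writer.writerows([[i, i * 10] for i in range(10)])

try:
    import datafusion
except ImportError:
    datafusion = None  # package not installed; the query part is skipped

total_rows = None
if datafusion is not None:
    ctx = datafusion.SessionContext()
    # Register the file under a table name that SQL queries can refer to.
    ctx.register_csv("demo_table", csv_path)
    df = ctx.sql("SELECT v1 FROM demo_table LIMIT 5")
    df.show()  # prints the result as a text table
    # collect() returns a list of Arrow record batches.
    total_rows = sum(batch.num_rows for batch in df.collect())
```

show() is handy for poking around interactively, while collect() is what I would reach for when the results need to be used in code.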

Are you OK if I send a PR to add some more detailed usage instructions to this project README? Even basic stuff like documenting show() would help (I just guessed that would work, haha).

Once the README is updated, hopefully we can sync the latest version with the arrow.apache.org docs.

Thanks for making this cool library. I am excited to play around with it!