apache / datafusion-python

Apache DataFusion Python Bindings
https://datafusion.apache.org/python
Apache License 2.0

Add examples from TPC-H #666

Closed timsaucer closed 1 month ago

timsaucer commented 1 month ago

Which issue does this PR close?

This PR does not close #440 but it helps to address one part of it.

Rationale for this change

One of the difficulties for new DataFusion users is finding helpful examples. This PR adds a series of examples based on the queries from the TPC-H benchmark. These queries are well known, and we already have a script in place to generate data for users to work with. By adding these examples, we give new users both data to work with and a series of examples showing DataFusion in operation.

What changes are included in this PR?

This PR makes one change to the generator script to update its Docker image location.

All other changes are within the examples folder.

Are there any user-facing changes?

No user-facing changes.

andygrove commented 1 month ago

These examples are looking really nice @timsaucer. Don't feel that you have to wait until all of them are implemented before we start merging into main. We could do this in stages if you like.

timsaucer commented 1 month ago

Thanks for the feedback. I am seeing differences between a couple of my results and what's in the answers file, so I want to get those resolved before merging. I also want to put something in the readme pointing out which examples demonstrate which features, to make it easy for people to find things. At the rate I'm going, I'll probably have the last 10 done before midweek.

timsaucer commented 1 month ago

I've added to the main readme in the examples folder, so I think this PR is good to go pending review.