apache / datafusion-python

Apache DataFusion Python Bindings
https://datafusion.apache.org/python
Apache License 2.0

Use int64 for TPC-H keys and set input schema to not nullable #714

Closed: timsaucer closed this 1 month ago

timsaucer commented 1 month ago

Which issue does this PR close?

None.

Rationale for this change

As requested by @andygrove, this PR makes a small update to the TPC-H data conversion from CSV to Parquet. At large scale factors, users can run out of unique key values when the keys are stored as int32.
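For a rough sense of where int32 keys stop being enough, here is a minimal sketch. It assumes, based on my reading of the TPC-H specification, that `O_ORDERKEY` values are sparse and range up to 6,000,000 × SF (scale factor). The constant and helper names are illustrative and are not taken from the conversion script.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit key can hold

# Assumption: the largest O_ORDERKEY at scale factor SF is 6,000,000 * SF
# (1,500,000 orders per SF, with keys drawn from a sparse range 4x that size).
MAX_ORDERKEY_PER_SF = 6_000_000


def max_orderkey(scale_factor: int) -> int:
    """Largest order key expected at the given scale factor."""
    return MAX_ORDERKEY_PER_SF * scale_factor


def fits_in_int32(scale_factor: int) -> bool:
    """True if every order key at this scale factor fits in an int32."""
    return max_orderkey(scale_factor) <= INT32_MAX


def first_overflowing_scale_factor() -> int:
    """Smallest integer scale factor whose largest order key exceeds int32."""
    return INT32_MAX // MAX_ORDERKEY_PER_SF + 1


if __name__ == "__main__":
    for sf in (1, 100, 1000):
        print(sf, max_orderkey(sf), fits_in_int32(sf))
    print("first overflowing SF:", first_overflowing_scale_factor())
```

Under that assumption, int32 order keys stop fitting somewhere between SF 100 and SF 1000, which is the range where large-scale benchmark runs would hit the problem.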

What changes are included in this PR?

It changes the key columns from int32 to int64 and marks all fields in the input schema as not nullable.

Are there any user-facing changes?

None.