databricks / spark-sql-perf

Apache License 2.0

Child tables #28

Closed by nongli 9 years ago

nongli commented 9 years ago

In some of the TPC-DS tables, there is a parent/child relationship. This means that the child table cannot be generated independently and is instead output (optionally) as part of generating the parent table. This benchmark requires the output to be written to stdout, and when the TPC-DS generator runs with stdout as the output, it cannot support child tables.

The dsdgen tool has been updated to support this and can optionally prefix each output row with its table name. For example, in this mode the output is:

store_sales|<row>
store_sales|<row>
store_returns|<row>
store_sales|<row>

The rows of the parent and child tables are interleaved, but they can be filtered by their table-name prefix. This patch adds the filtering.
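
As a rough illustration of the filtering idea (not the code in this patch, which lives in the Scala benchmark), here is a minimal Python sketch. It assumes each line of the prefixed output starts with the table name followed by `|`. The function name `rows_for_table` and the sample row bodies are hypothetical.

```python
from typing import Iterable, Iterator


def rows_for_table(lines: Iterable[str], table: str, sep: str = "|") -> Iterator[str]:
    """Yield the rows that belong to `table`, with the table-name prefix removed.

    Hypothetical helper: assumes every line looks like "<table><sep><row>".
    """
    # Match on the name plus separator so that "store" does not also
    # pick up lines prefixed with "store_sales" or "store_returns".
    prefix = table + sep
    for line in lines:
        if line.startswith(prefix):
            yield line[len(prefix):]


# Made-up sample in the prefixed format; the row bodies are placeholders.
sample = [
    "store_sales|<row 1>",
    "store_sales|<row 2>",
    "store_returns|<row 3>",
    "store_sales|<row 4>",
]

sales = list(rows_for_table(sample, "store_sales"))
returns = list(rows_for_table(sample, "store_returns"))
```

Because the helper is a generator, one pass over the generator's stdout can be streamed without buffering the interleaved output. Producing both the parent and the child table this way means either running the generator once per table or splitting a single stream into two sinks.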

nongli commented 9 years ago

Does the commit message show up in GitHub?

In some of the TPC-DS tables, there is a parent/child relationship. This means that
the child table cannot be generated independently and is instead output (optionally)
as part of generating the parent table. This benchmark requires the output to be
written to stdout, and when the TPC-DS generator runs with stdout as the output, it
cannot support child tables.

The dsdgen tool has been updated to support this and can optionally prefix each
output row with its table name. For example, in this mode the output is:

store_sales|<row>
store_sales|<row>
store_returns|<row>
store_sales|<row>

The rows of the parent and child tables are interleaved, but they can be filtered by
their table-name prefix. This patch adds the filtering.

marmbrus commented 9 years ago

If you put it in the description of the PR, the merge script will make it the commit message when we merge.

marmbrus commented 9 years ago

Oh, and you can click ... to see the full message for any given commit before we merge.