databricks / spark-sql-perf

Apache License 2.0
586 stars 407 forks

Fix files truncating according to maxRecordPerFile #180

Closed gcz2022 closed 5 years ago

gcz2022 commented 5 years ago

`(numRows)/maxRecordPerFile` automatically performs a floor (it is an integer division), which invalidates the subsequent `ceil` : )
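
A minimal sketch of the rounding issue, assuming `numRows` is a `Long` and `maxRecordPerFile` is an `Int`. The object name, method names, and sample values are hypothetical and chosen only for illustration; this is not the repository's actual code.

```scala
// Hypothetical illustration of the file-count rounding issue.
object NumFilesSketch {
  // Integer division truncates first, so the ceiling is applied to an
  // already-whole number and cannot round up.
  def truncated(numRows: Long, maxRecordPerFile: Int): Int =
    math.ceil((numRows / maxRecordPerFile).toDouble).toInt

  // Converting to Double before dividing keeps the fractional part,
  // so the ceiling rounds up as intended.
  def roundedUp(numRows: Long, maxRecordPerFile: Int): Int =
    math.ceil(numRows.toDouble / maxRecordPerFile).toInt
}
```

With 25 rows and a limit of 10 records per file, `truncated` yields 2 files while `roundedUp` yields 3, so only the second count keeps every file at or below the limit.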

gcz2022 commented 5 years ago

cc @yhuai @cloud-fan It's a very simple change, could you please take a look?

cloud-fan commented 5 years ago

Thanks, merged!

npoggi commented 5 years ago

@gczsjdy thanks for spotting it. That code could trigger only at 10TB+ scale.

gcz2022 commented 5 years ago

@npoggi No problem. : ) Why? I think it can also trigger below the 10 TB scale, although we are benchmarking at the 10 TB scale.

npoggi commented 5 years ago

@gczsjdy Due to the default maxRecordPerFile, with the current partition scheme, it should not be reached at smaller scales. Anyway, for other improvements, please submit a PR :)