Closed rjzamora closed 1 year ago
cc @rnyak
@rjzamora looks like some tests are failing.
looks like some tests are failing.
Right, I guess CI is using a pretty old version of cudf - The necessary feature was exposed in cudf in 22.08 (https://github.com/rapidsai/cudf/pull/10980). I guess we could just raise an error when the user tries to use this option with an older cudf release.
I think it's reasonable to require a new enough version of cudf instead of making code changes to accommodate older versions. I'm working on getting the CI image back up to date—with any luck, hopefully in the next week or so. (It's a slow process since it involves waiting on a lot of container builds.)
I think it's reasonable to require a new enough version of cudf instead of making code changes to accommodate older versions. I'm working on getting the CI image back up to date—with any luck, hopefully in the next week or so. (It's a slow process since it involves waiting on a lot of container builds.)
That makes sense to me - Thanks @karlhigley !
@rjzamora and @karlhigley when do you think this PR can be revisited based on new cudf version comment? thanks.
The CI image has been updated to match our containers, so hopefully the tests will pass now 🤞🏻
Adds simple
row_group_size
argument toDataset.to_parquet
, allowing users to set the maximum number of rows desired in a single Parquet row-group.