Describe the bug
Hi, I was trying out a data processing pipeline. Here is the full code of the pipeline. When I write the result to CSV, the default behavior creates several CSV partition files, but all the data is in the first file and the other files are empty. How can I control the partitioning behavior?
Here are the files created by the program:
To Reproduce
Here is the full code of the processing:
I checked the repartitioning API docs, but there is no specific example there. The data can be downloaded from the Kaggle diabetes data set.
Dep. list:
Expected behavior
The write should create only one partition unless I explicitly ask for more partitions or repartition the data frame.