Closed by Bertverbeek4PS 1 year ago
@Arthurvdv can you have a look?
I applied this to the Synapse pipelines in my test environment and didn't notice any significant improvement. That may be down to the low volume of data there; the change would likely matter more with a larger data set. I also didn't run into any issues, so including it shouldn't break anything and could only be beneficial. Looks good!
Ok thanks for trying it out. So if you approve the pull request then it will go in 😄
Currently, data is not deliberately partitioned in the dataflow. Partitioning based on a unique identifier (systemid + company) can reduce data shuffling between worker nodes and reduce execution time.
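To illustrate the idea only: the actual change is configured in the Synapse dataflow, not written as code like this. The sketch below is a plain Python model of hash partitioning on the composite key. The field names `system_id`, `company` and `timestamp`, and all function names, are hypothetical. It shows that once every row for a given key lands in the same partition, a per-key step such as keeping the latest version of a record can run inside each partition without moving rows between workers.

```python
import hashlib
from collections import defaultdict


def partition_index(system_id: str, company: str, num_partitions: int) -> int:
    """Map a (system_id, company) key to a stable partition number."""
    key = f"{system_id}|{company}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap it
    # into the available number of partitions.
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_rows(rows, num_partitions):
    """Group rows so that all rows sharing a key end up in one partition."""
    partitions = defaultdict(list)
    for row in rows:
        idx = partition_index(row["system_id"], row["company"], num_partitions)
        partitions[idx].append(row)
    return partitions


def latest_per_key(rows):
    """Keep only the newest row per (system_id, company) within one partition."""
    latest = {}
    for row in rows:
        key = (row["system_id"], row["company"])
        if key not in latest or row["timestamp"] > latest[key]["timestamp"]:
            latest[key] = row
    return list(latest.values())
```

Because the partition is derived from the key alone, running `latest_per_key` on each partition separately and concatenating the results gives the same rows as running it once over the whole data set. That is the property that lets a key-based partitioning scheme avoid a shuffle before the per-key step.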
Original PR: https://github.com/microsoft/bc2adls/pull/108