Open kennknowles opened 2 years ago
The performance of WindowInto may be worth investigating: I noticed that Python text IO write performs worse than the Java SDK, and the slowest DoFn is WindowInto(GlobalWindows()):
Java metrics: http://104.154.241.245/d/bnlHKP3Wz/java-io-it-tests-dataflow?orgId=1&viewPanel=4
Python metrics: http://104.154.241.245/d/gP7vMPqZz/python-io-it-tests-dataflow?orgId=1&viewPanel=5
Java Read ~20s; Java Write ~30s; Python Read ~100s; Python Write 270s
Two noticeable differences in the job graphs:
The Java write pipeline graph looks like this:
The Python write pipeline graph looks like this:
Add microbenchmarks for the WindowInto transform:
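As a starting point, here is a rough sketch of what such a microbenchmark could look like. It is not taken from the Beam repository: the helper names (`benchmark`, `make_pipeline_runner`, `compare`) and the element count are made up for illustration. It runs a tiny pipeline on the local DirectRunner with and without `WindowInto(GlobalWindows())`, so it only measures SDK-side overhead and will not reproduce the Dataflow numbers above.

```python
import statistics
import time


def benchmark(fn, repeats=5):
    """Call fn() `repeats` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def make_pipeline_runner(num_elements, with_window_into):
    """Return a zero-argument callable that runs a small local pipeline.

    apache_beam is imported lazily inside the callable, so the timing
    harness above can be used (and tested) without Beam installed.
    """
    def run():
        import apache_beam as beam
        from apache_beam.transforms import window

        # No runner is specified, so this uses the default local runner.
        with beam.Pipeline() as p:
            pcoll = p | beam.Create(range(num_elements))
            if with_window_into:
                pcoll = pcoll | beam.WindowInto(window.GlobalWindows())
            # Identity Map so both variants end in the same step.
            _ = pcoll | beam.Map(lambda x: x)

    return run


def compare(num_elements=100_000, repeats=5):
    """Return median run times with and without the WindowInto step."""
    baseline = benchmark(make_pipeline_runner(num_elements, False), repeats)
    windowed = benchmark(make_pipeline_runner(num_elements, True), repeats)
    return {
        "baseline_median_s": statistics.median(baseline),
        "windowed_median_s": statistics.median(windowed),
    }
```

Calling `compare()` with Beam installed returns the two medians; the difference between them is a crude estimate of the per-pipeline cost of the extra WindowInto step. Pipeline construction and startup are included in each timing, so a larger `num_elements` is probably needed before the windowing cost dominates.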
R: [~tvalentyn]
Imported from Jira BEAM-4855. Original Jira may contain additional context. Reported by: matthiasml6.