apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

RunInference Benchmarking tests #21454

Open damccorm opened 2 years ago

damccorm commented 2 years ago

RunInference benchmarks will evaluate the performance of pipelines that represent common use cases of Beam and Dataflow with PyTorch, scikit-learn, and possibly TFX. These benchmarks would be integration tests that exercise several software components together: Beam, PyTorch, scikit-learn, and TensorFlow Extended.

We would use publicly available datasets (e.g., from Kaggle).

Dataset sizes: small, 10 GB, 1 TB, etc.

The default execution runner would be Dataflow unless specified otherwise.

These tests would run infrequently (once per release cycle).
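
To make the proposed test matrix concrete, here is a rough sketch of how the frameworks, dataset sizes, and default runner described above could be enumerated. Every name in it (`BenchmarkConfig`, `benchmark_matrix`, the size labels) is hypothetical and not part of Beam. The only real Beam detail assumed is the `--runner` pipeline option with the values `DataflowRunner` and `DirectRunner`.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical names for illustration only; none of this is Beam API.
FRAMEWORKS = ("pytorch", "sklearn", "tfx")
DATASET_SIZES = ("small", "10GB", "1TB")
DEFAULT_RUNNER = "DataflowRunner"  # default unless specified otherwise


@dataclass(frozen=True)
class BenchmarkConfig:
    """One cell of the benchmark matrix: framework x dataset size x runner."""
    framework: str
    dataset_size: str
    runner: str = DEFAULT_RUNNER

    def pipeline_args(self):
        # Beam pipelines accept the runner as a --runner=<name> option.
        return [f"--runner={self.runner}"]


def benchmark_matrix(runner=None):
    """Build every framework/size combination, using Dataflow by default."""
    chosen = runner or DEFAULT_RUNNER
    return [
        BenchmarkConfig(framework, size, chosen)
        for framework, size in product(FRAMEWORKS, DATASET_SIZES)
    ]
```

Calling `benchmark_matrix()` would yield one config per framework and size on Dataflow, while `benchmark_matrix("DirectRunner")` would cover the "unless specified otherwise" case, e.g. for a local smoke run.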

Imported from Jira BEAM-14068. Original Jira may contain additional context. Reported by: Anand Inguva. Subtask of issue #21435

damccorm commented 2 years ago

Unable to assign user @AnandInguva. If able, self-assign, otherwise tag @damccorm so that he can assign you. Because of GitHub's spam prevention system, your activity is required to enable assignment in this repo.