apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0
1.46k stars 185 forks source link

[feature] support shuffle read with retry when facing IO error. #737

Closed Ted-Jiang closed 1 year ago

Ted-Jiang commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do. A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] (This section helps Arrow developers understand the context and why for this feature, in addition to the what) Same in spark

 /**
   * Max number of times we will try IO exceptions (such as connection timeouts) per request.
   * If set to 0, we will not do any retries.
   */
  public int maxIORetries() { return conf.getInt(SPARK_NETWORK_IO_MAXRETRIES_KEY, 3); }

https://github.com/apache/spark/blob/2878cd84d5d5edcaa9bbc642c12d8c7f6f009328/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java#L157 Describe the solution you'd like A clear and concise description of what you want to happen.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Add any other context or screenshots about the feature request here.