smlHao opened this issue 1 year ago (status: Open).
What do you want to report? Did the app fail?
@zuston Hi, thanks!
What do you want to report?
1. When a huge table is joined with another huge table, the shuffle server has blocked threads. Is that expected?
2. The executors have non-daemon threads that hang; they seem to be stuck sending data to the Uniffle server. Do I need to adjust my configuration?
Did the app fail?
After running for 2 hours it has not failed, but the executors hang: the executor logs stop updating, and the driver log only shows Uniffle heartbeat messages.
Do you have any performance-tuning advice for Spark SQL joins between huge tables?
No, but I have done some tuning for huge partitions. First, we should find the root cause. Please tell us what happened to your app.
@zuston Thanks! Yes, you are right. Here is what I found:
The executor log has not been updated for a long time.
Then I checked the executor stack and found non-daemon threads in the WAITING state that seem to be stuck sending data.
Then I analyzed the shuffle server stack and found BLOCKED threads.
My app's progress does not seem to change, but I can't find the root cause. Do you have any steps that could help me?
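One way to make the executor and shuffle-server stacks easier to compare is to summarize the `jstack` output instead of reading it line by line. Below is a minimal sketch, assuming the standard HotSpot `jstack` text format, in which each thread entry starts with a quoted thread name and carries a `java.lang.Thread.State:` line. `JstackSummary` is a hypothetical helper, not part of Uniffle or the JDK. It counts threads per state and groups BLOCKED threads by the monitor they are waiting to lock, so many threads piling up on a single lock stand out immediately.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: summarizes a HotSpot jstack dump (not part of Uniffle).
final class JstackSummary {
    private static final String STATE_PREFIX = "java.lang.Thread.State: ";
    private static final String LOCK_PREFIX = "- waiting to lock ";

    /** Counts threads per java.lang.Thread.State found in the dump. */
    static Map<String, Integer> countStates(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.startsWith(STATE_PREFIX)) {
                // "java.lang.Thread.State: BLOCKED (on object monitor)" -> "BLOCKED"
                String state = t.substring(STATE_PREFIX.length()).split(" ")[0];
                counts.merge(state, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Counts BLOCKED threads per monitor they are waiting to lock. */
    static Map<String, Integer> blockedByLock(List<String> lines) {
        Map<String, Integer> byLock = new TreeMap<>();
        boolean inBlockedThread = false;
        for (String line : lines) {
            String t = line.trim();
            if (line.startsWith("\"")) {
                inBlockedThread = false; // header line of a new thread entry
            } else if (t.startsWith(STATE_PREFIX)) {
                inBlockedThread = t.contains("BLOCKED");
            } else if (inBlockedThread && t.startsWith(LOCK_PREFIX)) {
                // "- waiting to lock <0x...> (a java.lang.Object)" -> "<0x...> (a java.lang.Object)"
                byLock.merge(t.substring(LOCK_PREFIX.length()), 1, Integer::sum);
                inBlockedThread = false; // count each blocked thread once
            }
        }
        return byLock;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JstackSummary.java <jstack-output-file>");
            System.exit(2);
        }
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        System.out.println("Thread states: " + countStates(lines));
        blockedByLock(lines).forEach((lock, n) ->
                System.out.println(n + " BLOCKED thread(s) waiting to lock " + lock));
    }
}
```

A typical use would be `jstack <pid> > server.jstack` followed by `java JstackSummary.java server.jstack` (Java 11+ single-file launch). Taking two or three dumps a few seconds apart shows whether the same lock stays contended.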
Regarding tuning for huge partitions: do you have any documents that could help me do this?
Can you check the shuffle-server and executor GC? Why not use the Spark UI? And if you want to analyze this, it's better to put the metrics into a dashboard.
1. I checked the shuffle-server and executor GC: the shuffle server has no full GC, but jvm_memory_bytes_used is close to XMX_SIZE="60g". Executor GC seems normal.
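For a dashboard or a quick script, `jvm_memory_bytes_used` is easier to watch as a fraction of the heap limit. Below is a minimal sketch, assuming the shuffle server's metrics page is in the Prometheus text format and uses the standard JVM metric names `jvm_memory_bytes_used` and `jvm_memory_bytes_max` with the label `area="heap"`. How the page is fetched (its URL on the Jetty port, `rss.jetty.http.port 20001` in this issue's configuration) is an assumption not covered in this thread. `HeapUsage` is a hypothetical helper.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.OptionalDouble;
import java.util.stream.Collectors;

// Hypothetical helper: heap used/max from a Prometheus text-format metrics page.
final class HeapUsage {

    /** Value of the first sample line starting with seriesPrefix, if present. */
    static OptionalDouble sample(String text, String seriesPrefix) {
        for (String line : text.split("\n")) {
            if (line.startsWith(seriesPrefix)) {
                // Sample line format: <name>{<labels>} <value> [timestamp]
                String[] parts = line.trim().split("\\s+");
                return OptionalDouble.of(Double.parseDouble(parts[1]));
            }
        }
        return OptionalDouble.empty();
    }

    /** Heap used divided by heap max; throws if either series is missing. */
    static double heapRatio(String text) {
        // Prefix match without the closing '}' also accepts a trailing comma in the labels.
        double used = sample(text, "jvm_memory_bytes_used{area=\"heap\"").orElseThrow();
        double max = sample(text, "jvm_memory_bytes_max{area=\"heap\"").orElseThrow();
        return used / max;
    }

    public static void main(String[] args) throws IOException {
        // Reads the metrics page from stdin, e.g. piped from curl.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String text = in.lines().collect(Collectors.joining("\n"));
            System.out.printf("heap used/max = %.1f%%%n", 100 * heapRatio(text));
        }
    }
}
```

The prefix match deliberately omits the closing brace because some versions of the Prometheus Java client write a trailing comma before `}`. A ratio that stays near 100% without full GCs suggests memory that is legitimately retained (for example buffered shuffle data) rather than garbage waiting to be collected.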
@zuston How did you tune for huge partitions? Could you help me do this?
Describe the bug
@jerqi @zuston
Hi, when a huge table is joined with another huge table, the shuffle server has blocked threads. Is that expected?
Server configuration:
rss.rpc.server.port 20000
rss.jetty.http.port 20001
rss.storage.basePath /app/rss-0.7.1/data
rss.storage.type MEMORY_LOCALFILE_HDFS
rss.coordinator.quorum 172.100.3.70:19999,172.100.3.71:19999,172.100.3.72:19999
rss.server.disk.capacity 50g
rss.server.flush.thread.alive 30
rss.server.flush.threadPool.size 10
rss.server.buffer.capacity 40g
rss.server.read.buffer.capacity 20g
rss.server.heartbeat.interval 10000
rss.rpc.message.max.size 1073741824
rss.server.preAllocation.expired 120000
rss.server.commit.timeout 600000
rss.server.app.expired.withoutHeartbeat 120000
rss.server.flush.cold.storage.threshold.size 512m
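The numbers reported in this thread allow a quick sanity check. XMX_SIZE is 60g, while rss.server.buffer.capacity (40g) and rss.server.read.buffer.capacity (20g) also add up to 60g. If both buffers are held on the JVM heap (an assumption worth verifying for 0.7.1), a fully loaded server would leave essentially no heap for gRPC, flushing, and everything else. That would be consistent with jvm_memory_bytes_used staying close to 60g. Below is a minimal sketch of the arithmetic. `ServerHeapBudget` and its size parser (binary units, optional trailing `b`) are hypothetical and not Uniffle code.

```java
import java.util.Locale;

// Hypothetical helper: heap left over after the shuffle server's buffers are full.
final class ServerHeapBudget {

    /** Parses "40g", "512m", "40gb" or a plain byte count into bytes (binary units). */
    static long parseSize(String s) {
        String v = s.trim().toLowerCase(Locale.ROOT);
        if (v.endsWith("b")) {
            v = v.substring(0, v.length() - 1); // accept "40gb" as well as "40g"
        }
        long multiplier;
        switch (v.charAt(v.length() - 1)) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            case 't': multiplier = 1L << 40; break;
            default: return Long.parseLong(v); // no unit suffix: plain bytes
        }
        return Long.parseLong(v.substring(0, v.length() - 1)) * multiplier;
    }

    /** Heap minus write buffer minus read buffer, assuming both live on-heap. */
    static long headroomBytes(String xmx, String writeBuffer, String readBuffer) {
        return parseSize(xmx) - parseSize(writeBuffer) - parseSize(readBuffer);
    }

    public static void main(String[] args) {
        // Values from this issue: XMX_SIZE, rss.server.buffer.capacity, rss.server.read.buffer.capacity.
        long headroom = headroomBytes("60g", "40g", "20g");
        System.out.println("Heap headroom: " + (headroom >> 30) + " GiB");
    }
}
```

For the values in this issue, it prints `Heap headroom: 0 GiB`. Either raising XMX_SIZE or lowering the two capacities would restore some margin. Which is appropriate depends on the host's memory and is not settled in this thread.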
RSS client configuration:
spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager
spark.rss.coordinator.quorum=172.100.3.70:19999,172.100.3.71:19999,172.100.3.72:19999
spark.rss.storage.type=MEMORY_LOCALFILE_HDFS
spark.rss.remote.storage.path=hdfs://ns1/rss/sml
The executor has non-daemon threads that hang, and there is no error log.
Affects Version(s)
0.7.1
Uniffle Server Log Output
No response
Uniffle Engine Log Output
No response
Uniffle Server Configurations
No response
Uniffle Engine Configurations
No response
Additional context
No response
Are you willing to submit a PR?