Closed: alberttwong closed this issue 1 month ago
@MarkovWangRR FYI
+1 on this issue.
Encountered the following error while loading multiple CSV files (from separate terminals, but using the same DB connection) into StarRocks (Docker image allin1-ubuntu:3.2.4) using sling. Sling starts loading the data fine, but partway through the load the above errors show up for different CSV files.
Error:
12m30s 16,743,351 23008 r/s 11 GB
12m31s 16,766,702 23030 r/s 11 GB
12m32s 16,789,944 23044 r/s 11 GB
12m33s 16,813,222 23059 r/s 11 GB
12m34s 16,836,468 23071 r/s 11 GB | 23% MEM | 28% CPU
2024-04-02 11:14:16 DBG stream-load completed for /tmp/starrocks/db/SFU_Fact_Screening_2017_tmp/2024-04-02T110141.174/part.01.0067.csv => {
"TxnId": -1,
"Label": "579a248b-c0f3-429f-a7be-4f7918fa5bdb",
"Status": "Fail",
"Message": "call frontend service failed, address=TNetworkAddress(hostname=<host_ip>, port=9020), reason=THRIFT_EAGAIN (timed out)",
"NumberTotalRows": 0,
"NumberLoadedRows": 0,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 0,
"LoadTimeMs": 0,
"BeginTxnTimeMs": 0,
"StreamLoadPlanTimeMs": 0,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 0,
"CommitAndPublishTimeMs": 0
}
12m35s 16,875,853 22389 r/s 11 GB | 23% MEM | 43% CPU
2024-04-02 11:14:22 DBG loading /tmp/starrocks/db/SFU_Fact_Screening_2017_tmp/2024-04-02T110141.174/part.01.0068.csv [164 MB] ds.1712080900942.fro-0
2024-04-02 11:14:22 DBG drop table if exists `db`.`SFU_Fact_Screening_2017_tmp`
2024-04-02 11:14:22 DBG table `db`.`SFU_Fact_Screening_2017_tmp` dropped
2024-04-02 11:14:22 DBG closed "starrocks" connection (conn-starrocks-UAB)
2024-04-02 11:14:22 INF execution failed
fatal:
--- sling_cli.go:418 func1 ---
--- sling_cli.go:474 cliInit ---
--- cli.go:284 CliProcess ---
~ failure running task (see docs @ https://docs.slingdata.io/sling-cli)
--- sling_logic.go:224 processRun ---
--- sling_logic.go:371 runTask ---
~ execution failed
--- task_run.go:138 Execute ---
--- database_starrocks.go:504 func4 ---
Failed loading from /tmp/starrocks/db/SFU_Fact_Screening_2017_tmp/2024-04-02T110141.174/part.01.0067.csv into `db`.`SFU_Fact_Screening_2017_tmp`
{
"TxnId": -1,
"Label": "579a248b-c0f3-429f-a7be-4f7918fa5bdb",
"Status": "Fail",
"Message": "call frontend service failed, address=TNetworkAddress(hostname=<host_ip>, port=9020), reason=THRIFT_EAGAIN (timed out)",
"NumberTotalRows": 0,
"NumberLoadedRows": 0,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 0,
"LoadTimeMs": 0,
"BeginTxnTimeMs": 0,
"StreamLoadPlanTimeMs": 0,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 0,
"CommitAndPublishTimeMs": 0
}
context canceled
--- task_run.go:97 func1 ---
~ could not write to database
--- task_run.go:387 runFileToDB ---
~ could not insert into `db`.`SFU_Fact_Screening_2017_tmp`.
--- task_run_write.go:307 WriteToDb ---
--- database_starrocks.go:504 func4 ---
Failed loading from /tmp/starrocks/db/SFU_Fact_Screening_2017_tmp/2024-04-02T110141.174/part.01.0067.csv into `db`.`SFU_Fact_Screening_2017_tmp`
{
"TxnId": -1,
"Label": "579a248b-c0f3-429f-a7be-4f7918fa5bdb",
"Status": "Fail",
"Message": "call frontend service failed, address=TNetworkAddress(hostname=<host_ip>, port=9020), reason=THRIFT_EAGAIN (timed out)",
"NumberTotalRows": 0,
"NumberLoadedRows": 0,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 0,
"LoadTimeMs": 0,
"BeginTxnTimeMs": 0,
"StreamLoadPlanTimeMs": 0,
"ReadDataTimeMs": 0,
"WriteDataTimeMs": 0,
"CommitAndPublishTimeMs": 0
}
context canceled
Following are our sling connection details:
export STARROCKS='{ type: starrocks, url: "starrocks://root@<host_ip>:9030/db", fe_url: "http://<host_ip>:8030" }'
sling command:
./sling run \
--src-stream file:///SFU_Fact_Screening_2017.csv \
--src-options '{"format": "csv", "options": {"delimiter": "|", "header": true}}' \
--tgt-conn STARROCKS \
--tgt-object db.SFU_Fact_Screening_2017 \
--mode full-refresh \
--debug
call frontend service failed, address=TNetworkAddress(hostname=, port=9020), reason=THRIFT_EAGAIN (timed out)
This is probably caused by lock contention in the FE when there are too many concurrent stream load jobs. You can check fe.log and search for "slow db lock" to see whether there is heavy lock contention. To avoid this error, you can also increase the timeout of the stream load job, e.g. -H "timeout:300"; doc ref: https://docs.starrocks.io/docs/sql-reference/sql-statements/data-manipulation/STREAM_LOAD/#set-timeout-period.
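A minimal sketch of both checks, assuming the connection details quoted above (FE HTTP port 8030, root user with no password); the fe.log location, the label, and the target table are illustrative placeholders, not verified against the allin1 image:

# Look for lock contention in the FE log (adjust the path to wherever your FE writes logs).
grep "slow db lock" <fe_log_dir>/fe.log

# Retry one failing part file as a raw Stream Load with a longer timeout.
# "timeout:300" is the header suggested above; column_separator matches the sling --src-options delimiter.
curl --location-trusted -u root: \
  -H "Expect:100-continue" \
  -H "label:SFU_Fact_Screening_2017_retry_0067" \
  -H "timeout:300" \
  -H "column_separator:|" \
  -T /tmp/starrocks/db/SFU_Fact_Screening_2017_tmp/2024-04-02T110141.174/part.01.0067.csv \
  -XPUT http://<host_ip>:8030/api/db/SFU_Fact_Screening_2017/_stream_load

If the manual load succeeds with the larger timeout, the same value can be applied to the sling-driven loads (or the concurrent terminals can be serialized) to work around the FE contention.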
We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!
See https://github.com/slingdata-io/sling-cli/issues/229