google-research / t5x

E tensorflow/tsl/platform/cloud/curl_http_request.cc #1544

Open chenyue-max opened 2 months ago

chenyue-max commented 2 months ago

During training, we hit the following error:

I0423 06:47:26.398192 139841385358528 utils.py:1954] Initializing dataset for task 'the_pile_span_corruption' with a replica batch size of 2048 and a seed of 1713854846
I0423 06:47:26.423343 139841385358528 dataset_info.py:578] Load dataset info from /mnt/VMSTORE/pile_data/the_pile/lm/1.0.0
I0423 06:47:26.668935 139841385358528 dataset_providers.py:1557] Sharding at the data source: 1 of 1
2024-04-23 06:48:27.103605: E tensorflow/tsl/platform/cloud/curl_http_request.cc:610] The transmission of request 0x1c1c6fa0 (URI: http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token) has been stuck at 0 of 0 bytes for 61 seconds and will be aborted. CURL timing information: lookup time: 7.7e-05 (No error), connect time: 0.000206 (No error), pre-transfer time: 0.000261 (No error), start-transfer time: 0 (No error)