During training, we hit the following error:
I0423 06:47:26.398192 139841385358528 utils.py:1954] Initializing dataset for task 'the_pile_span_corruption' with a replica batch size of 2048 and a seed of 1713854846
I0423 06:47:26.423343 139841385358528 dataset_info.py:578] Load dataset info from /mnt/VMSTORE/pile_data/the_pile/lm/1.0.0
I0423 06:47:26.668935 139841385358528 dataset_providers.py:1557] Sharding at the data source: 1 of 1
2024-04-23 06:48:27.103605: E tensorflow/tsl/platform/cloud/curl_http_request.cc:610] The transmission of request 0x1c1c6fa0 (URI: http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token) has been stuck at 0 of 0 bytes for 61 seconds and will be aborted. CURL timing information: lookup time: 7.7e-05 (No error), connect time: 0.000206 (No error), pre-transfer time: 0.000261 (No error), start-transfer time: 0 (No error)