C4 = Colossal Clean Crawled Corpus
start 2023-12-03 00:21 - estimate $100 US for GCS egress
An average of 300 Mbps with peaks of 900 Mbps from the GCP bucket means 800 GB x 8 bits = 6400 Gbit at 0.3 Gbps = ~6 h ETA
36 GB in 26 min = ~25 MB/s = ~200 Mbps = 11 h (possibly limited by the HDD - go directly to NVMe next time)
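The ETA arithmetic above can be rechecked with a small sketch. The function names are hypothetical, and it assumes decimal units (1 GB = 1000 MB) and the 800 GB total from the notes. At a flat 200 Mbps it gives roughly 9 h rather than 11 h, so the 11 h figure presumably allows for something not written down here.

```python
def eta_hours(size_gb: float, rate_mbps: float) -> float:
    """Hours needed to move size_gb (decimal GB) at rate_mbps (megabits/s)."""
    size_megabits = size_gb * 1000 * 8
    return size_megabits / rate_mbps / 3600


def observed_mbps(size_gb: float, minutes: float) -> float:
    """Average rate in megabits/s for size_gb moved in the given minutes."""
    return size_gb * 1000 * 8 / (minutes * 60)


if __name__ == "__main__":
    print(round(eta_hours(800, 300), 1))   # 5.9 h at the 300 Mbps average
    print(round(observed_mbps(36, 26)))    # 185 Mbps measured (36 GB in 26 min)
    print(round(eta_hours(800, 200), 1))   # 8.9 h at a flat 200 Mbps
```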
checked 0845 done
copy test HDD to HDD, no RAID: 0849 - ~1330 = ~4.5 h
HDD to NVMe: 1400-1455, ~250 MB/s, ~1 h
copy test NVMe to NVMe: 1456, 4-8 min, 3.4-1.4 GB/s (thermal throttling) (990 Pro at ~50% of its max 8 GB/s)
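A rough cross-check of the copy-test rates, as a sketch only. It assumes each test moved the full ~800 GB (the notes do not say so explicitly), that start and end times fall on the same day, and decimal units. The helper names are hypothetical.

```python
from datetime import datetime


def minutes_between(start_hhmm: str, end_hhmm: str) -> float:
    """Minutes between two same-day clock times written as HHMM."""
    fmt = "%H%M"
    delta = datetime.strptime(end_hhmm, fmt) - datetime.strptime(start_hhmm, fmt)
    return delta.total_seconds() / 60


def throughput_mb_s(size_gb: float, minutes: float) -> float:
    """Average throughput in MB/s for size_gb (decimal GB) copied in minutes."""
    return size_gb * 1000 / (minutes * 60)


if __name__ == "__main__":
    size_gb = 800  # assumption: every test copied the whole dataset
    print(round(throughput_mb_s(size_gb, minutes_between("0849", "1330"))))  # 47 MB/s, HDD to HDD
    print(round(throughput_mb_s(size_gb, minutes_between("1400", "1455"))))  # 242 MB/s, HDD to NVMe
    print(round(throughput_mb_s(size_gb, 4)))  # 3333 MB/s, NVMe to NVMe, fast case
    print(round(throughput_mb_s(size_gb, 8)))  # 1667 MB/s, NVMe to NVMe, throttled case
```

Under that 800 GB assumption the HDD to NVMe run comes out near 250 megabytes per second, and the NVMe figures land close to the 3.4-1.4 GB/s range noted.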
P. 256 of Generative Deep Learning, 2nd Edition - David Foster
https://towardsdatascience.com/how-to-build-an-llm-from-scratch-8c477768f1f9
https://github.com/allenai/allennlp/discussions/5056
https://support.terra.bio/hc/en-us/community/posts/4787320149915-Requester-Pays-Google-buckets-not-asking-for-project-to-bill
actual: $93 US for GCS egress (vs the $100 estimate)