obriensystems opened 12 months ago
Add TPUv5 capability
References:
- P. 256 of *Generative Deep Learning*, 2nd Edition, David Foster
- https://towardsdatascience.com/how-to-build-an-llm-from-scratch-8c477768f1f9
- https://github.com/allenai/allennlp/discussions/5056
- https://support.terra.bio/hc/en-us/community/posts/4787320149915-Requester-Pays-Google-buckets-not-asking-for-project-to-bill (requester-pays buckets need a billing project; see the sketch below)
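Since the C4 buckets are requester-pays (per the Terra thread above), downloads must name a project to bill. A minimal sketch with the `google-cloud-storage` client; the bucket, object, and project names are placeholders, not the real C4 paths:

```python
from google.cloud import storage

# Placeholder project ID; egress is billed to this project.
client = storage.Client(project="my-billing-project")

# user_project marks requests to this bucket as requester-pays.
bucket = client.bucket("some-requester-pays-bucket", user_project="my-billing-project")

# Placeholder object path, for illustration only.
blob = bucket.blob("c4/en/example-shard.json.gz")
blob.download_to_filename("example-shard.json.gz")
```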
C4 = Colossal Clean Crawled Corpus. Download started 20231203:0021; estimated $100 US for GCS egress. Averaging 300 Mbps with peaks of 900 Mbps from the GCP bucket: 800 GB × 8 = 6400 Gbit, which at 0.3 Gbps is about 6 hours. Observed: 36 GB in 26 min = 25 MB/s = 200 Mbps, so closer to 11 hours (possibly limited by the HDD; write directly to NVMe next time).
Actual: $93 US for GCS egress.
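For reference, the throughput arithmetic above as a small helper (pure Python, using only the sizes and rates quoted in the note):

```python
def transfer_hours(size_gb: float, rate_gbps: float) -> float:
    """Hours to move size_gb gigabytes at rate_gbps gigabits per second."""
    gbits = size_gb * 8  # gigabytes -> gigabits
    return gbits / rate_gbps / 3600

print(transfer_hours(800, 0.3))  # ~5.9 h: the "6 hours" estimate at 300 Mbps
print(transfer_hours(36, 0.2))   # ~0.4 h: matches the observed 36 GB in 26 min
```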
Bootstrap TPU project
Modify the training script for TF `TPUStrategy`.
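A minimal initialization sketch, assuming a Cloud TPU VM (on a separate TPU node, pass the node name to the resolver instead of "local"); the training code below can stay the same, since only the strategy changes:

```python
import tensorflow as tf

# Locate and initialize the TPU, then build a TPUStrategy over its cores.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="local")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
print("TPU replicas:", strategy.num_replicas_in_sync)
```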
Docs:
- https://www.tensorflow.org/api_docs/python/tf/distribute/MirroredStrategy#used-in-the-notebooks
- https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet50/ResNet50
- https://keras.io/api/models/model/
- https://keras.io/api/models/model_training_apis/
- https://saturncloud.io/blog/how-to-do-multigpu-training-with-keras/

```python
import tensorflow as tf

# Replicate training across two local GPUs.
strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])

# CIFAR-100: 32x32 RGB images, 100 classes.
cifar = tf.keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()

# Create the model inside the strategy scope so its variables are
# mirrored onto every replica.
with strategy.scope():
    parallel_model = tf.keras.applications.ResNet50(
        include_top=True,
        weights=None,
        input_shape=(32, 32, 3),
        classes=100,
    )
    # include_top=True ends in a softmax layer, so the loss takes
    # probabilities rather than logits.
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
    parallel_model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

parallel_model.fit(x_train, y_train, epochs=10, batch_size=256)  # also tried 5120 and 7168
```

Note: the `parallel_model = multi_gpu_model(model, gpus=2)` pattern from the Saturn Cloud post (`tf.keras.utils.multi_gpu_model`) is deprecated and was removed in TF 2.4; `MirroredStrategy` is its replacement.
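With `MirroredStrategy` (and `TPUStrategy`) the `batch_size` passed to `fit` is the global batch, split evenly across replicas, so when experimenting with larger batches it can help to derive it from a per-replica size. Continuing the script above:

```python
# Global batch = per-replica batch × number of replicas
# (e.g. 256 per GPU on 2 GPUs -> 512 global).
per_replica_batch = 256
global_batch = per_replica_batch * strategy.num_replicas_in_sync
parallel_model.fit(x_train, y_train, epochs=10, batch_size=global_batch)
```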