obriensystems opened this issue 9 months ago
For the ASUS Z790H board with an i9-14900K: keep undervolt protection off (use ASUS AI Overclocking, not the Intel Extreme Tuning Utility).
With the performance-core multiplier at x57, step time worsens from 280 ms to 330 ms.
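A few lines of Python put numbers on this regression. This is a minimal sketch that assumes the default 100 MHz BCLK (so x57 corresponds to about 5.7 GHz) and the same 4096-image batch as the CIFAR-100 run in this issue; the helper names are hypothetical.

```python
# Sketch: quantify the step-time regression at the x57 multiplier.
# Assumptions (not stated in the issue): default 100 MHz BCLK and a
# 4096-image batch, matching the CIFAR-100 benchmark run.
BCLK_MHZ = 100.0   # assumed default base clock
BATCH_SIZE = 4096  # assumed batch size for the 280 ms / 330 ms figures


def core_clock_ghz(multiplier: int, bclk_mhz: float = BCLK_MHZ) -> float:
    """Core clock = multiplier x BCLK, converted to GHz."""
    return multiplier * bclk_mhz / 1000.0


def slowdown_pct(baseline_ms: float, degraded_ms: float) -> float:
    """Percent increase in per-step time relative to the baseline."""
    return (degraded_ms - baseline_ms) / baseline_ms * 100.0


def images_per_sec(step_ms: float, batch: int = BATCH_SIZE) -> float:
    """Training throughput implied by a per-step time."""
    return batch / (step_ms / 1000.0)


if __name__ == "__main__":
    print(f"x57 -> {core_clock_ghz(57):.1f} GHz")
    print(f"slowdown: {slowdown_pct(280, 330):.1f}%")
    print(f"{images_per_sec(280):.0f} -> {images_per_sec(330):.0f} images/s")
```

Going from 280 ms to 330 ms per step is roughly an 18% slowdown, which is easier to compare across BIOS settings than raw milliseconds.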
Test system: i9-13900KS at 6.2 GHz, single RTX A4500 card, ASUS Z790 Hero, 1600 W power supply, dual 32 GB DDR5-6400 RAM on XMP I; batch size 4096, 25 epochs.
2023-12-29 17:49:12.793423: I tensorflow/core/common_runtime/gpu/gpu_device.cc:2022] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-12-29 17:49:12.793436: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:887] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-12-29 17:49:12.793447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 17782 MB memory: -> device: 0, name: NVIDIA RTX A4500, pci bus id: 0000:01:00.0, compute capability: 8.6
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz
169001437/169001437 [==============================] - 35s 0us/step
2023-12-29 17:49:51.160147: W tensorflow/core/framework/dataset.cc:959] Input of GeneratorDatasetOp::Dataset will not be optimized because the dataset does not implement the AsGraphDefInternal() method needed to apply optimizations.
Epoch 1/25
2023-12-29 17:49:56.694969: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8906
2023-12-29 17:50:00.947534: I external/local_xla/xla/service/service.cc:168] XLA service 0x7fb97425ce20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-12-29 17:50:00.947561: I external/local_xla/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA RTX A4500, Compute Capability 8.6
2023-12-29 17:50:00.950948: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1703872200.993064 103 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
13/13 [==============================] - 41s 808ms/step - loss: 5.4299 - accuracy: 0.0278
Epoch 2/25
13/13 [==============================] - 4s 285ms/step - loss: 4.2069 - accuracy: 0.0712
Epoch 3/25
13/13 [==============================] - 4s 285ms/step - loss: 3.8504 - accuracy: 0.1172
Epoch 4/25
13/13 [==============================] - 4s 285ms/step - loss: 3.4829 - accuracy: 0.1750
Epoch 5/25
13/13 [==============================] - 4s 286ms/step - loss: 3.1631 - accuracy: 0.2380
Epoch 6/25
13/13 [==============================] - 4s 286ms/step - loss: 2.7725 - accuracy: 0.3111
Epoch 7/25
13/13 [==============================] - 4s 286ms/step - loss: 2.3888 - accuracy: 0.3901
Epoch 8/25
13/13 [==============================] - 4s 287ms/step - loss: 2.0793 - accuracy: 0.4557
Epoch 9/25
13/13 [==============================] - 4s 287ms/step - loss: 1.8113 - accuracy: 0.5219
Epoch 10/25
13/13 [==============================] - 4s 288ms/step - loss: 1.5876 - accuracy: 0.5753
Epoch 11/25
13/13 [==============================] - 4s 288ms/step - loss: 1.3336 - accuracy: 0.6312
Epoch 12/25
13/13 [==============================] - 4s 288ms/step - loss: 1.0699 - accuracy: 0.6984
Epoch 13/25
13/13 [==============================] - 4s 289ms/step - loss: 0.9236 - accuracy: 0.7364
Epoch 14/25
13/13 [==============================] - 4s 289ms/step - loss: 0.7571 - accuracy: 0.7804
Epoch 15/25
13/13 [==============================] - 4s 290ms/step - loss: 0.6041 - accuracy: 0.8242
Epoch 16/25
13/13 [==============================] - 4s 290ms/step - loss: 0.6497 - accuracy: 0.8138
Epoch 17/25
13/13 [==============================] - 4s 290ms/step - loss: 0.5552 - accuracy: 0.8316
Epoch 18/25
13/13 [==============================] - 4s 290ms/step - loss: 0.4580 - accuracy: 0.8647
Epoch 19/25
13/13 [==============================] - 4s 290ms/step - loss: 0.3844 - accuracy: 0.8903
Epoch 20/25
13/13 [==============================] - 4s 290ms/step - loss: 0.3997 - accuracy: 0.8838
Epoch 21/25
13/13 [==============================] - 4s 290ms/step - loss: 0.3681 - accuracy: 0.8954
Epoch 22/25
13/13 [==============================] - 4s 291ms/step - loss: 0.3103 - accuracy: 0.9070
Epoch 23/25
13/13 [==============================] - 4s 290ms/step - loss: 0.2674 - accuracy: 0.9209
Epoch 24/25
13/13 [==============================] - 4s 291ms/step - loss: 0.3407 - accuracy: 0.9027
Epoch 25/25
13/13 [==============================] - 4s 291ms/step - loss: 0.3117 - accuracy: 0.9118
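The per-epoch summaries above can be reduced to a single steady-state step time. This is a minimal sketch that assumes the Keras progress-bar format shown in this log and the 50,000-image CIFAR-100 training split; the function names are illustrative. It also checks that a batch of 4096 gives the 13 steps per epoch seen in the log.

```python
import math
import re
from statistics import mean

CIFAR100_TRAIN_IMAGES = 50_000  # size of the CIFAR-100 training split

# Matches Keras epoch summaries such as "13/13 [====] - 4s 285ms/step".
# The dataset download line ("0us/step") is deliberately not matched.
STEP_RE = re.compile(r"(\d+)/(\d+) \[=+\] - \d+s (\d+)ms/step")


def steps_per_epoch(n_images: int, batch: int) -> int:
    """Keras runs a final partial batch, so round up."""
    return math.ceil(n_images / batch)


def step_times_ms(log_text: str) -> list[int]:
    """Extract the ms/step figure from each epoch summary line."""
    return [int(m.group(3)) for m in STEP_RE.finditer(log_text)]


def steady_state_ms(times: list[int], warmup_epochs: int = 1) -> float:
    """Mean step time after skipping warm-up epochs."""
    return mean(times[warmup_epochs:])


if __name__ == "__main__":
    # Three epoch summaries copied from the log above.
    sample = """\
13/13 [==============================] - 41s 808ms/step - loss: 5.4299 - accuracy: 0.0278
13/13 [==============================] - 4s 285ms/step - loss: 4.2069 - accuracy: 0.0712
13/13 [==============================] - 4s 291ms/step - loss: 0.3117 - accuracy: 0.9118
"""
    times = step_times_ms(sample)
    print("steps/epoch:", steps_per_epoch(CIFAR100_TRAIN_IMAGES, 4096))
    print("step times:", times, "steady state:", steady_state_ms(times))
```

Skipping epoch 1 matters here: its 808 ms/step likely includes one-time setup such as cuDNN loading and XLA compilation (the log reports `Compiled cluster using XLA!` during that epoch), while epochs 2 to 25 stay between 285 and 291 ms/step.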
NVIDIA CUDA image
Dockerfile
The key to GPU passthrough in Docker is the --gpus flag (for example --gpus all); if you don't set it, you will get the following:
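The flag goes on `docker run`, not in the Dockerfile. As a minimal sketch, the helper below only assembles the argument list; the function name is hypothetical, and the example image tag is an assumption (check Docker Hub for current `nvidia/cuda` tags). `--gpus` also assumes the NVIDIA Container Toolkit is installed on the host.

```python
import shlex


def gpu_run_cmd(image: str, gpus: str = "all",
                command: tuple[str, ...] = ("nvidia-smi",)) -> list[str]:
    """Build a `docker run` argv that exposes host GPUs via --gpus.

    Hypothetical helper: it builds the command but does not run it.
    Without the "--gpus" entry, the container sees no GPU.
    """
    return ["docker", "run", "--rm", "--gpus", gpus, image, *command]


if __name__ == "__main__":
    # Example tag is an assumption; substitute the CUDA image you build from.
    cmd = gpu_run_cmd("nvidia/cuda:12.3.1-base-ubuntu22.04")
    print(shlex.join(cmd))
    # To execute it: subprocess.run(cmd, check=True)
```

Running `nvidia-smi` inside the container is a quick check that passthrough works before launching a TensorFlow job.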
RTX 3500 Ada on a ThinkPad P1 Gen 6 (2023-11): AD104 GPU, 5120 CUDA cores
CPU i7 13900H laptop
CPU i9 13000K laptop