I'm running a standalone checkpointing workload with a 1T model (9.96 TiB checkpoint size) on 512 n2-standard-32 nodes. Memory usage slowly increases over time, and the workload eventually reports OOM after ~60 writes. Here's the memory consumption chart: