I've been heads down in gcs file copying/persistent disk land.
I did a bunch of in-cluster testing on Thursday and Friday and discovered several things:
gsutil rsync isn't smart enough to know that family tables with the same name/size in one run_dir on gcs are the same as the ones in another run_dir. This means rsyncing to disk when a new run goes live is still really slow. There's a "checksum" flag to rsync (`-c`) which does work but appeared to be even slower than just copying all the files from scratch.
This indicates the strategy hana and I discussed Thursday won't be great... we can get the helm sync to persistent disk to be much faster than it currently is on average, but it'll still have a ~30min worst case.
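For reference, the cross-run_dir name/size matching that rsync isn't doing would look roughly like the sketch below. It's purely illustrative: `files_to_copy` and both listings are made up, and it assumes a file with the same relative path and size in two runs is unchanged, which is exactly the shortcut the checksum flag exists to avoid.

```python
from typing import Dict, List


def files_to_copy(previous_run: Dict[str, int], new_run: Dict[str, int]) -> List[str]:
    """Return relative paths in new_run that are missing from, or differ in size from, previous_run.

    Both arguments map a path relative to its run_dir to a size in bytes.
    Hypothetical helper: matching on name/size only is an assumption, not a checksum.
    """
    return sorted(
        path
        for path, size in new_run.items()
        if previous_run.get(path) != size
    )


# Made-up listings for two run_dirs; only paths relative to each run_dir are compared.
previous_run = {
    "family_tables/F1.ht/metadata.json.gz": 512,
    "family_tables/F2.ht/metadata.json.gz": 640,
}
new_run = {
    "family_tables/F1.ht/metadata.json.gz": 512,  # unchanged, could be reused from disk
    "family_tables/F2.ht/metadata.json.gz": 700,  # size changed, needs copying
    "family_tables/F3.ht/metadata.json.gz": 300,  # new family, needs copying
}
print(files_to_copy(previous_run, new_run))
```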
Copying from a slower persistent disk to a fast one is much faster than from gcs to fast disk, but still too slow for a pod startup. It'd be ~5-10min.
I timed some hail table filters across standard, balanced, and premium disks, the ephemeral local ssds we're using now, and the in-memory filesystem we're using for the annotations table, and wasn't seeing a meaningful difference, suggesting we should just use standard disks? I also timed against the cloud storage bucket and did see more latency there (50%-ish). (I'm not sure whatever I was doing here was correct though...)
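A minimal sketch of how a comparison like this could be made repeatable, assuming the filter can be wrapped in a plain Python callable. `time_workload`, `summarize`, and `placeholder_filter` are hypothetical names; the placeholder does no disk I/O and just stands in for a table filter pointed at one storage type.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_workload(workload: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run workload `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(timings: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce each label's durations to the median, which is less noisy than a single run."""
    return {label: statistics.median(values) for label, values in timings.items()}


def placeholder_filter() -> int:
    # Stand-in for the real work (assumption): replace with the table filter under test.
    return sum(i for i in range(100_000) if i % 7 == 0)


# One entry per storage type; here every label runs the same placeholder.
timings = {label: time_workload(placeholder_filter) for label in ("standard", "balanced")}
print(summarize(timings))
```

One thing this doesn't control for is the OS page cache: repeated reads of the same files can be served from memory, which might make different disk types look the same.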
It was much easier than I expected to attach a disk to the airflow k8s cluster and access the auto-scaling to get a well-provisioned rsync to run in a kubernetes pod. We can basically ask for resources from the cluster and get them for free!!
It's harder than I expected to produce a disk from airflow that the seqr-chart can attach... we need to create both a snapshot of the airflow disk and then a "disk" from the snapshot. We'd need to manage a bunch of potentially expensive gcs resources... which is kinda gross.
I have one more idea I'm gonna try... keep a persistent disk in airflow up to date with gsutil rsync, publish a tarball archive of it to gcs, then download and unarchive it onto a persistent disk in the seqr cluster.
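A rough sketch of just the archive/unarchive half of that idea, using Python's standard `tarfile` module. The upload to and download from gcs are left out, and `pack_directory`, `unpack_archive`, and the directory names are hypothetical. The archive is uncompressed for simplicity; whether compression would help is something that would need measuring.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_directory(source_dir: Path, archive_path: Path) -> Path:
    """Write the contents of source_dir into a tar archive, with paths relative to source_dir."""
    with tarfile.open(archive_path, "w") as tar:
        for child in sorted(source_dir.iterdir()):
            tar.add(child, arcname=child.name)  # directories are added recursively
    return archive_path


def unpack_archive(archive_path: Path, target_dir: Path) -> None:
    """Extract the archive into target_dir, creating it if needed."""
    target_dir.mkdir(parents=True, exist_ok=True)
    # The "data" extraction filter only exists on newer Python versions.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive_path, "r") as tar:
        tar.extractall(target_dir, **extract_kwargs)


# Round trip in a temp dir; the two directories stand in for the airflow and seqr disks.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "airflow_disk"
    (source / "family_tables").mkdir(parents=True)
    (source / "family_tables" / "F1.txt").write_text("placeholder table contents")

    archive = pack_directory(source, root / "run.tar")
    unpack_archive(archive, root / "seqr_disk")
    print((root / "seqr_disk" / "family_tables" / "F1.txt").read_text())
```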